Merge branch 'master' into GH-2/minimum-requirement-content

commit e91001782c
Changed directories: assets/sass, content, de, community/static, docs, home, reference, glossary, kubectl, setup, tasks/tools, tutorials, en, blog/_posts, community, docs/concepts, architecture, extend-kubernetes, api-extension, compute-storage-net, overview, scheduling, security, workloads/controllers
@@ -5,7 +5,7 @@ charset = utf-8
max_line_length = 80
trim_trailing_whitespace = true

[*.{html,js,json,sass,md,mmark,toml,yaml}]
[*.{css,html,js,json,sass,md,mmark,toml,yaml}]
indent_style = space
indent_size = 2
Makefile
@@ -18,7 +18,7 @@ build: ## Build site with production settings and put deliverables in ./public
build-preview: ## Build site with drafts and future posts enabled
hugo --buildDrafts --buildFuture

deploy-preview: check-hugo-versions ## Deploy preview site via netlify
deploy-preview: ## Deploy preview site via netlify
hugo --enableGitInfo --buildFuture -b $(DEPLOY_PRIME_URL)

functions-build:

@@ -27,9 +27,9 @@ functions-build:
check-headers-file:
scripts/check-headers-file.sh

production-build: check-hugo-versions build check-headers-file ## Build the production site and ensure that noindex headers aren't added
production-build: build check-headers-file ## Build the production site and ensure that noindex headers aren't added

non-production-build: check-hugo-versions ## Build the non-production site, which adds noindex headers to prevent indexing
non-production-build: ## Build the non-production site, which adds noindex headers to prevent indexing
hugo --enableGitInfo

serve: ## Boot the development server.

@@ -47,6 +47,3 @@ docker-serve:
test-examples:
scripts/test_examples.sh install
scripts/test_examples.sh run

check-hugo-versions:
scripts/hugo-version-check.sh $(HUGO_VERSION)
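The Makefile targets touched in the hunks above are the usual local entry points for contributors. As a minimal, hedged usage sketch (it assumes a checkout of the website repository and a matching Hugo install, and uses only targets visible in this diff):

```shell
# Build a local preview with drafts and future-dated posts enabled
make build-preview

# Production build plus the noindex-header check referenced above
make production-build

# Run the local development server
make serve
```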
@@ -30,7 +30,6 @@ aliases:
- onlydole
- parispittman
- vonguard
- onlydole
sig-docs-de-owners: # Admins for German content
- bene2k1
- mkorbi
@@ -40,34 +39,33 @@ aliases:
- mkorbi
- rlenferink
sig-docs-en-owners: # Admins for English content
- bradamant3
- bradtopol
- daminisatya
- gochist
- jaredbhatti
- jimangel
- kbarnard10
- kbhawkey
- makoscafee
- onlydole
- Rajakavitha1
- ryanmcginnis
- sftim
- steveperry-53
- tengqm
- vineethreddy02
- xiangpengzhao
- zacharysarah
- zparnold
sig-docs-en-reviews: # PR reviews for English content
- bradamant3
- bradtopol
- daminisatya
- gochist
- jaredbhatti
- jimangel
- kbarnard10
- kbhawkey
- makoscafee
- onlydole
- rajakavitha1
- rajeshdeshpande02
- sftim
- steveperry-53
- tengqm
@@ -130,12 +128,10 @@ aliases:
- fabriziopandini
- mattiaperi
- micheleberardi
- rlenferink
sig-docs-it-reviews: # PR reviews for Italian content
- fabriziopandini
- mattiaperi
- micheleberardi
- rlenferink
sig-docs-ja-owners: # Admins for Japanese content
- cstoku
- inductor
@@ -160,7 +156,6 @@ aliases:
- seokho-son
- ysyukr
sig-docs-maintainers: # Website maintainers
- bradamant3
- jimangel
- kbarnard10
- pwittrock
@@ -195,10 +190,12 @@ aliases:
- femrtnz
- jcjesus
- devlware
- jhonmike
sig-docs-pt-reviews: # PR reviews for Portugese content
- femrtnz
- jcjesus
- devlware
- jhonmike
sig-docs-vi-owners: # Admins for Vietnamese content
- huynguyennovem
- ngtuna
@@ -28,7 +28,7 @@ El método recomendado para levantar una copia local del sitio web kubernetes.io

> Para Windows, algunas otras herramientas como Make son necesarias. Puede instalarlas utilizando el gestor [Chocolatey](https://chocolatey.org). `choco install make` o siguiendo las instrucciones de [Make for Windows](http://gnuwin32.sourceforge.net/packages/make.htm).

> Si prefiere levantar el sitio web sin utilizar **Docker**, puede seguir las instrucciones disponibles en la sección [Levantando kubernetes.io en local con Hugo](#levantando-kubernetes.io-en-local-con-hugo).
> Si prefiere levantar el sitio web sin utilizar **Docker**, puede seguir las instrucciones disponibles en la sección [Levantando kubernetes.io en local con Hugo](#levantando-kubernetesio-en-local-con-hugo).

Una vez tenga Docker [configurado en su máquina](https://www.docker.com/get-started), puede construir la imagen de Docker `kubernetes-hugo` localmente ejecutando el siguiente comando en la raíz del repositorio:
@@ -33,7 +33,7 @@ La façon recommandée d'exécuter le site web Kubernetes localement est d'utili

> Si vous êtes sous Windows, vous aurez besoin de quelques outils supplémentaires que vous pouvez installer avec [Chocolatey](https://chocolatey.org). `choco install install make`

> Si vous préférez exécuter le site Web localement sans Docker, voir [Exécuter le site localement avec Hugo](#running-the-site-locally-using-hugo) ci-dessous.
> Si vous préférez exécuter le site Web localement sans Docker, voir [Exécuter le site localement avec Hugo](#exécuter-le-site-localement-en-utilisant-hugo) ci-dessous.

Si vous avez Docker [up and running](https://www.docker.com/get-started), construisez l'image Docker `kubernetes-hugo' localement:
@@ -9,7 +9,7 @@ Selamat datang! Repositori ini merupakan wadah bagi semua komponen yang dibutuhk

Pertama, kamu dapat menekan tombol **Fork** yang berada pada bagian atas layar, untuk menyalin repositori pada akun Github-mu. Salinan ini disebut sebagai **fork**. Kamu dapat menambahkan konten pada **fork** yang kamu miliki, setelah kamu merasa cukup untuk menambahkan konten yang kamu miliki dan ingin memberikan konten tersebut pada kami, kamu dapat melihat **fork** yang telah kamu buat dan membuat **pull request** untuk memberi tahu kami bahwa kamu ingin menambahkan konten yang telah kamu buat.

Setelah kamu membuat sebuah **pull request**, seorang **reviewer** akan memberikan masukan terhadap konten yang kamu sediakan serta beberapa hal yang dapat kamu lakukan apabila perbaikan diperlukan terhadap konten yang telah kamu sediakan. Sebagai seorang yang membuat **pull request**, **sudah menjadi kewajiban kamu untuk melakukan modifikasi terhadap konten yang kamu berikan sesuai dengan masukan yang diberikan oleh seorang reviewer Kubernetes**. Perlu kamu ketahui bahwa kamu dapat saja memiliki lebih dari satu orang **reviewer Kubernetes** atau dalam kasus kamu bisa saja mendapatkan **reviewer Kubernetes** yang berbeda dengan **reviewer Kubernetes** awal yang ditugaskan untuk memberikan masukan terhadap konten yang kamu sediakan. Selain itu, seorang **reviewer Kubernetes** bisa saja meminta masukan teknis dari [reviewer teknis Kubernetes](https://github.com/kubernetes/website/wiki/Tech-reviewers) jika diperlukan.
Setelah kamu membuat sebuah **pull request**, seorang **reviewer** akan memberikan masukan terhadap konten yang kamu sediakan serta beberapa hal yang dapat kamu lakukan apabila perbaikan diperlukan terhadap konten yang telah kamu sediakan. Sebagai seorang yang membuat **pull request**, **sudah menjadi kewajiban kamu untuk melakukan modifikasi terhadap konten yang kamu berikan sesuai dengan masukan yang diberikan oleh seorang reviewer Kubernetes**. Perlu kamu ketahui bahwa kamu dapat saja memiliki lebih dari satu orang **reviewer Kubernetes** atau dalam kasus kamu bisa saja mendapatkan **reviewer Kubernetes** yang berbeda dengan **reviewer Kubernetes** awal yang ditugaskan untuk memberikan masukan terhadap konten yang kamu sediakan. Selain itu, seorang **reviewer Kubernetes** bisa saja meminta masukan teknis dari [reviewer teknis Kubernetes](https://github.com/kubernetes/website/wiki/Tech-reviewers) jika diperlukan.

Untuk informasi lebih lanjut mengenai tata cara melakukan kontribusi, kamu dapat melihat tautan di bawah ini:

@@ -21,11 +21,11 @@ Untuk informasi lebih lanjut mengenai tata cara melakukan kontribusi, kamu dapat

## Menjalankan Dokumentasi Kubernetes pada Mesin Lokal Kamu

Petunjuk yang disarankan untuk menjalankan Dokumentasi Kubernetes pada mesin lokal kamus adalah dengan menggunakan [Docker](https://docker.com) **image** yang memiliki **package** [Hugo](https://gohugo.io), **Hugo** sendiri merupakan generator website statis.
Petunjuk yang disarankan untuk menjalankan Dokumentasi Kubernetes pada mesin lokal kamus adalah dengan menggunakan [Docker](https://docker.com) **image** yang memiliki **package** [Hugo](https://gohugo.io), **Hugo** sendiri merupakan generator website statis.

> Jika kamu menggunakan Windows, kamu mungkin membutuhkan beberapa langkah tambahan untuk melakukan instalasi perangkat lunak yang dibutuhkan. Instalasi ini dapat dilakukan dengan menggunakan [Chocolatey](https://chocolatey.org). `choco install make`

> Jika kamu ingin menjalankan **website** tanpa menggunakan **Docker**, kamu dapat melihat tautan berikut [Petunjuk untuk menjalankan website pada mesin lokal dengan menggunakan Hugo](#petunjuk-untuk-menjalankan-website-pada-mesin-lokal-denga-menggunakan-hugo) di bagian bawah.
> Jika kamu ingin menjalankan **website** tanpa menggunakan **Docker**, kamu dapat melihat tautan berikut [Petunjuk untuk menjalankan website pada mesin lokal dengan menggunakan Hugo](#petunjuk-untuk-menjalankan-website-pada-mesin-lokal-dengan-menggunakan-hugo) di bagian bawah.

Jika kamu sudah memiliki **Docker** [yang sudah dapat digunakan](https://www.docker.com/get-started), kamu dapat melakukan **build** `kubernetes-hugo` **Docker image** secara lokal:

@@ -44,7 +44,7 @@ Buka **browser** kamu ke http://localhost:1313 untuk melihat laman dokumentasi.

## Petunjuk untuk menjalankan website pada mesin lokal dengan menggunakan Hugo

Kamu dapat melihat [dokumentasi resmi Hugo](https://gohugo.io/getting-started/installing/) untuk mengetahui langkah yang diperlukan untuk melakukan instalasi **Hugo**. Pastikan kamu melakukan instalasi versi **Hugo** sesuai dengan versi yang tersedia pada **environment variable** `HUGO_VERSION` pada **file**[`netlify.toml`](netlify.toml#L9).
Kamu dapat melihat [dokumentasi resmi Hugo](https://gohugo.io/getting-started/installing/) untuk mengetahui langkah yang diperlukan untuk melakukan instalasi **Hugo**. Pastikan kamu melakukan instalasi versi **Hugo** sesuai dengan versi yang tersedia pada **environment variable** `HUGO_VERSION` pada **file**[`netlify.toml`](netlify.toml#L9).

Untuk menjalankan laman pada mesin lokal setelah instalasi **Hugo**, kamu dapat menjalankan perintah berikut:
@@ -21,11 +21,11 @@ Per maggiori informazioni su come contribuire alla documentazione Kubernetes, ve

## Eseguire il sito Web localmente usando Docker

Il modo consigliato per eseguire localmente il sito Web Kubernetes prevede l'utilizzo di un'immagine [Docker] (https://docker.com) inclusa nel sito e configurata con tutti i software necessari, a partire dal generatore di siti web statici [Hugo] (https://gohugo.io).
Il modo consigliato per eseguire localmente il sito Web Kubernetes prevede l'utilizzo di un'immagine [Docker](https://docker.com) inclusa nel sito e configurata con tutti i software necessari, a partire dal generatore di siti web statici [Hugo](https://gohugo.io).

> Se stai utilizzando Windows, avrai bisogno di alcuni strumenti aggiuntivi che puoi installare con [Chocolatey] (https://chocolatey.org). `choco install make`
> Se stai utilizzando Windows, avrai bisogno di alcuni strumenti aggiuntivi che puoi installare con [Chocolatey](https://chocolatey.org). `choco install make`

> Se preferisci eseguire il sito Web localmente senza Docker, vedi [Eseguire il sito Web localmente utilizzando Hugo](# running-the-site-local-using-hugo) di seguito.
> Se preferisci eseguire il sito Web localmente senza Docker, vedi [Eseguire il sito Web localmente utilizzando Hugo](#eseguire-il-sito-web-localmente-utilizzando-hugo) di seguito.

Se hai Docker [attivo e funzionante](https://www.docker.com/get-started), crea l'immagine Docker `kubernetes-hugo` localmente:
@@ -41,7 +41,7 @@ Zalecaną metodą uruchomienia serwisu internetowego Kubernetesa lokalnie jest u
choco install make
```

> Jeśli wolisz uruchomić serwis lokalnie bez Dockera, przeczytaj [jak uruchomić serwis lokalnie przy pomocy Hugo](#jak-uruchomić-serwis-lokalnie-przy-pomocy-hugo) poniżej.
> Jeśli wolisz uruchomić serwis lokalnie bez Dockera, przeczytaj [jak uruchomić serwis lokalnie przy pomocy Hugo](#jak-uruchomić-lokalną-kopię-strony-przy-pomocy-hugo) poniżej.

Jeśli [zainstalowałeś i uruchomiłeś](https://www.docker.com/get-started) już Dockera, zbuduj obraz `kubernetes-hugo` lokalnie:
@@ -34,7 +34,7 @@

> Если вы используете Windows, вам необходимо установить дополнительные инструменты через [Chocolatey](https://chocolatey.org). `choco install make`

> Если вы хотите запустить сайт локально без Docker, обратитесь к разделу [Запуск сайта с помощью Hugo](#running-the-site-locally-using-hugo) ниже на этой странице.
> Если вы хотите запустить сайт локально без Docker, обратитесь к разделу [Запуск сайта с помощью Hugo](#запуск-сайта-с-помощью-hugo) ниже на этой странице.

Когда Docker [установлен и запущен](https://www.docker.com/get-started), соберите локально Docker-образ `kubernetes-hugo`, выполнив команду в консоли:
@@ -26,7 +26,7 @@ Cách được đề xuất để chạy trang web Kubernetes cục bộ là dù

> Nếu bạn làm việc trên môi trường Windows, bạn sẽ cần thêm môt vài công cụ mà bạn có thể cài đặt với [Chocolatey](https://chocolatey.org). `choco install make`

> Nếu bạn không muốn dùng Docker để chạy trang web cục bộ, hãy xem [Chạy website cục bộ dùng Hugo](#Chạy website cục bộ dùng Hugo) dưới đây.
> Nếu bạn không muốn dùng Docker để chạy trang web cục bộ, hãy xem [Chạy website cục bộ dùng Hugo](#chạy-website-cục-bộ-dùng-hugo) dưới đây.

Nếu bạn có Docker đang [up và running](https://www.docker.com/get-started), build `kubernetes-hugo` Docker image cục bộ:
@@ -122,7 +122,7 @@ Open up your browser to http://localhost:1313 to view the website. As you make c
<!--
## Running the website locally using Hugo
-->
## 使用 Hugo 在本地运行网站
## 使用 Hugo 在本地运行网站 {#running-the-site-locally-using-hugo}

<!--
See the [official Hugo documentation](https://gohugo.io/getting-started/installing/) for Hugo installation instructions.
@@ -37,7 +37,7 @@ The recommended way to run the Kubernetes website locally is to run a specialize

> If you are running on Windows, you'll need a few more tools which you can install with [Chocolatey](https://chocolatey.org). `choco install make`

> If you'd prefer to run the website locally without Docker, see [Running the website locally using Hugo](#running-the-site-locally-using-hugo) below.
> If you'd prefer to run the website locally without Docker, see [Running the website locally using Hugo](#running-the-website-locally-using-hugo) below.

If you have Docker [up and running](https://www.docker.com/get-started), build the `kubernetes-hugo` Docker image locally:
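The build command that follows this sentence in the README falls outside the hunk. As a hedged pointer only, the repository's own Makefile (see the Makefile hunks earlier in this diff, which include a `docker-serve` target) is the usual wrapper for the container-based workflow:

```shell
# Build and serve kubernetes.io in a container; target name taken from the Makefile shown earlier in this diff
make docker-serve
```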
@@ -10,6 +10,6 @@
# DO NOT REPORT SECURITY VULNERABILITIES DIRECTLY TO THESE NAMES, FOLLOW THE
# INSTRUCTIONS AT https://kubernetes.io/security/

bradamant3
jimangel
kbarnard10
zacharysarah
@@ -109,6 +109,7 @@ header
box-shadow: 0 0 0 transparent
transition: 0.3s
text-align: center
overflow: hidden

.logo

@@ -244,8 +245,7 @@ header
background-color: white

#mainNav
display: none

h5
color: $blue
font-weight: normal

@@ -578,6 +578,9 @@ section
li
display: inline-block
height: 100%
margin-right: 10px
&:last-child
margin-right: 0

a
display: block

@@ -598,11 +601,11 @@ section
#vendorStrip
line-height: 44px
max-width: 100%
overflow-x: auto
-webkit-overflow-scrolling: touch

ul
float: none
overflow-x: auto

#searchBox
float: none

@@ -1052,6 +1055,9 @@ dd
a.issue
margin-left: 0px

.gridPageHome .flyout-button
display: none

.feedback--no
margin-left: 1em
@@ -107,7 +107,7 @@ $video-section-height: 550px
padding-right: 10px

#home
section, header, footer
section, header
.main-section
max-width: 1000px

@@ -178,16 +178,18 @@ $video-section-height: 550px
nav
overflow: hidden
margin-bottom: 20px
display: flex
justify-content: space-between

a
width: 16.65%
width: auto
float: left
font-size: 24px
font-weight: 300
white-space: nowrap

.social
padding: 0 30px
padding: 0
max-width: 1200px

div
@@ -133,18 +133,21 @@ $feature-box-div-width: 45%
max-width: 25%
max-height: 100%
transform: translateY(-50%)
width: 100%

&:nth-child(odd)
padding-right: 210px

.image-wrapper
right: 0
text-align: right

&:nth-child(even)
padding-left: 210px

.image-wrapper
left: 0
text-align: left

&:nth-child(1)
padding-right: 0

@@ -219,9 +222,8 @@ $feature-box-div-width: 45%
footer
nav
text-align: center

a
width: 30%
width: auto
padding: 0 20px

.social
config.toml
@@ -66,10 +66,10 @@ time_format_blog = "Monday, January 02, 2006"
description = "Production-Grade Container Orchestration"
showedit = true

latest = "v1.17"
latest = "v1.18"

fullversion = "v1.17.0"
version = "v1.17"
fullversion = "v1.18.0"
version = "v1.18"
githubbranch = "master"
docsbranch = "master"
deprecated = false

@@ -83,12 +83,6 @@ announcement = false
# announcement_message is only displayed when announcement = true; update with your specific message
announcement_message = "The Kubernetes Documentation team would like your feedback! Please take a <a href='https://www.surveymonkey.com/r/8R237FN' target='_blank'>short survey</a> so we can improve the Kubernetes online documentation."

[[params.versions]]
fullversion = "v1.17.0"
version = "v1.17"
githubbranch = "v1.17.0"
docsbranch = "release-1.17"
url = "https://kubernetes.io"

[params.pushAssets]
css = [

@@ -102,33 +96,40 @@ js = [
]

[[params.versions]]
fullversion = "v1.16.3"
fullversion = "v1.18.0"
version = "v1.18"
githubbranch = "v1.18.0"
docsbranch = "release-1.18"
url = "https://kubernetes.io"

[[params.versions]]
fullversion = "v1.17.4"
version = "v1.17"
githubbranch = "v1.17.4"
docsbranch = "release-1.17"
url = "https://v1-17.docs.kubernetes.io"

[[params.versions]]
fullversion = "v1.16.8"
version = "v1.16"
githubbranch = "v1.16.3"
githubbranch = "v1.16.8"
docsbranch = "release-1.16"
url = "https://v1-16.docs.kubernetes.io"

[[params.versions]]
fullversion = "v1.15.6"
fullversion = "v1.15.11"
version = "v1.15"
githubbranch = "v1.15.6"
githubbranch = "v1.15.11"
docsbranch = "release-1.15"
url = "https://v1-15.docs.kubernetes.io"

[[params.versions]]
fullversion = "v1.14.9"
fullversion = "v1.14.10"
version = "v1.14"
githubbranch = "v1.14.9"
githubbranch = "v1.14.10"
docsbranch = "release-1.14"
url = "https://v1-14.docs.kubernetes.io"

[[params.versions]]
fullversion = "v1.13.12"
version = "v1.13"
githubbranch = "v1.13.12"
docsbranch = "release-1.13"
url = "https://v1-13.docs.kubernetes.io"

# Language definitions.

[languages]

@@ -174,7 +175,7 @@ language_alternatives = ["en"]

[languages.fr]
title = "Kubernetes"
description = "Production-Grade Container Orchestration"
description = "Solution professionnelle d’orchestration de conteneurs"
languageName ="Français"
weight = 5
contentDir = "content/fr"

@@ -222,7 +223,7 @@ language_alternatives = ["en"]

[languages.es]
title = "Kubernetes"
description = "Production-Grade Container Orchestration"
description = "Orquestación de contenedores para producción"
languageName ="Español"
weight = 9
contentDir = "content/es"
@@ -1,6 +1,6 @@
<!-- Do not edit this file directly. Get the latest from
https://github.com/cncf/foundation/blob/master/code-of-conduct.md -->
## CNCF Community Code of Conduct v1.0
## CNCF Gemeinschafts-Verhaltenskodex v1.0

### Verhaltenskodex für Mitwirkende
@@ -52,7 +52,7 @@ Wenn Sie beispielsweise mit der Kubernetes-API ein Deployment-Objekt erstellen,

### Kubernetes Master

Der Kubernetes-Master ist für Erhalt des gewünschten Status Ihres Clusters verantwortlich. Wenn Sie mit Kubernetes interagieren, beispielsweise mit dem Kommanduzeilen-Tool `kubectl`, kommunizieren Sie mit dem Kubernetes-Master Ihres Clusters.
Der Kubernetes-Master ist für Erhalt des gewünschten Status Ihres Clusters verantwortlich. Wenn Sie mit Kubernetes interagieren, beispielsweise mit dem Kommandozeilen-Tool `kubectl`, kommunizieren Sie mit dem Kubernetes-Master Ihres Clusters.

> Der Begriff "Master" bezeichnet dabei eine Reihe von Prozessen, die den Clusterstatus verwalten. Normalerweise werden diese Prozesse alle auf einem einzigen Node im Cluster ausgeführt. Dieser Node wird auch als Master bezeichnet. Der Master kann repliziert werden, um die Verfügbarkeit und Redundanz zu erhöhen.
@@ -0,0 +1,56 @@
---
title: Addons Installieren
content_template: templates/concept
---

{{% capture overview %}}

Add-Ons erweitern die Funktionalität von Kubernetes.

Diese Seite gibt eine Übersicht über einige verfügbare Add-Ons und verweist auf die entsprechenden Installationsanleitungen.

Die Add-Ons in den einzelnen Kategorien sind alphabetisch sortiert - Die Reihenfolge impliziert keine bevorzugung einzelner Projekte.

{{% /capture %}}

{{% capture body %}}

## Networking und Network Policy

* [ACI](https://www.github.com/noironetworks/aci-containers) bietet Container-Networking und Network-Security mit Cisco ACI.
* [Calico](https://docs.projectcalico.org/latest/introduction/) ist ein Networking- und Network-Policy-Provider. Calico unterstützt eine Reihe von Networking-Optionen, damit Du die richtige für deinen Use-Case wählen kannst. Dies beinhaltet Non-Overlaying and Overlaying-Networks mit oder ohne BGP. Calico nutzt die gleiche Engine um Network-Policies für Hosts, Pods und (falls Du Istio & Envoy benutzt) Anwendungen auf Service-Mesh-Ebene durchzusetzen.
* [Canal](https://github.com/tigera/canal/tree/master/k8s-install) vereint Flannel und Calico um Networking- und Network-Policies bereitzustellen.
* [Cilium](https://github.com/cilium/cilium) ist ein L3 Network- and Network-Policy-Plugin welches das transparent HTTP/API/L7-Policies durchsetzen kann. Sowohl Routing- als auch Overlay/Encapsulation-Modes werden uterstützt. Außerdem kann Cilium auf andere CNI-Plugins aufsetzen.
* [CNI-Genie](https://github.com/Huawei-PaaS/CNI-Genie) ermöglicht das nahtlose Verbinden von Kubernetes mit einer Reihe an CNI-Plugins wie z.B. Calico, Canal, Flannel, Romana, oder Weave.
* [Contiv](http://contiv.github.io) bietet konfigurierbares Networking (Native L3 auf BGP, Overlay mit vxlan, Klassisches L2, Cisco-SDN/ACI) für verschiedene Anwendungszwecke und auch umfangreiches Policy-Framework. Das Contiv-Projekt ist vollständig [Open Source](http://github.com/contiv). Der [installer](http://github.com/contiv/install) bietet sowohl kubeadm als auch nicht-kubeadm basierte Installationen.
* [Contrail](http://www.juniper.net/us/en/products-services/sdn/contrail/contrail-networking/), basierend auf [Tungsten Fabric](https://tungsten.io), ist eine Open Source, multi-Cloud Netzwerkvirtualisierungs- und Policy-Management Plattform. Contrail und Tungsten Fabric sind mit Orechstratoren wie z.B. Kubernetes, OpenShift, OpenStack und Mesos integriert und bieten Isolationsmodi für Virtuelle Maschinen, Container (bzw. Pods) und Bare Metal workloads.
* [Flannel](https://github.com/coreos/flannel/blob/master/Documentation/kubernetes.md) ist ein Overlay-Network-Provider der mit Kubernetes genutzt werden kann.
* [Knitter](https://github.com/ZTE/Knitter/) ist eine Network-Lösung die Mehrfach-Network in Kubernetes ermöglicht.
* [Multus](https://github.com/Intel-Corp/multus-cni) ist ein Multi-Plugin für Mehrfachnetzwerk-Unterstützung um alle CNI-Plugins (z.B. Calico, Cilium, Contiv, Flannel), zusätzlich zu SRIOV-, DPDK-, OVS-DPDK- und VPP-Basierten Workloads in Kubernetes zu unterstützen.
* [NSX-T](https://docs.vmware.com/en/VMware-NSX-T/2.0/nsxt_20_ncp_kubernetes.pdf) Container Plug-in (NCP) bietet eine Integration zwischen VMware NSX-T und einem Orchestator wie z.B. Kubernetes. Außerdem bietet es eine Integration zwischen NSX-T und Containerbasierten CaaS/PaaS-Plattformen wie z.B. Pivotal Container Service (PKS) und OpenShift.
* [Nuage](https://github.com/nuagenetworks/nuage-kubernetes/blob/v5.1.1-1/docs/kubernetes-1-installation.rst) ist eine SDN-Plattform die Policy-Basiertes Networking zwischen Kubernetes Pods und nicht-Kubernetes Umgebungen inklusive Sichtbarkeit und Security-Monitoring bereitstellt.
* [Romana](http://romana.io) ist eine Layer 3 Network-Lösung für Pod-Netzwerke welche auch die [NetworkPolicy API](/docs/concepts/services-networking/network-policies/) unterstützt. Details zur Installation als kubeadm Add-On sind [hier](https://github.com/romana/romana/tree/master/containerize) verfügbar.
* [Weave Net](https://www.weave.works/docs/net/latest/kube-addon/) bietet Networking and Network-Policies und arbeitet auf beiden Seiten der Network-Partition ohne auf eine externe Datenbank angwiesen zu sein.

## Service-Discovery

* [CoreDNS](https://coredns.io) ist ein flexibler, erweiterbarer DNS-Server der in einem Cluster [installiert](https://github.com/coredns/deployment/tree/master/kubernetes) werden kann und das Cluster-interne DNS für Pods bereitzustellen.

## Visualisierung & Überwachung

* [Dashboard](https://github.com/kubernetes/dashboard#kubernetes-dashboard) ist ein Dashboard Web Interface für Kubernetes.
* [Weave Scope](https://www.weave.works/documentation/scope-latest-installing/#k8s) ist ein Tool um Container, Pods, Services usw. Grafisch zu visualieren. Kann in Verbindung mit einem [Weave Cloud Account](https://cloud.weave.works/) genutzt oder selbst gehosted werden.

## Infrastruktur

* [KubeVirt](https://kubevirt.io/user-guide/docs/latest/administration/intro.html#cluster-side-add-on-deployment) ist ein Add-On um Virtuelle Maschinen in Kubernetes auszuführen. Wird typischer auf Bare-Metal Clustern eingesetzt.

## Legacy Add-Ons

Es gibt einige weitere Add-Ons die in dem abgekündigten [cluster/addons](https://git.k8s.io/kubernetes/cluster/addons)-Verzeichnis dokumentiert sind.

Add-Ons die ordentlich gewartet werden dürfen gerne hier aufgezählt werden. Wir freuen uns auf PRs!

{{% /capture %}}
@@ -96,7 +96,7 @@ Das Google service Konto der Instanz hat einen `https://www.googleapis.com/auth/

Kubernetes eine native Unterstützung für die [Amazon Elastic Container Registry](https://aws.amazon.com/ecr/) wenn Knoten AWS EC2 Instanzen sind.

Es muss einfah nur der komplette Image Name (z.B. `ACCOUNT.dkr.ecr.REGION.amazonaws.com/imagename:tag`) in der Pod - Definition genutzt werden.
Es muss einfach nur der komplette Image Name (z.B. `ACCOUNT.dkr.ecr.REGION.amazonaws.com/imagename:tag`) in der Pod - Definition genutzt werden.

Alle Benutzer eines Clusters die Pods erstellen dürfen können dann jedes der Images in der ECR Registry zum Ausführen von Pods nutzen.
@@ -3,7 +3,7 @@ title: Kubernetes Dokumentation
noedit: true
cid: docsHome
layout: docsportal_home
class: gridPage
class: gridPage gridPageHome
linkTitle: "Home"
main_menu: true
weight: 10
@@ -15,5 +15,5 @@ tags:

<!--more-->

Halten Sie immer einen Sicherungsplan für etcds Daten für Ihren Kubernetes-Cluster bereit. Ausführliche Informationen zu etcd finden Sie in der [etcd Dokumentation](https://github.com/coreos/etcd/blob/master/Documentation/docs.md).
Halten Sie immer einen Sicherungsplan für etcds Daten für Ihren Kubernetes-Cluster bereit. Ausführliche Informationen zu etcd finden Sie in der [etcd Dokumentation](https://etcd.io/docs).
@@ -27,7 +27,7 @@ source <(kubectl completion bash) # Wenn Sie autocomplete in bash in der aktuell
echo "source <(kubectl completion bash)" >> ~/.bashrc # Fügen Sie der Bash-Shell dauerhaft Autocomplete hinzu.
```

Sie können auch ein Abkürzungsalias für `kubectl` verwenden, weleches auch mit Vervollständigung funktioniert:
Sie können auch ein Abkürzungsalias für `kubectl` verwenden, welches auch mit Vervollständigung funktioniert:

```bash
alias k=kubectl
@@ -180,7 +180,7 @@ kubectl get events --sort-by=.metadata.creationTimestamp

## Ressourcen aktualisieren

Ab Version 1.11 ist das `rolling-update` veraltet (Lesen Sie [CHANGELOG-1.11.md](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.11.md) für weitere Informationen), verwenden Sie stattdessen `rollout`.
Ab Version 1.11 ist das `rolling-update` veraltet (Lesen Sie [CHANGELOG-1.11.md](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.11.md) für weitere Informationen), verwenden Sie stattdessen `rollout`.

```bash
kubectl set image deployment/frontend www=image:v2 # Fortlaufende Aktualisierung der "www" Container der "Frontend"-Bereitstellung, Aktualisierung des Images
@@ -205,7 +205,7 @@ Weitere Informationen zu unterstützten Treibern und zur Installation von Plugin

### Lokale Images durch erneute Verwendung des Docker-Daemon ausführen

Wenn Sie eine einzige Kubernetes VM verwenden, ist es sehr praktisch, den integrierten Docker-Daemon von Minikube wiederzuverwenden; Dies bedeutet, dass Sie auf Ihrem lokalen Computer keine Docker-Registy erstellen und das Image in die Registry importortieren müssen - Sie können einfach innerhalb desselben Docker-Daemons wie Minikube arbeiten, was lokale Experimente beschleunigt. Stellen Sie einfach sicher, dass Sie Ihr Docker-Image mit einem anderen Element als 'latest' versehen, und verwenden Sie dieses Tag, wenn Sie das Image laden. Andernfalls, wenn Sie keine Version Ihres Images angeben, wird es als `:latest` angenommen, mit der Pull-Image-Richtlinie von `Always` entsprechend, was schließlich zu `ErrImagePull` führen kann, da Sie möglicherweise noch keine Versionen Ihres Docker-Images in der Standard-Docker-Registry (normalerweise DockerHub) haben.
Wenn Sie eine einzige Kubernetes VM verwenden, ist es sehr praktisch, den integrierten Docker-Daemon von Minikube wiederzuverwenden; Dies bedeutet, dass Sie auf Ihrem lokalen Computer keine Docker-Registy erstellen und das Image in die Registry importieren müssen - Sie können einfach innerhalb desselben Docker-Daemons wie Minikube arbeiten, was lokale Experimente beschleunigt. Stellen Sie einfach sicher, dass Sie Ihr Docker-Image mit einem anderen Element als 'latest' versehen, und verwenden Sie dieses Tag, wenn Sie das Image laden. Andernfalls, wenn Sie keine Version Ihres Images angeben, wird es als `:latest` angenommen, mit der Pull-Image-Richtlinie von `Always` entsprechend, was schließlich zu `ErrImagePull` führen kann, da Sie möglicherweise noch keine Versionen Ihres Docker-Images in der Standard-Docker-Registry (normalerweise DockerHub) haben.

Um mit dem Docker-Daemon auf Ihrem Mac/Linux-Computer arbeiten zu können, verwenden Sie den `docker-env`-Befehl in Ihrer Shell:
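The shell snippet that follows this sentence in the source page sits outside the hunk. As a minimal, hedged sketch of the `docker-env` invocation it refers to:

```shell
# Point the local docker CLI at Minikube's built-in Docker daemon
eval $(minikube docker-env)
```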
@@ -49,7 +49,7 @@ Minikube unterstützt auch die Option `--vm-driver=none`, mit der die Kubernetes
Die einfachste Möglichkeit, Minikube unter macOS zu installieren, ist die Verwendung von [Homebrew](https://brew.sh):

```shell
brew cask install minikube
brew install minikube
```

Sie können es auch auf macOS installieren, indem Sie eine statische Binärdatei herunterladen:
@@ -145,7 +145,7 @@ Um den "Hallo-Welt"-Container außerhalb des virtuellen Netzwerks von Kubernetes
```

Bei Cloud-Anbietern, die Load-Balancer unterstützen, wird eine externe IP-Adresse für den Zugriff auf den Dienst bereitgestellt.
Bei Minikube ermöglicht der Typ `LoadBalancer` den Dienst über den Befehl `minikube service` verfuügbar zu machen.
Bei Minikube ermöglicht der Typ `LoadBalancer` den Dienst über den Befehl `minikube service` verfügbar zu machen.

3. Führen Sie den folgenden Befehl aus:
@@ -45,12 +45,12 @@ Kubernetes is open source giving you the freedom to take advantage of on-premise
<br>
<br>
<br>
<a href="https://events.linuxfoundation.org/events/kubecon-cloudnativecon-europe-2020/" button id="desktopKCButton">Attend KubeCon in Amsterdam on Mar. 30-Apr. 2, 2020</a>
<a href="https://events.linuxfoundation.org/kubecon-cloudnativecon-europe/?utm_source=kubernetes.io&utm_medium=nav&utm_campaign=kccnceu20" button id="desktopKCButton">Attend KubeCon in Amsterdam on August 13-16, 2020</a>
<br>
<br>
<br>
<br>
<a href="https://events.linuxfoundation.cn/kubecon-cloudnativecon-open-source-summit-china/" button id="desktopKCButton">Attend KubeCon in Shanghai on July 28-30, 2020</a>
<a href="https://events.linuxfoundation.org/kubecon-cloudnativecon-north-america/?utm_source=kubernetes.io&utm_medium=nav&utm_campaign=kccncna20" button id="desktopKCButton">Attend KubeCon in Boston on November 17-20, 2020</a>
</div>
<div id="videoPlayer">
<iframe data-url="https://www.youtube.com/embed/H06qrNmGqyE?autoplay=1" frameborder="0" allowfullscreen></iframe>
@@ -97,3 +97,11 @@ semantics to fields! It's also going to improve support for CRDs and unions!
- Some kubectl apply features are missing from diff and could be useful, like the ability
  to filter by label, or to display pruned resources.
- Eventually, kubectl diff will use server-side apply!

{{< note >}}

The flag `kubectl apply --server-dry-run` is deprecated in v1.18.
Use the flag `--dry-run=server` for using server-side dry-run in
`kubectl apply` and other subcommands.

{{< /note >}}
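For context on the note above, the replacement flag is used like this (a hedged sketch; `deployment.yaml` is a placeholder manifest name):

```shell
# Server-side dry-run with the non-deprecated flag (kubectl v1.18+)
kubectl apply --dry-run=server -f deployment.yaml

# kubectl diff performs a server-side dry-run of the same manifest
kubectl diff -f deployment.yaml
```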
@@ -52,7 +52,7 @@ In this way, admission controllers and policy management help make sure that app

To illustrate how admission controller webhooks can be leveraged to establish custom security policies, let’s consider an example that addresses one of the shortcomings of Kubernetes: a lot of its defaults are optimized for ease of use and reducing friction, sometimes at the expense of security. One of these settings is that containers are by default allowed to run as root (and, without further configuration and no `USER` directive in the Dockerfile, will also do so). Even though containers are isolated from the underlying host to a certain extent, running containers as root does increase the risk profile of your deployment— and should be avoided as one of many [security best practices](https://www.stackrox.com/post/2018/12/6-container-security-best-practices-you-should-be-following/). The [recently exposed runC vulnerability](https://www.stackrox.com/post/2019/02/the-runc-vulnerability-a-deep-dive-on-protecting-yourself/) ([CVE-2019-5736](https://nvd.nist.gov/vuln/detail/CVE-2019-5736)), for example, could be exploited only if the container ran as root.

You can use a custom mutating admission controller webhook to apply more secure defaults: unless explicitly requested, our webhook will ensure that pods run as a non-root user (we assign the user ID 1234 if no explicit assignment has been made). Note that this setup does not prevent you from deploying any workloads in your cluster, including those that legitimately require running as root. It only requires you to explicitly enable this risker mode of operation in the deployment configuration, while defaulting to non-root mode for all other workloads.
You can use a custom mutating admission controller webhook to apply more secure defaults: unless explicitly requested, our webhook will ensure that pods run as a non-root user (we assign the user ID 1234 if no explicit assignment has been made). Note that this setup does not prevent you from deploying any workloads in your cluster, including those that legitimately require running as root. It only requires you to explicitly enable this riskier mode of operation in the deployment configuration, while defaulting to non-root mode for all other workloads.

The full code along with deployment instructions can be found in our accompanying [GitHub repository](https://github.com/stackrox/admission-controller-webhook-demo). Here, we will highlight a few of the more subtle aspects about how webhooks work.
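To make the webhook's defaulting behaviour concrete, this is roughly the pod-level security context such a mutating webhook injects when a workload does not opt in to running as root (a hedged sketch based only on the description above; the resource names are placeholders, and the user ID 1234 mirrors the post):

```yaml
# Effective pod spec after mutation for a workload that did not request root
apiVersion: v1
kind: Pod
metadata:
  name: example-app             # placeholder name
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 1234             # default user ID assigned when none is requested
  containers:
  - name: app
    image: example-app:latest   # placeholder image
```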
@@ -80,7 +80,7 @@ webhooks:
resources: ["pods"]
```

This configuration defines a `webhook webhook-server.webhook-demo.svc`, and instructs the Kubernetes API server to consult the service `webhook-server` in n`amespace webhook-demo` whenever a pod is created by making a HTTP POST request to the `/mutate` URL. For this configuration to work, several prerequisites have to be met.
This configuration defines a `webhook webhook-server.webhook-demo.svc`, and instructs the Kubernetes API server to consult the service `webhook-server` in `namespace webhook-demo` whenever a pod is created by making a HTTP POST request to the `/mutate` URL. For this configuration to work, several prerequisites have to be met.

## Webhook REST API
@@ -12,21 +12,45 @@ When APIs evolve, the old API is deprecated and eventually removed.

The **v1.16** release will stop serving the following deprecated API versions in favor of newer and more stable API versions:

* NetworkPolicy (in the **extensions/v1beta1** API group)
  * Migrate to use the **networking.k8s.io/v1** API, available since v1.8.
    Existing persisted data can be retrieved/updated via the **networking.k8s.io/v1** API.
* PodSecurityPolicy (in the **extensions/v1beta1** API group)
* NetworkPolicy in the **extensions/v1beta1** API version is no longer served
  * Migrate to use the **networking.k8s.io/v1** API version, available since v1.8.
    Existing persisted data can be retrieved/updated via the new version.
* PodSecurityPolicy in the **extensions/v1beta1** API version
  * Migrate to use the **policy/v1beta1** API, available since v1.10.
    Existing persisted data can be retrieved/updated via the **policy/v1beta1** API.
* DaemonSet, Deployment, StatefulSet, and ReplicaSet (in the **extensions/v1beta1** and **apps/v1beta2** API groups)
  * Migrate to use the **apps/v1** API, available since v1.9.
    Existing persisted data can be retrieved/updated via the **apps/v1** API.
    Existing persisted data can be retrieved/updated via the new version.
* DaemonSet in the **extensions/v1beta1** and **apps/v1beta2** API versions is no longer served
  * Migrate to use the **apps/v1** API version, available since v1.9.
    Existing persisted data can be retrieved/updated via the new version.
  * Notable changes:
    * `spec.templateGeneration` is removed
    * `spec.selector` is now required and immutable after creation; use the existing template labels as the selector for seamless upgrades
    * `spec.updateStrategy.type` now defaults to `RollingUpdate` (the default in `extensions/v1beta1` was `OnDelete`)
* Deployment in the **extensions/v1beta1**, **apps/v1beta1**, and **apps/v1beta2** API versions is no longer served
  * Migrate to use the **apps/v1** API version, available since v1.9.
    Existing persisted data can be retrieved/updated via the new version.
  * Notable changes:
    * `spec.rollbackTo` is removed
    * `spec.selector` is now required and immutable after creation; use the existing template labels as the selector for seamless upgrades
    * `spec.progressDeadlineSeconds` now defaults to `600` seconds (the default in `extensions/v1beta1` was no deadline)
    * `spec.revisionHistoryLimit` now defaults to `10` (the default in `apps/v1beta1` was `2`, the default in `extensions/v1beta1` was to retain all)
    * `maxSurge` and `maxUnavailable` now default to `25%` (the default in `extensions/v1beta1` was `1`)
* StatefulSet in the **apps/v1beta1** and **apps/v1beta2** API versions is no longer served
  * Migrate to use the **apps/v1** API version, available since v1.9.
    Existing persisted data can be retrieved/updated via the new version.
  * Notable changes:
    * `spec.selector` is now required and immutable after creation; use the existing template labels as the selector for seamless upgrades
    * `spec.updateStrategy.type` now defaults to `RollingUpdate` (the default in `apps/v1beta1` was `OnDelete`)
* ReplicaSet in the **extensions/v1beta1**, **apps/v1beta1**, and **apps/v1beta2** API versions is no longer served
  * Migrate to use the **apps/v1** API version, available since v1.9.
    Existing persisted data can be retrieved/updated via the new version.
  * Notable changes:
    * `spec.selector` is now required and immutable after creation; use the existing template labels as the selector for seamless upgrades

The **v1.20** release will stop serving the following deprecated API versions in favor of newer and more stable API versions:
The **v1.22** release will stop serving the following deprecated API versions in favor of newer and more stable API versions:

* Ingress (in the **extensions/v1beta1** API group)
  * Migrate to use the **networking.k8s.io/v1beta1** API, serving Ingress since v1.14.
    Existing persisted data can be retrieved/updated via the **networking.k8s.io/v1beta1** API.
* Ingress in the **extensions/v1beta1** API version will no longer be served
  * Migrate to use the **networking.k8s.io/v1beta1** API version, available since v1.14.
    Existing persisted data can be retrieved/updated via the new version.

# What To Do
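As a worked illustration of the migration notes above (a hedged sketch, not taken from the original post): a Deployment previously served from extensions/v1beta1 keeps its template, but needs the apps/v1 apiVersion and an explicit `spec.selector` that matches the template labels, since the selector is required and immutable in apps/v1:

```yaml
apiVersion: apps/v1             # was: extensions/v1beta1
kind: Deployment
metadata:
  name: nginx-deployment        # placeholder name
spec:
  replicas: 2
  selector:                     # required in apps/v1; must match the template labels
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.16-alpine
```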
@@ -60,8 +84,8 @@ apiserver startup arguments:

Deprecations are announced in the Kubernetes release notes. You can see these
announcements in
[1.14](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.14.md#deprecations)
and [1.15](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.15.md#deprecations-and-removals).
[1.14](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.14.md#deprecations)
and [1.15](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.15.md#deprecations-and-removals).

You can read more [in our deprecation policy document](https://kubernetes.io/docs/reference/using-api/deprecation-policy/#deprecating-parts-of-the-api)
about the deprecation policies for Kubernetes APIs, and other Kubernetes components.
@@ -186,7 +186,7 @@ metadata:
spec:
  containers:
  - name: nginx
    image: nginx:1.13-alpine
    image: nginx:1.16-alpine
    ports:
    - containerPort: 80
    volumeMounts:
@@ -0,0 +1,760 @@
---
layout: blog
title: "Deploying External OpenStack Cloud Provider with Kubeadm"
date: 2020-02-07
slug: Deploying-External-OpenStack-Cloud-Provider-with-Kubeadm
---
This document describes how to install a single control-plane Kubernetes cluster v1.15 with kubeadm on CentOS, and then deploy an external OpenStack cloud provider and Cinder CSI plugin to use Cinder volumes as persistent volumes in Kubernetes.

### Preparation in OpenStack

This cluster runs on OpenStack VMs, so let's create a few things in OpenStack first.

* A project/tenant for this Kubernetes cluster
* A user in this project for Kubernetes, to query node information and attach volumes etc
* A private network and subnet
* A router for this private network and connect it to a public network for floating IPs
* A security group for all Kubernetes VMs
* A VM as a control-plane node and a few VMs as worker nodes

The security group will have the following rules to open ports for Kubernetes.

**Control-Plane Node**

|Protocol | Port Number | Description|
|----------|-------------|------------|
|TCP |6443|Kubernetes API Server|
|TCP|2379-2380|etcd server client API|
|TCP|10250|Kubelet API|
|TCP|10251|kube-scheduler|
|TCP|10252|kube-controller-manager|
|TCP|10255|Read-only Kubelet API|

**Worker Nodes**

|Protocol | Port Number | Description|
|----------|-------------|------------|
|TCP|10250|Kubelet API|
|TCP|10255|Read-only Kubelet API|
|TCP|30000-32767|NodePort Services|

**CNI ports on both control-plane and worker nodes**

|Protocol | Port Number | Description|
|----------|-------------|------------|
|TCP|179|Calico BGP network|
|TCP|9099|Calico felix (health check)|
|UDP|8285|Flannel|
|UDP|8472|Flannel|
|TCP|6781-6784|Weave Net|
|UDP|6783-6784|Weave Net|

CNI specific ports are only required to be opened when that particular CNI plugin is used. In this guide, we will use Weave Net. Only the Weave Net ports (TCP 6781-6784 and UDP 6783-6784), will need to be opened in the security group.
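The post lists the ports but not the OpenStack commands that open them. As a hedged sketch using the standard `openstack` CLI (the group name `k8s-cluster` is a placeholder), the rules actually needed for this guide would look roughly like:

```shell
# Shared security group for all cluster VMs (placeholder name)
openstack security group create k8s-cluster

# Kubernetes API server, from the control-plane table above
openstack security group rule create --protocol tcp --dst-port 6443 k8s-cluster

# Weave Net ports, the only CNI ports required for this guide
openstack security group rule create --protocol tcp --dst-port 6781:6784 k8s-cluster
openstack security group rule create --protocol udp --dst-port 6783:6784 k8s-cluster
```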
The control-plane node needs at least 2 cores and 4GB RAM. After the VM is launched, verify its hostname and make sure it is the same as the node name in Nova.
If the hostname is not resolvable, add it to `/etc/hosts`.

For example, if the VM is called master1, and it has an internal IP 192.168.1.4. Add that to `/etc/hosts` and set hostname to master1.
```shell
echo "192.168.1.4 master1" >> /etc/hosts

hostnamectl set-hostname master1
```
### Install Docker and Kubernetes

Next, we'll follow the official documents to install docker and Kubernetes using kubeadm.

Install Docker following the steps from the [container runtime](/docs/setup/production-environment/container-runtimes/) documentation.

Note that it is a [best practice to use systemd as the cgroup driver](/docs/setup/production-environment/container-runtimes/#cgroup-drivers) for Kubernetes.
If you use an internal container registry, add them to the docker config.
```shell
# Install Docker CE
## Set up the repository
### Install required packages.

yum install yum-utils device-mapper-persistent-data lvm2

### Add Docker repository.

yum-config-manager \
  --add-repo \
  https://download.docker.com/linux/centos/docker-ce.repo

## Install Docker CE.

yum update && yum install docker-ce-18.06.2.ce

## Create /etc/docker directory.

mkdir /etc/docker

# Configure the Docker daemon

cat > /etc/docker/daemon.json <<EOF
{
  "exec-opts": ["native.cgroupdriver=systemd"],
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m"
  },
  "storage-driver": "overlay2",
  "storage-opts": [
    "overlay2.override_kernel_check=true"
  ]
}
EOF

mkdir -p /etc/systemd/system/docker.service.d

# Restart Docker
systemctl daemon-reload
systemctl restart docker
systemctl enable docker
```

Install kubeadm following the steps from the [Installing Kubeadm](/docs/setup/production-environment/tools/kubeadm/install-kubeadm/) documentation.

```shell
cat <<EOF > /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://packages.cloud.google.com/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://packages.cloud.google.com/yum/doc/yum-key.gpg https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg
EOF

# Set SELinux in permissive mode (effectively disabling it)
# Caveat: In a production environment you may not want to disable SELinux, please refer to Kubernetes documents about SELinux
setenforce 0
sed -i 's/^SELINUX=enforcing$/SELINUX=permissive/' /etc/selinux/config

yum install -y kubelet kubeadm kubectl --disableexcludes=kubernetes

systemctl enable --now kubelet

cat <<EOF > /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
EOF
sysctl --system

# check if br_netfilter module is loaded
lsmod | grep br_netfilter

# if not, load it explicitly with
modprobe br_netfilter
```

The official document about how to create a single control-plane cluster can be found from the [Creating a single control-plane cluster with kubeadm](/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/) documentation.

We'll largely follow that document but also add additional things for the cloud provider.
To make things more clear, we'll use a `kubeadm-config.yml` for the control-plane node.
In this config we specify to use an external OpenStack cloud provider, and where to find its config.
We also enable storage API in API server's runtime config so we can use OpenStack volumes as persistent volumes in Kubernetes.

```yaml
apiVersion: kubeadm.k8s.io/v1beta1
kind: InitConfiguration
nodeRegistration:
  kubeletExtraArgs:
    cloud-provider: "external"
---
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
kubernetesVersion: "v1.15.1"
apiServer:
  extraArgs:
    enable-admission-plugins: NodeRestriction
    runtime-config: "storage.k8s.io/v1=true"
controllerManager:
  extraArgs:
    external-cloud-volume-plugin: openstack
  extraVolumes:
  - name: "cloud-config"
    hostPath: "/etc/kubernetes/cloud-config"
    mountPath: "/etc/kubernetes/cloud-config"
    readOnly: true
    pathType: File
networking:
  serviceSubnet: "10.96.0.0/12"
  podSubnet: "10.224.0.0/16"
  dnsDomain: "cluster.local"
```
Now we'll create the cloud config, `/etc/kubernetes/cloud-config`, for OpenStack.
Note that the tenant here is the one we created for all Kubernetes VMs in the beginning.
All VMs should be launched in this project/tenant.
In addition you need to create a user in this tenant for Kubernetes to do queries.
The ca-file is the CA root certificate for OpenStack's API endpoint, for example `https://openstack.cloud:5000/v3`
At the time of writing the cloud provider doesn't allow insecure connections (skip CA check).

```ini
[Global]
region=RegionOne
username=username
password=password
auth-url=https://openstack.cloud:5000/v3
tenant-id=14ba698c0aec4fd6b7dc8c310f664009
domain-id=default
ca-file=/etc/kubernetes/ca.pem

[LoadBalancer]
subnet-id=b4a9a292-ea48-4125-9fb2-8be2628cb7a1
floating-network-id=bc8a590a-5d65-4525-98f3-f7ef29c727d5

[BlockStorage]
bs-version=v2

[Networking]
public-network-name=public
ipv6-support-disabled=false
```

Next run kubeadm to initiate the control-plane node
```shell
kubeadm init --config=kubeadm-config.yml
```

With the initialization completed, copy admin config to .kube
```shell
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
```

At this stage, the control-plane node is created but not ready. All the nodes have the taint `node.cloudprovider.kubernetes.io/uninitialized=true:NoSchedule` and are waiting to be initialized by the cloud-controller-manager.
```console
# kubectl describe no master1
Name: master1
Roles: master
......
Taints: node-role.kubernetes.io/master:NoSchedule
node.cloudprovider.kubernetes.io/uninitialized=true:NoSchedule
node.kubernetes.io/not-ready:NoSchedule
......
```
Now deploy the OpenStack cloud controller manager into the cluster, following [using controller manager with kubeadm](https://github.com/kubernetes/cloud-provider-openstack/blob/master/docs/using-controller-manager-with-kubeadm.md).

Create a secret with the cloud-config for the openstack cloud provider.
```shell
kubectl create secret -n kube-system generic cloud-config --from-literal=cloud.conf="$(cat /etc/kubernetes/cloud-config)" --dry-run -o yaml > cloud-config-secret.yaml
kubectl apply -f cloud-config-secret.yaml
```

Get the CA certificate for OpenStack API endpoints and put that into `/etc/kubernetes/ca.pem`.

Create RBAC resources.
```shell
kubectl apply -f https://github.com/kubernetes/cloud-provider-openstack/raw/release-1.15/cluster/addons/rbac/cloud-controller-manager-roles.yaml
kubectl apply -f https://github.com/kubernetes/cloud-provider-openstack/raw/release-1.15/cluster/addons/rbac/cloud-controller-manager-role-bindings.yaml
```

We'll run the OpenStack cloud controller manager as a DaemonSet rather than a pod.
The manager will only run on the control-plane node, so if there are multiple control-plane nodes, multiple pods will be run for high availability.
Create `openstack-cloud-controller-manager-ds.yaml` containing the following manifests, then apply it.

```yaml
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: cloud-controller-manager
  namespace: kube-system
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: openstack-cloud-controller-manager
  namespace: kube-system
  labels:
    k8s-app: openstack-cloud-controller-manager
spec:
  selector:
    matchLabels:
      k8s-app: openstack-cloud-controller-manager
  updateStrategy:
    type: RollingUpdate
  template:
    metadata:
      labels:
        k8s-app: openstack-cloud-controller-manager
    spec:
      nodeSelector:
        node-role.kubernetes.io/master: ""
      securityContext:
        runAsUser: 1001
      tolerations:
      - key: node.cloudprovider.kubernetes.io/uninitialized
        value: "true"
        effect: NoSchedule
      - key: node-role.kubernetes.io/master
        effect: NoSchedule
      - effect: NoSchedule
        key: node.kubernetes.io/not-ready
      serviceAccountName: cloud-controller-manager
      containers:
      - name: openstack-cloud-controller-manager
        image: docker.io/k8scloudprovider/openstack-cloud-controller-manager:v1.15.0
        args:
        - /bin/openstack-cloud-controller-manager
        - --v=1
        - --cloud-config=$(CLOUD_CONFIG)
        - --cloud-provider=openstack
        - --use-service-account-credentials=true
        - --address=127.0.0.1
        volumeMounts:
        - mountPath: /etc/kubernetes/pki
          name: k8s-certs
          readOnly: true
        - mountPath: /etc/ssl/certs
          name: ca-certs
          readOnly: true
        - mountPath: /etc/config
          name: cloud-config-volume
          readOnly: true
        - mountPath: /usr/libexec/kubernetes/kubelet-plugins/volume/exec
          name: flexvolume-dir
        - mountPath: /etc/kubernetes
          name: ca-cert
          readOnly: true
        resources:
          requests:
            cpu: 200m
        env:
        - name: CLOUD_CONFIG
          value: /etc/config/cloud.conf
      hostNetwork: true
      volumes:
      - hostPath:
          path: /usr/libexec/kubernetes/kubelet-plugins/volume/exec
          type: DirectoryOrCreate
        name: flexvolume-dir
      - hostPath:
          path: /etc/kubernetes/pki
          type: DirectoryOrCreate
        name: k8s-certs
      - hostPath:
          path: /etc/ssl/certs
          type: DirectoryOrCreate
        name: ca-certs
      - name: cloud-config-volume
        secret:
          secretName: cloud-config
      - name: ca-cert
        secret:
          secretName: openstack-ca-cert
```
||||
When the controller manager is running, it queries OpenStack for information about the nodes and removes the taint. In the node info, you'll see the VM's OpenStack UUID in the `ProviderID` field.
|
||||
```console
|
||||
# kubectl describe no master1
|
||||
Name: master1
|
||||
Roles: master
|
||||
......
|
||||
Taints: node-role.kubernetes.io/master:NoSchedule
|
||||
node.kubernetes.io/not-ready:NoSchedule
|
||||
......
|
||||
sage:docker: network plugin is not ready: cni config uninitialized
|
||||
......
|
||||
PodCIDR: 10.224.0.0/24
|
||||
ProviderID: openstack:///548e3c46-2477-4ce2-968b-3de1314560a5
|
||||
|
||||
```
|
||||
Now install your favourite CNI and the control-plane node will become ready.
|
||||
|
||||
For example, to install Weave Net, run this command:
|
||||
```shell
|
||||
kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')"
|
||||
```
|
||||
|
||||
Next we'll set up worker nodes.
|
||||
|
||||
First, install Docker and kubeadm the same way as on the control-plane node.
|
||||
To join them to the cluster, we need the token and CA cert hash from the output of the control-plane node installation.
|
||||
If the token has expired or been lost, we can recreate it using these commands:
|
||||
|
||||
```shell
|
||||
# check if token is expired
|
||||
kubeadm token list
|
||||
|
||||
# re-create token and show join command
|
||||
kubeadm token create --print-join-command
|
||||
|
||||
```
|
||||
|
||||
Create `kubeadm-config.yml` for the worker nodes with the above token and CA cert hash:
|
||||
```yaml
|
||||
apiVersion: kubeadm.k8s.io/v1beta2
|
||||
discovery:
|
||||
bootstrapToken:
|
||||
apiServerEndpoint: 192.168.1.7:6443
|
||||
token: 0c0z4p.dnafh6vnmouus569
|
||||
caCertHashes: ["sha256:fcb3e956a6880c05fc9d09714424b827f57a6fdc8afc44497180905946527adf"]
|
||||
kind: JoinConfiguration
|
||||
nodeRegistration:
|
||||
kubeletExtraArgs:
|
||||
cloud-provider: "external"
|
||||
|
||||
```
|
||||
`apiServerEndpoint` is the address of the control-plane node; `token` and `caCertHashes` can be taken from the join command printed by `kubeadm token create --print-join-command`.
|
||||
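For reference, the join command printed by `kubeadm token create --print-join-command` looks roughly like this; the token and hash shown here match the example values used in the worker configuration above:

```shell
kubeadm join 192.168.1.7:6443 --token 0c0z4p.dnafh6vnmouus569 \
    --discovery-token-ca-cert-hash sha256:fcb3e956a6880c05fc9d09714424b827f57a6fdc8afc44497180905946527adf
```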
|
||||
Run `kubeadm join` and the worker nodes will join the cluster:
|
||||
```shell
|
||||
kubeadm join --config kubeadm-config.yml
|
||||
```
|
||||
|
||||
At this stage we'll have a working Kubernetes cluster with an external OpenStack cloud provider.
|
||||
The provider tells Kubernetes about the mapping between Kubernetes nodes and OpenStack VMs.
|
||||
If Kubernetes wants to attach a persistent volume to a pod, it can find out which OpenStack VM the pod is running on from the mapping, and attach the underlying OpenStack volume to the VM accordingly.
|
||||
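You can inspect this mapping yourself: the OpenStack VM UUID is recorded in each node's `spec.providerID` field, for example:

```shell
# Print the OpenStack provider ID recorded for each node
kubectl get nodes -o custom-columns=NAME:.metadata.name,PROVIDER_ID:.spec.providerID
```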
|
||||
### Deploy Cinder CSI
|
||||
|
||||
The integration with Cinder is provided by an external Cinder CSI plugin, as described in the [Cinder CSI](https://github.com/kubernetes/cloud-provider-openstack/blob/master/docs/using-cinder-csi-plugin.md) documentation.
|
||||
|
||||
We'll perform the following steps to install the Cinder CSI plugin.
|
||||
First, create a secret with the CA certificate for OpenStack's API endpoints. This is the same certificate file we used for the cloud provider above:
|
||||
```shell
|
||||
kubectl create secret -n kube-system generic openstack-ca-cert --from-literal=ca.pem="$(cat /etc/kubernetes/ca.pem)" --dry-run -o yaml > openstack-ca-cert.yaml
|
||||
kubectl apply -f openstack-ca-cert.yaml
|
||||
```
|
||||
Then create RBAC resources.
|
||||
```shell
|
||||
kubectl apply -f https://raw.githubusercontent.com/kubernetes/cloud-provider-openstack/release-1.15/manifests/cinder-csi-plugin/cinder-csi-controllerplugin-rbac.yaml
|
||||
kubectl apply -f https://github.com/kubernetes/cloud-provider-openstack/raw/release-1.15/manifests/cinder-csi-plugin/cinder-csi-nodeplugin-rbac.yaml
|
||||
```
|
||||
|
||||
The Cinder CSI plugin includes a controller plugin and a node plugin.
|
||||
The controller plugin communicates with the Kubernetes and Cinder APIs to create, attach, detach, and delete Cinder volumes. The node plugin, in turn, runs on each worker node to bind a storage device (an attached volume) to a pod, and to unbind it when the pod is deleted.
|
||||
Create `cinder-csi-controllerplugin.yaml` and apply it to create the CSI controller:
|
||||
```yaml
|
||||
kind: Service
|
||||
apiVersion: v1
|
||||
metadata:
|
||||
name: csi-cinder-controller-service
|
||||
namespace: kube-system
|
||||
labels:
|
||||
app: csi-cinder-controllerplugin
|
||||
spec:
|
||||
selector:
|
||||
app: csi-cinder-controllerplugin
|
||||
ports:
|
||||
- name: dummy
|
||||
port: 12345
|
||||
|
||||
---
|
||||
kind: StatefulSet
|
||||
apiVersion: apps/v1
|
||||
metadata:
|
||||
name: csi-cinder-controllerplugin
|
||||
namespace: kube-system
|
||||
spec:
|
||||
serviceName: "csi-cinder-controller-service"
|
||||
replicas: 1
|
||||
selector:
|
||||
matchLabels:
|
||||
app: csi-cinder-controllerplugin
|
||||
template:
|
||||
metadata:
|
||||
labels:
|
||||
app: csi-cinder-controllerplugin
|
||||
spec:
|
||||
serviceAccount: csi-cinder-controller-sa
|
||||
containers:
|
||||
- name: csi-attacher
|
||||
image: quay.io/k8scsi/csi-attacher:v1.0.1
|
||||
args:
|
||||
- "--v=5"
|
||||
- "--csi-address=$(ADDRESS)"
|
||||
env:
|
||||
- name: ADDRESS
|
||||
value: /var/lib/csi/sockets/pluginproxy/csi.sock
|
||||
imagePullPolicy: "IfNotPresent"
|
||||
volumeMounts:
|
||||
- name: socket-dir
|
||||
mountPath: /var/lib/csi/sockets/pluginproxy/
|
||||
- name: csi-provisioner
|
||||
image: quay.io/k8scsi/csi-provisioner:v1.0.1
|
||||
args:
|
||||
- "--provisioner=csi-cinderplugin"
|
||||
- "--csi-address=$(ADDRESS)"
|
||||
env:
|
||||
- name: ADDRESS
|
||||
value: /var/lib/csi/sockets/pluginproxy/csi.sock
|
||||
imagePullPolicy: "IfNotPresent"
|
||||
volumeMounts:
|
||||
- name: socket-dir
|
||||
mountPath: /var/lib/csi/sockets/pluginproxy/
|
||||
- name: csi-snapshotter
|
||||
image: quay.io/k8scsi/csi-snapshotter:v1.0.1
|
||||
args:
|
||||
- "--connection-timeout=15s"
|
||||
- "--csi-address=$(ADDRESS)"
|
||||
env:
|
||||
- name: ADDRESS
|
||||
value: /var/lib/csi/sockets/pluginproxy/csi.sock
|
||||
imagePullPolicy: Always
|
||||
volumeMounts:
|
||||
- mountPath: /var/lib/csi/sockets/pluginproxy/
|
||||
name: socket-dir
|
||||
- name: cinder-csi-plugin
|
||||
image: docker.io/k8scloudprovider/cinder-csi-plugin:v1.15.0
|
||||
args :
|
||||
- /bin/cinder-csi-plugin
|
||||
- "--v=5"
|
||||
- "--nodeid=$(NODE_ID)"
|
||||
- "--endpoint=$(CSI_ENDPOINT)"
|
||||
- "--cloud-config=$(CLOUD_CONFIG)"
|
||||
- "--cluster=$(CLUSTER_NAME)"
|
||||
env:
|
||||
- name: NODE_ID
|
||||
valueFrom:
|
||||
fieldRef:
|
||||
fieldPath: spec.nodeName
|
||||
- name: CSI_ENDPOINT
|
||||
value: unix://csi/csi.sock
|
||||
- name: CLOUD_CONFIG
|
||||
value: /etc/config/cloud.conf
|
||||
- name: CLUSTER_NAME
|
||||
value: kubernetes
|
||||
imagePullPolicy: "IfNotPresent"
|
||||
volumeMounts:
|
||||
- name: socket-dir
|
||||
mountPath: /csi
|
||||
- name: secret-cinderplugin
|
||||
mountPath: /etc/config
|
||||
readOnly: true
|
||||
- mountPath: /etc/kubernetes
|
||||
name: ca-cert
|
||||
readOnly: true
|
||||
volumes:
|
||||
- name: socket-dir
|
||||
hostPath:
|
||||
path: /var/lib/csi/sockets/pluginproxy/
|
||||
type: DirectoryOrCreate
|
||||
- name: secret-cinderplugin
|
||||
secret:
|
||||
secretName: cloud-config
|
||||
- name: ca-cert
|
||||
secret:
|
||||
secretName: openstack-ca-cert
|
||||
```
|
||||
|
||||
|
||||
Create `cinder-csi-nodeplugin.yaml` and apply it to create the CSI node plugin:
|
||||
```yaml
|
||||
kind: DaemonSet
|
||||
apiVersion: apps/v1
|
||||
metadata:
|
||||
name: csi-cinder-nodeplugin
|
||||
namespace: kube-system
|
||||
spec:
|
||||
selector:
|
||||
matchLabels:
|
||||
app: csi-cinder-nodeplugin
|
||||
template:
|
||||
metadata:
|
||||
labels:
|
||||
app: csi-cinder-nodeplugin
|
||||
spec:
|
||||
serviceAccount: csi-cinder-node-sa
|
||||
hostNetwork: true
|
||||
containers:
|
||||
- name: node-driver-registrar
|
||||
image: quay.io/k8scsi/csi-node-driver-registrar:v1.1.0
|
||||
args:
|
||||
- "--v=5"
|
||||
- "--csi-address=$(ADDRESS)"
|
||||
- "--kubelet-registration-path=$(DRIVER_REG_SOCK_PATH)"
|
||||
lifecycle:
|
||||
preStop:
|
||||
exec:
|
||||
command: ["/bin/sh", "-c", "rm -rf /registration/cinder.csi.openstack.org /registration/cinder.csi.openstack.org-reg.sock"]
|
||||
env:
|
||||
- name: ADDRESS
|
||||
value: /csi/csi.sock
|
||||
- name: DRIVER_REG_SOCK_PATH
|
||||
value: /var/lib/kubelet/plugins/cinder.csi.openstack.org/csi.sock
|
||||
- name: KUBE_NODE_NAME
|
||||
valueFrom:
|
||||
fieldRef:
|
||||
fieldPath: spec.nodeName
|
||||
imagePullPolicy: "IfNotPresent"
|
||||
volumeMounts:
|
||||
- name: socket-dir
|
||||
mountPath: /csi
|
||||
- name: registration-dir
|
||||
mountPath: /registration
|
||||
- name: cinder-csi-plugin
|
||||
securityContext:
|
||||
privileged: true
|
||||
capabilities:
|
||||
add: ["SYS_ADMIN"]
|
||||
allowPrivilegeEscalation: true
|
||||
image: docker.io/k8scloudprovider/cinder-csi-plugin:v1.15.0
|
||||
args :
|
||||
- /bin/cinder-csi-plugin
|
||||
- "--nodeid=$(NODE_ID)"
|
||||
- "--endpoint=$(CSI_ENDPOINT)"
|
||||
- "--cloud-config=$(CLOUD_CONFIG)"
|
||||
env:
|
||||
- name: NODE_ID
|
||||
valueFrom:
|
||||
fieldRef:
|
||||
fieldPath: spec.nodeName
|
||||
- name: CSI_ENDPOINT
|
||||
value: unix://csi/csi.sock
|
||||
- name: CLOUD_CONFIG
|
||||
value: /etc/config/cloud.conf
|
||||
imagePullPolicy: "IfNotPresent"
|
||||
volumeMounts:
|
||||
- name: socket-dir
|
||||
mountPath: /csi
|
||||
- name: pods-mount-dir
|
||||
mountPath: /var/lib/kubelet/pods
|
||||
mountPropagation: "Bidirectional"
|
||||
- name: kubelet-dir
|
||||
mountPath: /var/lib/kubelet
|
||||
mountPropagation: "Bidirectional"
|
||||
- name: pods-cloud-data
|
||||
mountPath: /var/lib/cloud/data
|
||||
readOnly: true
|
||||
- name: pods-probe-dir
|
||||
mountPath: /dev
|
||||
mountPropagation: "HostToContainer"
|
||||
- name: secret-cinderplugin
|
||||
mountPath: /etc/config
|
||||
readOnly: true
|
||||
- mountPath: /etc/kubernetes
|
||||
name: ca-cert
|
||||
readOnly: true
|
||||
volumes:
|
||||
- name: socket-dir
|
||||
hostPath:
|
||||
path: /var/lib/kubelet/plugins/cinder.csi.openstack.org
|
||||
type: DirectoryOrCreate
|
||||
- name: registration-dir
|
||||
hostPath:
|
||||
path: /var/lib/kubelet/plugins_registry/
|
||||
type: Directory
|
||||
- name: kubelet-dir
|
||||
hostPath:
|
||||
path: /var/lib/kubelet
|
||||
type: Directory
|
||||
- name: pods-mount-dir
|
||||
hostPath:
|
||||
path: /var/lib/kubelet/pods
|
||||
type: Directory
|
||||
- name: pods-cloud-data
|
||||
hostPath:
|
||||
path: /var/lib/cloud/data
|
||||
type: Directory
|
||||
- name: pods-probe-dir
|
||||
hostPath:
|
||||
path: /dev
|
||||
type: Directory
|
||||
- name: secret-cinderplugin
|
||||
secret:
|
||||
secretName: cloud-config
|
||||
- name: ca-cert
|
||||
secret:
|
||||
secretName: openstack-ca-cert
|
||||
|
||||
```
|
||||
When both are running, create a storage class for Cinder:
|
||||
|
||||
```yaml
|
||||
apiVersion: storage.k8s.io/v1
|
||||
kind: StorageClass
|
||||
metadata:
|
||||
name: csi-sc-cinderplugin
|
||||
provisioner: csi-cinderplugin
|
||||
```
|
||||
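Save the manifest under any name (`cinder-sc.yaml` here is just an example), apply it, and check that the class is registered:

```shell
kubectl apply -f cinder-sc.yaml
kubectl get storageclass csi-sc-cinderplugin
```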
Then we can create a PVC with this class.
|
||||
```yaml
|
||||
apiVersion: v1
|
||||
kind: PersistentVolumeClaim
|
||||
metadata:
|
||||
name: myvol
|
||||
spec:
|
||||
accessModes:
|
||||
- ReadWriteOnce
|
||||
resources:
|
||||
requests:
|
||||
storage: 1Gi
|
||||
storageClassName: csi-sc-cinderplugin
|
||||
|
||||
```
|
||||
|
||||
When the PVC is created, a corresponding Cinder volume is created:
|
||||
```console
|
||||
# kubectl get pvc
|
||||
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
|
||||
myvol Bound pvc-14b8bc68-6c4c-4dc6-ad79-4cb29a81faad 1Gi RWO csi-sc-cinderplugin 3s
|
||||
|
||||
```
|
||||
In OpenStack, the volume name matches the generated name of the Kubernetes persistent volume. In this example it is _pvc-14b8bc68-6c4c-4dc6-ad79-4cb29a81faad_.
|
||||
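If you want to cross-check from the OpenStack side, you can look the volume up by that name (using the name from this example):

```shell
openstack volume list --name pvc-14b8bc68-6c4c-4dc6-ad79-4cb29a81faad
```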
|
||||
Now we can create a pod with the PVC.
|
||||
```yaml
|
||||
apiVersion: v1
|
||||
kind: Pod
|
||||
metadata:
|
||||
name: web
|
||||
spec:
|
||||
containers:
|
||||
- name: web
|
||||
image: nginx
|
||||
ports:
|
||||
- name: web
|
||||
containerPort: 80
|
||||
hostPort: 8081
|
||||
protocol: TCP
|
||||
volumeMounts:
|
||||
- mountPath: "/usr/share/nginx/html"
|
||||
name: mypd
|
||||
volumes:
|
||||
- name: mypd
|
||||
persistentVolumeClaim:
|
||||
claimName: myvol
|
||||
```
|
||||
When the pod is running, the volume will be attached to the pod.
|
||||
If we go back to OpenStack, we can see that the Cinder volume is attached to the worker node where the pod is running.
|
||||
```console
|
||||
# openstack volume show 6b5f3296-b0eb-40cd-bd4f-2067a0d6287f
|
||||
+--------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|
||||
| Field | Value |
|
||||
+--------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|
||||
| attachments | [{u'server_id': u'1c5e1439-edfa-40ed-91fe-2a0e12bc7eb4', u'attachment_id': u'11a15b30-5c24-41d4-86d9-d92823983a32', u'attached_at': u'2019-07-24T05:02:34.000000', u'host_name': u'compute-6', u'volume_id': u'6b5f3296-b0eb-40cd-bd4f-2067a0d6287f', u'device': u'/dev/vdb', u'id': u'6b5f3296-b0eb-40cd-bd4f-2067a0d6287f'}] |
|
||||
| availability_zone | nova |
|
||||
| bootable | false |
|
||||
| consistencygroup_id | None |
|
||||
| created_at | 2019-07-24T05:02:18.000000 |
|
||||
| description | Created by OpenStack Cinder CSI driver |
|
||||
| encrypted | False |
|
||||
| id | 6b5f3296-b0eb-40cd-bd4f-2067a0d6287f |
|
||||
| migration_status | None |
|
||||
| multiattach | False |
|
||||
| name | pvc-14b8bc68-6c4c-4dc6-ad79-4cb29a81faad |
|
||||
| os-vol-host-attr:host | rbd:volumes@rbd#rbd |
|
||||
| os-vol-mig-status-attr:migstat | None |
|
||||
| os-vol-mig-status-attr:name_id | None |
|
||||
| os-vol-tenant-attr:tenant_id | 14ba698c0aec4fd6b7dc8c310f664009 |
|
||||
| properties | attached_mode='rw', cinder.csi.openstack.org/cluster='kubernetes' |
|
||||
| replication_status | None |
|
||||
| size | 1 |
|
||||
| snapshot_id | None |
|
||||
| source_volid | None |
|
||||
| status | in-use |
|
||||
| type | rbd |
|
||||
| updated_at | 2019-07-24T05:02:35.000000 |
|
||||
| user_id | 5f6a7a06f4e3456c890130d56babf591 |
|
||||
+--------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|
||||
|
||||
```
|
||||
|
||||
### Summary
|
||||
|
||||
In this walk-through, we deployed a Kubernetes cluster on OpenStack VMs and integrated it with OpenStack using an external OpenStack cloud provider. Then, on this Kubernetes cluster, we deployed the Cinder CSI plugin, which can create Cinder volumes and expose them to Kubernetes as persistent volumes.
|
|
@ -0,0 +1,45 @@
|
|||
---
|
||||
layout: blog
|
||||
title: "Contributor Summit Amsterdam Schedule Announced"
|
||||
date: 2020-02-18
|
||||
slug: Contributor-Summit-Amsterdam-Schedule-Announced
|
||||
---
|
||||
|
||||
**Authors:** Jeffrey Sica (Red Hat), Amanda Katona (VMware)
|
||||
|
||||
tl;dr [Registration is open](https://events.linuxfoundation.org/kubernetes-contributor-summit-europe/) and the [schedule is live](https://kcseu2020.sched.com/) so register now and we’ll see you in Amsterdam!
|
||||
|
||||
## Kubernetes Contributor Summit
|
||||
|
||||
**Sunday, March 29, 2020**
|
||||
|
||||
- Evening Contributor Celebration:
|
||||
[ZuidPool](https://www.zuid-pool.nl/en/)
|
||||
- Address: [Europaplein 22, 1078 GZ Amsterdam, Netherlands](https://www.google.com/search?q=KubeCon+Amsterdam+2020&ie=UTF-8&ibp=htl;events&rciv=evn&sa=X&ved=2ahUKEwiZoLvQ0dvnAhVST6wKHScBBZ8Q5bwDMAB6BAgSEAE#)
|
||||
- Time: 18:00 - 21:00
|
||||
|
||||
**Monday, March 30, 2020**
|
||||
|
||||
- All Day Contributor Summit:
|
||||
- [Amsterdam RAI](https://www.rai.nl/en/)
|
||||
- Address: [Europaplein 24, 1078 GZ Amsterdam, Netherlands](https://www.google.com/search?q=kubecon+amsterdam+2020&oq=kubecon+amste&aqs=chrome.0.35i39j69i57j0l4j69i61l2.3957j1j4&sourceid=chrome&ie=UTF-8&ibp=htl;events&rciv=evn&sa=X&ved=2ahUKEwiZoLvQ0dvnAhVST6wKHScBBZ8Q5bwDMAB6BAgSEAE#)
|
||||
- Time: 09:00 - 17:00 (Breakfast at 08:00)
|
||||
|
||||

|
||||
|
||||
Hello everyone and Happy 2020! It’s hard to believe that KubeCon EU 2020 is less than six weeks away, and with that another contributor summit! This year we have the pleasure of being in Amsterdam in early spring, so be sure to pack some warmer clothing. This summit looks to be exciting with a lot of fantastic community-driven content. We received **26** submissions from the CFP. From that, the events team selected **12** sessions. Each of the sessions falls into one of four categories:
|
||||
|
||||
* Community
|
||||
* Contributor Improvement
|
||||
* Sustainability
|
||||
* In-depth Technical
|
||||
|
||||
On top of the presentations, there will be a dedicated Docs Sprint as well as the New Contributor Workshop 101 and 201 Sessions. All told, we will have five separate rooms of content throughout the day on Monday. Please **[see the full schedule](https://kcseu2020.sched.com/)** to see what sessions you’d be interested in. We hope between the content provided and the inevitable hallway track, everyone has a fun and enriching experience.
|
||||
|
||||
Speaking of fun, the social Sunday night should be a blast! We’re hosting this summit’s social close to the conference center, at [ZuidPool](https://www.zuid-pool.nl/en/). There will be games, bingo, and unconference sign-up throughout the evening. It should be a relaxed way to kick off the week.
|
||||
|
||||
[Registration is open](https://events.linuxfoundation.org/kubernetes-contributor-summit-europe/)! Space is limited so it’s always a good idea to register early.
|
||||
|
||||
If you have any questions, reach out to the [Amsterdam Team](https://github.com/kubernetes/community/tree/master/events/2020/03-contributor-summit#team) on Slack in the [#contributor-summit](https://kubernetes.slack.com/archives/C7J893413) channel.
|
||||
|
||||
Hope to see you there!
|
|
@ -0,0 +1,82 @@
|
|||
---
|
||||
title: Bring your ideas to the world with kubectl plugins
|
||||
date: 2020-02-28
|
||||
---
|
||||
|
||||
**Author:** Cornelius Weig (TNG Technology Consulting GmbH)
|
||||
|
||||
`kubectl` is the most critical tool to interact with Kubernetes and has to address multiple user personas, each with their own needs and opinions.
|
||||
One way to make `kubectl` do what you need is to build new functionality into `kubectl`.
|
||||
|
||||
|
||||
## Challenges with building commands into `kubectl`
|
||||
|
||||
However, that's easier said than done. Being such an important cornerstone of
|
||||
Kubernetes, any meaningful change to `kubectl` needs to undergo a Kubernetes
|
||||
Enhancement Proposal (KEP) where the intended change is discussed beforehand.
|
||||
|
||||
When it comes to implementation, you'll find that `kubectl` is an ingenious and
|
||||
complex piece of engineering. It might take a long time to get used to
|
||||
the processes and style of the codebase to get done what you want to achieve. Next
|
||||
comes the review process which may go through several rounds until it meets all
|
||||
the requirements of the Kubernetes maintainers -- after all, they need to take
|
||||
over ownership of this feature and maintain it from the day it's merged.
|
||||
|
||||
When everything goes well, you can finally rejoice. Your code will be shipped
|
||||
with the next Kubernetes release. Well, that could mean you need to wait
|
||||
another 3 months to ship your idea in `kubectl` if you are unlucky.
|
||||
|
||||
So this was the happy path where everything goes well. But there are good
|
||||
reasons why your new functionality may never make it into `kubectl`. For one,
|
||||
`kubectl` has a particular look and feel and violating that style will not be
|
||||
acceptable to the maintainers. For example, an interactive command that
|
||||
produces output with colors would be inconsistent with the rest of `kubectl`.
|
||||
Also, when it comes to tools or commands useful only to a minuscule proportion
|
||||
of users, the maintainers may simply reject your proposal as `kubectl` needs to
|
||||
address common needs.
|
||||
|
||||
But this doesn’t mean you can’t ship your ideas to `kubectl` users.
|
||||
|
||||
## What if you didn’t have to change `kubectl` to add functionality?
|
||||
|
||||
This is where `kubectl` [plugins](https://kubernetes.io/docs/tasks/extend-kubectl/kubectl-plugins/) shine.
|
||||
Since `kubectl` v1.12, you can simply
|
||||
drop executables that follow the naming pattern `kubectl-myplugin` into your
|
||||
`PATH`. Then you can execute this plugin as `kubectl myplugin`, and
|
||||
it will just feel like a normal sub-command of `kubectl`.
|
||||
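As a minimal sketch (the plugin name `kubectl-hello` and the install path are just examples), creating and running a plugin can be as simple as:

```shell
# Assumes /usr/local/bin is on your PATH; the plugin name "hello" is arbitrary
cat <<'EOF' | sudo tee /usr/local/bin/kubectl-hello >/dev/null
#!/usr/bin/env bash
echo "Hello from a kubectl plugin! Current context: $(kubectl config current-context)"
EOF
sudo chmod +x /usr/local/bin/kubectl-hello

# kubectl discovers the executable and exposes it as a sub-command
kubectl hello
```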
|
||||
Plugins give you the opportunity to try out new experiences like terminal UIs,
|
||||
colorful output, specialized functionality, or other innovative ideas. You can
|
||||
get creative, as you’re the owner of your own plugin.
|
||||
|
||||
Further, plugins offer safe experimentation space for commands you’d like to
|
||||
propose to `kubectl`. By pre-releasing as a plugin, you can push your
|
||||
functionality faster to the end-users and quickly gather feedback. For example,
|
||||
the [kubectl-debug](https://github.com/verb/kubectl-debug) plugin is proposed
|
||||
to become a built-in command in `kubectl` in a
|
||||
[KEP](https://github.com/kubernetes/enhancements/blob/master/keps/sig-cli/20190805-kubectl-debug.md).
|
||||
In the meantime, the plugin author can ship the functionality and collect
|
||||
feedback using the plugin mechanism.
|
||||
|
||||
## How to get started with developing plugins
|
||||
|
||||
If you already have an idea for a plugin, how do you best make it happen?
|
||||
First you have to ask yourself if you can implement it as a wrapper around
|
||||
existing `kubectl` functionality. If so, writing the plugin as a shell script
|
||||
is often the best way forward, because the resulting plugin will be small,
|
||||
will work cross-platform, and will have a high level of trust because it is not
|
||||
compiled.
|
||||
|
||||
On the other hand, if the plugin logic is complex, a general-purpose language
|
||||
is usually better. The canonical choice here is Go, because you can use the
|
||||
excellent `client-go` library to interact with the Kubernetes API. The Kubernetes
|
||||
maintained [sample-cli-plugin](https://github.com/kubernetes/sample-cli-plugin)
|
||||
demonstrates some best practices and can be used as a template for new plugin
|
||||
projects.
|
||||
|
||||
When the development is done, you just need to ship your plugin to the
|
||||
Kubernetes users. For the best plugin installation experience and discoverability,
|
||||
you should consider doing so via the
|
||||
[krew](https://github.com/kubernetes-sigs/krew) plugin manager. For an in-depth
|
||||
discussion about the technical details around `kubectl` plugins, refer to the
|
||||
documentation on [kubernetes.io](https://kubernetes.io/docs/tasks/extend-kubectl/kubectl-plugins/).
|
|
@ -0,0 +1,15 @@
|
|||
---
|
||||
layout: blog
|
||||
title: Contributor Summit Amsterdam Postponed
|
||||
date: 2020-03-04
|
||||
slug: Contributor-Summit-Delayed
|
||||
---
|
||||
|
||||
**Authors:** Dawn Foster (VMware), Jorge Castro (VMware)
|
||||
|
||||
The CNCF has announced that [KubeCon + CloudNativeCon EU has been delayed](https://events.linuxfoundation.org/kubecon-cloudnativecon-europe/attend/novel-coronavirus-update/) until July/August of 2020. As a result the Contributor Summit planning team is weighing options for how to proceed. Here’s the current plan:
|
||||
|
||||
- There will be an in-person Contributor Summit as planned when KubeCon + CloudNativeCon is rescheduled.
|
||||
- We are looking at options for having additional virtual contributor activities in the meantime.
|
||||
|
||||
We will communicate via this blog and the usual communications channels on the final plan. Please bear with us as we adapt when we get more information. Thank you for being patient as the team pivots to bring you a great Contributor Summit!
|
|
@ -0,0 +1,42 @@
|
|||
---
|
||||
layout: blog
|
||||
title: Join SIG Scalability and Learn Kubernetes the Hard Way
|
||||
date: 2020-03-19
|
||||
slug: join-sig-scalability
|
||||
---
|
||||
|
||||
**Author:** Alex Handy
|
||||
|
||||
Contributing to SIG Scalability is a great way to learn Kubernetes in all its depth and breadth, and the team would love to have you [join as a contributor](https://github.com/kubernetes/community/tree/master/sig-scalability#scalability-special-interest-group). I took a look at the value of learning the hard way and interviewed the current SIG chairs to give you an idea of what contribution feels like.
|
||||
|
||||
## The value of Learning The Hard Way
|
||||
|
||||
There is a belief in the software development community that pushes for the most challenging and rigorous possible method of learning a new language or system. These tend to go by the moniker of "Learn \_\_ the Hard Way." Examples abound: Learn Code the Hard Way, Learn Python the Hard Way, and many others originating with Zed Shaw's courses in the topic.
|
||||
|
||||
While there are folks out there who offer you a "Learn Kubernetes the Hard Way" type experience (most notably [Kelsey Hightower's](https://github.com/kelseyhightower/kubernetes-the-hard-way)), any "Hard Way" project should attempt to cover every aspect of the core topic's principles.
|
||||
|
||||
Therefore, the real way to "Learn Kubernetes the Hard Way" is to join the CNCF and get involved in the project itself. And there is only one SIG that could genuinely offer a full-stack learning experience for Kubernetes: SIG Scalability.
|
||||
|
||||
The team behind SIG Scalability is responsible for detecting and dealing with issues that arise when Kubernetes clusters are working with upwards of a thousand nodes. According to [Wojciech Tyczynski](https://github.com/wojtek-t), a staff software engineer at Google and a member of SIG Scalability, the standard size of a test cluster for this SIG is over 5,000 nodes.
|
||||
|
||||
And yet, this SIG is not composed of Ph.D.'s in highly scalable systems designs. Many of the folks working with Tyczynski, for example, joined the SIG knowing very little about these types of issues, and often, very little about Kubernetes.
|
||||
|
||||
Working on SIG Scalability is like jumping into the deep end of the pool to learn to swim, and the SIG is inherently concerned with the entire Kubernetes project. SIG Scalability focuses on how Kubernetes functions as a whole and at scale. The SIG Scalability team members have an impetus to learn about every system and to understand how all systems interact with one another.
|
||||
|
||||
## A complex and rewarding contributor experience
|
||||
|
||||
While that may sound complicated (and it is!), that doesn't mean it's outside the reach of an average developer, tester, or administrator. Google software developer Matt Matejczyk has only been on the team since the beginning of 2019, and he's been a valued member of the team since then, ferreting out bugs.
|
||||
|
||||
"I am new here," said Matejczyk. "I joined the team in January [2019]. Before that, I worked on AdWords at Google in New York. Why did I join? I knew some people there, so that was one of the decisions for me to move. I thought at that time that Kubernetes is a unique, cutting edge technology. I thought it'd be cool to work on that."
|
||||
|
||||
Matejczyk was correct about the coolness. "It's cool," he said. "So actually, ramping up on scalability is not easy. There are many things you need to understand. You need to understand Kubernetes very well. It can use every part of Kubernetes. I am still ramping up after these 8 months. I think it took me maybe 3 months to get up to decent speed."
|
||||
|
||||
When Matejczyk spoke to what he had worked on during those 8 months, he answered, "An interesting example is a regression I have been working on recently. We noticed the overall slowness of Kubernetes control plane in specific scenarios, and we couldn't attribute it to any particular component. In the end, we realized that everything boiled down to the memory allocation on the golang level. It was very counterintuitive to have two completely separate pieces of code (running as a part of the same binary) affecting the performance of each other only because one of them was allocating memory too fast. But connecting all the dots and getting to the bottom of regression like this gives great satisfaction."
|
||||
|
||||
Tyczynski said that "It's not only debugging regressions, but it's also debugging and finding bottlenecks. In general, those can be regressions, but those can be things we can improve. The other significant area is extending what we want to guarantee to users. Extending SLA and SLO coverage of the system so users can rely on what they can expect from the system in terms of performance and scalability. Matt is doing much work in extending our tests to be more representative and cover more Kubernetes concepts."
|
||||
|
||||
## Give SIG Scalability a try
|
||||
|
||||
The SIG Scalability team is always in need of new members, and if you're the sort of developer or tester who loves taking on new complex challenges, and perhaps loves learning things the hard way, consider joining this SIG. As the team points out, adding Kubernetes expertise to your resume is never a bad idea, and this is the one SIG where you can learn it all from top to bottom.
|
||||
|
||||
See [the SIG's documentation](https://github.com/kubernetes/community/tree/master/sig-scalability#scalability-special-interest-group) to learn about upcoming meetings, its charter, and more. You can also join the [#sig-scalability Slack channel](https://kubernetes.slack.com/archives/C09QZTRH7) to see what it's like. We hope to see you join in to take advantage of this great opportunity to learn Kubernetes and contribute back at the same time.
|
|
@ -0,0 +1,135 @@
|
|||
---
|
||||
layout: blog
|
||||
title: 'Kubernetes 1.18: Fit & Finish'
|
||||
date: 2020-03-25
|
||||
slug: kubernetes-1-18-release-announcement
|
||||
---
|
||||
|
||||
**Authors:** [Kubernetes 1.18 Release Team](https://github.com/kubernetes/sig-release/blob/master/releases/release-1.18/release_team.md)
|
||||
|
||||
We're pleased to announce the delivery of Kubernetes 1.18, our first release of 2020! Kubernetes 1.18 consists of 38 enhancements: 15 enhancements are moving to stable, 11 are in beta, and 12 are in alpha.
|
||||
|
||||
Kubernetes 1.18 is a "fit and finish" release. Significant work has gone into improving beta and stable features to ensure users have a better experience. An equal effort has gone into adding new developments and exciting new features that promise to enhance the user experience even more.
|
||||
Having almost as many enhancements in alpha, beta, and stable is a great achievement. It shows the tremendous effort made by the community on improving the reliability of Kubernetes as well as continuing to expand its existing functionality.
|
||||
|
||||
|
||||
## Major Themes
|
||||
|
||||
### Kubernetes Topology Manager Moves to Beta - Align Up!
|
||||
|
||||
A beta feature of Kubernetes in release 1.18, the [Topology Manager feature](https://github.com/nolancon/website/blob/f4200307260ea3234540ef13ed80de325e1a7267/content/en/docs/tasks/administer-cluster/topology-manager.md) enables NUMA alignment of CPU and devices (such as SR-IOV VFs) that will allow your workload to run in an environment optimized for low-latency. Prior to the introduction of the Topology Manager, the CPU and Device Manager would make resource allocation decisions independent of each other. This could result in undesirable allocations on multi-socket systems, causing degraded performance on latency critical applications.
|
||||
|
||||
### Server-side Apply Introduces Beta 2
|
||||
|
||||
Server-side Apply was promoted to Beta in 1.16 and introduces a second Beta in 1.18. This new version will track and manage changes to the fields of all new Kubernetes objects, allowing you to know what changed your resources and when.
|
||||
|
||||
|
||||
### Extending Ingress with a new pathType field and replacing a deprecated annotation with IngressClass
|
||||
|
||||
In Kubernetes 1.18, there are two significant additions to Ingress: a new `pathType` field and a new `IngressClass` resource. The `pathType` field allows specifying how paths should be matched. In addition to the default `ImplementationSpecific` type, there are new `Exact` and `Prefix` path types.
|
||||
|
||||
The `IngressClass` resource is used to describe a type of Ingress within a Kubernetes cluster. Ingresses can specify the class they are associated with by using a new `ingressClassName` field on Ingresses. This new resource and field replace the deprecated `kubernetes.io/ingress.class` annotation.
|
||||
|
||||
### SIG-CLI introduces kubectl alpha debug
|
||||
|
||||
SIG-CLI had been debating the need for a debug utility for quite some time. With the development of [ephemeral containers](https://kubernetes.io/docs/concepts/workloads/pods/ephemeral-containers/), it became clearer how we could support developers with tooling built on top of `kubectl exec`. The addition of the [`kubectl alpha debug` command](https://github.com/kubernetes/enhancements/blob/master/keps/sig-cli/20190805-kubectl-debug.md) (it is alpha, but your feedback is more than welcome) allows developers to easily debug their Pods inside the cluster. We think this addition is invaluable. This command allows one to create a temporary container which runs next to the Pod one is trying to examine, and also attaches to the console for interactive troubleshooting.
|
||||
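As a rough illustration (the pod and image names are hypothetical, and the exact flags may vary between versions; see `kubectl alpha debug -h`), a debugging session can look like this:

```shell
# Start an ephemeral debugging container alongside the containers of pod "myapp"
kubectl alpha debug -it myapp --image=busybox --target=myapp
```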
|
||||
|
||||
### Introducing Windows CSI support alpha for Kubernetes
|
||||
|
||||
The alpha version of CSI Proxy for Windows is being released with Kubernetes 1.18. CSI proxy enables CSI Drivers on Windows by allowing containers in Windows to perform privileged storage operations.
|
||||
|
||||
## Other Updates
|
||||
|
||||
### Graduated to Stable 💯
|
||||
|
||||
- [Taint Based Eviction](https://github.com/kubernetes/enhancements/issues/166)
|
||||
- [`kubectl diff`](https://github.com/kubernetes/enhancements/issues/491)
|
||||
- [CSI Block storage support](https://github.com/kubernetes/enhancements/issues/565)
|
||||
- [API Server dry run](https://github.com/kubernetes/enhancements/issues/576)
|
||||
- [Pass Pod information in CSI calls](https://github.com/kubernetes/enhancements/issues/603)
|
||||
- [Support Out-of-Tree vSphere Cloud Provider](https://github.com/kubernetes/enhancements/issues/670)
|
||||
- [Support GMSA for Windows workloads](https://github.com/kubernetes/enhancements/issues/689)
|
||||
- [Skip attach for non-attachable CSI volumes](https://github.com/kubernetes/enhancements/issues/770)
|
||||
- [PVC cloning](https://github.com/kubernetes/enhancements/issues/989)
|
||||
- [Moving kubectl package code to staging](https://github.com/kubernetes/enhancements/issues/1020)
|
||||
- [RunAsUserName for Windows](https://github.com/kubernetes/enhancements/issues/1043)
|
||||
- [AppProtocol for Services and Endpoints](https://github.com/kubernetes/enhancements/issues/1507)
|
||||
- [Extending Hugepage Feature](https://github.com/kubernetes/enhancements/issues/1539)
|
||||
- [client-go signature refactor to standardize options and context handling](https://github.com/kubernetes/enhancements/issues/1601)
|
||||
- [Node-local DNS cache](https://github.com/kubernetes/enhancements/issues/1024)
|
||||
|
||||
|
||||
### Major Changes
|
||||
|
||||
- [EndpointSlice API](https://github.com/kubernetes/enhancements/issues/752)
|
||||
- [Moving kubectl package code to staging](https://github.com/kubernetes/enhancements/issues/1020)
|
||||
- [CertificateSigningRequest API](https://github.com/kubernetes/enhancements/issues/1513)
|
||||
- [Extending Hugepage Feature](https://github.com/kubernetes/enhancements/issues/1539)
|
||||
- [client-go signature refactor to standardize options and context handling](https://github.com/kubernetes/enhancements/issues/1601)
|
||||
|
||||
|
||||
### Release Notes
|
||||
|
||||
Check out the full details of the Kubernetes 1.18 release in our [release notes](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.18.md).
|
||||
|
||||
|
||||
### Availability
|
||||
|
||||
Kubernetes 1.18 is available for download on [GitHub](https://github.com/kubernetes/kubernetes/releases/tag/v1.18.0). To get started with Kubernetes, check out these [interactive tutorials](https://kubernetes.io/docs/tutorials/) or run local Kubernetes clusters using Docker container “nodes” with [kind](https://kind.sigs.k8s.io/). You can also easily install 1.18 using [kubeadm](https://kubernetes.io/docs/setup/independent/create-cluster-kubeadm/).
|
||||
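For instance, with kind a local 1.18 cluster can be created in one command (the node image tag below is an assumption; use the image recommended for your kind version):

```shell
kind create cluster --image kindest/node:v1.18.0
```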
|
||||
### Release Team
|
||||
|
||||
This release is made possible through the efforts of hundreds of individuals who contributed both technical and non-technical content. Special thanks to the [release team](https://github.com/kubernetes/sig-release/blob/master/releases/release-1.18/release_team.md) led by Jorge Alarcon Ochoa, Site Reliability Engineer at Searchable AI. The 34 release team members coordinated many aspects of the release, from documentation to testing, validation, and feature completeness.
|
||||
|
||||
As the Kubernetes community has grown, our release process represents an amazing demonstration of collaboration in open source software development. Kubernetes continues to gain new users at a rapid pace. This growth creates a positive feedback cycle where more contributors commit code creating a more vibrant ecosystem. Kubernetes has had over [40,000 individual contributors](https://k8s.devstats.cncf.io/d/24/overall-project-statistics?orgId=1) to date and an active community of more than 3,000 people.
|
||||
|
||||
### Release Logo
|
||||
|
||||

|
||||
|
||||
#### Why the LHC?
|
||||
|
||||
The LHC is the world’s largest and most powerful particle accelerator. It is the result of the collaboration of thousands of scientists from around the world, all for the advancement of science. In a similar manner, Kubernetes has been a project that has united thousands of contributors from hundreds of organizations – all to work towards the same goal of improving cloud computing in all aspects! The release name, "A Bit Quarky", is meant to remind us that unconventional ideas can bring about great change and that keeping an open mind to diversity will help us innovate.
|
||||
|
||||
|
||||
#### About the designer
|
||||
|
||||
Maru Lango is a designer currently based in Mexico City. While her area of expertise is Product Design, she also enjoys branding, illustration, and visual experiments using CSS + JS, as well as contributing to diversity efforts within the tech and design communities. You can find her on most social media as @marulango or check her website: https://marulango.com
|
||||
|
||||
### User Highlights
|
||||
|
||||
- Ericsson is using Kubernetes and other cloud native technology to deliver a [highly demanding 5G network](https://www.cncf.io/case-study/ericsson/) that resulted in up to 90 percent CI/CD savings.
|
||||
- Zendesk is using Kubernetes to [run around 70% of its existing applications](https://www.cncf.io/case-study/zendesk/). It’s also building all new applications to run on Kubernetes, which has brought time savings, greater flexibility, and increased velocity to its application development.
|
||||
- LifeMiles has [reduced infrastructure spending by 50%](https://www.cncf.io/case-study/lifemiles/) because of its move to Kubernetes. The move has also allowed the company to double its available resource capacity.
|
||||
|
||||
### Ecosystem Updates
|
||||
|
||||
- The CNCF published the results of its [annual survey](https://www.cncf.io/blog/2020/03/04/2019-cncf-survey-results-are-here-deployments-are-growing-in-size-and-speed-as-cloud-native-adoption-becomes-mainstream/) showing that Kubernetes usage in production is skyrocketing. The survey found that 78% of respondents are using Kubernetes in production compared to 58% last year.
|
||||
- The “Introduction to Kubernetes” course hosted by the CNCF [surpassed 100,000 registrations](https://www.cncf.io/announcement/2020/01/28/cloud-native-computing-foundation-announces-introduction-to-kubernetes-course-surpasses-100000-registrations/).
|
||||
|
||||
### Project Velocity
|
||||
|
||||
The CNCF has continued refining DevStats, an ambitious project to visualize the myriad contributions that go into the project. [K8s DevStats](https://k8s.devstats.cncf.io/d/12/dashboards?orgId=1) illustrates the breakdown of contributions from major company contributors, as well as an impressive set of preconfigured reports on everything from individual contributors to pull request lifecycle times.
|
||||
|
||||
This past quarter, 641 different companies and over 6,409 individuals contributed to Kubernetes. [Check out DevStats](https://k8s.devstats.cncf.io/d/11/companies-contributing-in-repository-groups?orgId=1&var-period=m&var-repogroup_name=All) to learn more about the overall velocity of the Kubernetes project and community.
|
||||
|
||||
### Event Update
|
||||
|
||||
KubeCon + CloudNativeCon EU 2020 is being pushed back. For the most up-to-date information, please check the [Novel Coronavirus Update page](https://events.linuxfoundation.org/kubecon-cloudnativecon-europe/attend/novel-coronavirus-update/).
|
||||
|
||||
### Upcoming Release Webinar
|
||||
|
||||
Join members of the Kubernetes 1.18 release team on April 23rd, 2020 to learn about the major features in this release, including kubectl debug, Topology Manager, Ingress to V1 graduation, and client-go. Register here: https://www.cncf.io/webinars/kubernetes-1-18/.
|
||||
|
||||
### Get Involved
|
||||
|
||||
The simplest way to get involved with Kubernetes is by joining one of the many [Special Interest Groups](https://github.com/kubernetes/community/blob/master/sig-list.md) (SIGs) that align with your interests. Have something you’d like to broadcast to the Kubernetes community? Share your voice at our weekly [community meeting](https://github.com/kubernetes/community/tree/master/communication), and through the channels below. Thank you for your continued feedback and support.
|
||||
|
||||
- Follow us on Twitter [@Kubernetesio](https://twitter.com/kubernetesio) for latest updates
|
||||
- Join the community discussion on [Discuss](https://discuss.kubernetes.io/)
|
||||
- Join the community on [Slack](http://slack.k8s.io/)
|
||||
- Post questions (or answer questions) on [Stack Overflow](http://stackoverflow.com/questions/tagged/kubernetes)
|
||||
- Share your Kubernetes [story](https://docs.google.com/a/linuxfoundation.org/forms/d/e/1FAIpQLScuI7Ye3VQHQTwBASrgkjQDSS5TP0g3AXfFhwSM9YpHgxRKFA/viewform)
|
||||
- Read more about what’s happening with Kubernetes on the [blog](https://kubernetes.io/blog/)
|
||||
- Learn more about the [Kubernetes Release Team](https://github.com/kubernetes/sig-release/tree/master/release-team)
|
|
@ -0,0 +1,503 @@
|
|||
---
|
||||
layout: blog
|
||||
title: "Kubernetes Topology Manager Moves to Beta - Align Up!"
|
||||
date: 2020-04-01
|
||||
slug: kubernetes-1-18-feature-topoloy-manager-beta
|
||||
---
|
||||
|
||||
**Authors:** Kevin Klues (NVIDIA), Victor Pickard (Red Hat), Conor Nolan (Intel)
|
||||
|
||||
This blog post describes the **<code>TopologyManager</code>**, a beta feature of Kubernetes in release 1.18. The **<code>TopologyManager</code>** feature enables NUMA alignment of CPUs and peripheral devices (such as SR-IOV VFs and GPUs), allowing your workload to run in an environment optimized for low-latency.
|
||||
|
||||
Prior to the introduction of the **<code>TopologyManager</code>**, the CPU and Device Manager would make resource allocation decisions independent of each other. This could result in undesirable allocations on multi-socket systems, causing degraded performance on latency critical applications. With the introduction of the **<code>TopologyManager</code>**, we now have a way to avoid this.
|
||||
|
||||
This blog post covers:
|
||||
|
||||
1. A brief introduction to NUMA and why it is important
|
||||
1. The policies available to end-users to ensure NUMA alignment of CPUs and devices
|
||||
1. The internal details of how the **<code>TopologyManager</code>** works
|
||||
1. Current limitations of the **<code>TopologyManager</code>**
|
||||
1. Future directions of the **<code>TopologyManager</code>**
|
||||
|
||||
## So, what is NUMA and why do I care?
|
||||
|
||||
The term NUMA stands for Non-Uniform Memory Access. It is a technology available on multi-cpu systems that allows different CPUs to access different parts of memory at different speeds. Any memory directly connected to a CPU is considered "local" to that CPU and can be accessed very fast. Any memory not directly connected to a CPU is considered "non-local" and will have variable access times depending on how many interconnects must be passed through in order to reach it. On modern systems, the idea of having "local" vs. "non-local" memory can also be extended to peripheral devices such as NICs or GPUs. For high performance, CPUs and devices should be allocated such that they have access to the same local memory.
|
||||
|
||||
All memory on a NUMA system is divided into a set of "NUMA nodes", with each node representing the local memory for a set of CPUs or devices. We talk about an individual CPU as being part of a NUMA node if its local memory is associated with that NUMA node.
|
||||
|
||||
We talk about a peripheral device as being part of a NUMA node based on the shortest number of interconnects that must be passed through in order to reach it.
|
||||
|
||||
For example, in Figure 1, CPUs 0-3 are said to be part of NUMA node 0, whereas CPUs 4-7 are part of NUMA node 1. Likewise GPU 0 and NIC 0 are said to be part of NUMA node 0 because they are attached to Socket 0, whose CPUs are all part of NUMA node 0. The same is true for GPU 1 and NIC 1 on NUMA node 1.
|
||||
|
||||
<p align="center">
|
||||
<img height="300" src="/images/blog/2020-03-25-kubernetes-1.18-release-announcement/example-numa-system.png">
|
||||
</p>
|
||||
|
||||
|
||||
**Figure 1:** An example system with 2 NUMA nodes, 2 Sockets with 4 CPUs each, 2 GPUs, and 2 NICs. CPUs on Socket 0, GPU 0, and NIC 0 are all part of NUMA node 0. CPUs on Socket 1, GPU 1, and NIC 1 are all part of NUMA node 1.
|
||||
|
||||
|
||||
Although the example above shows a 1-1 mapping of NUMA Node to Socket, this is not necessarily true in the general case. There may be multiple sockets on a single NUMA node, or individual CPUs of a single socket may be connected to different NUMA nodes. Moreover, emerging technologies such as Sub-NUMA Clustering ([available on recent intel CPUs](https://software.intel.com/en-us/articles/intel-xeon-processor-scalable-family-technical-overview)) allow single CPUs to be associated with multiple NUMA nodes so long as their memory access times to both nodes are the same (or have a negligible difference).
|
||||
|
||||
The **<code>TopologyManager</code>** has been built to handle all of these scenarios.
|
||||
|
||||
## Align Up! It's a TeaM Effort!
|
||||
|
||||
As previously stated, the **<code>TopologyManager</code>** allows users to align their CPU and peripheral device allocations by NUMA node. There are several policies available for this:
|
||||
|
||||
* **<code>none:</code>** this policy will not attempt to do any alignment of resources. It will act the same as if the **<code>TopologyManager</code>** were not present at all. This is the default policy.
|
||||
* **<code>best-effort:</code>** with this policy, the **<code>TopologyManager</code>** will attempt to align allocations on NUMA nodes as best it can, but will always allow the pod to start even if some of the allocated resources are not aligned on the same NUMA node.
|
||||
* **<code>restricted:</code>** this policy is the same as the **<code>best-effort</code>** policy, except it will fail pod admission if allocated resources cannot be aligned properly. Unlike with the **<code>single-numa-node</code>** policy, some allocations may come from multiple NUMA nodes if it is impossible to _ever_ satisfy the allocation request on a single NUMA node (e.g. 2 devices are requested and the only 2 devices on the system are on different NUMA nodes).
|
||||
* **<code>single-numa-node:</code>** this policy is the most restrictive and will only allow a pod to be admitted if _all_ requested CPUs and devices can be allocated from exactly one NUMA node.
|
||||
|
||||
It is important to note that the selected policy is applied to each container in a pod spec individually, rather than aligning resources across all containers together.
|
||||
|
||||
Moreover, a single policy is applied to _all_ pods on a node via a global **<code>kubelet</code>** flag, rather than allowing users to select different policies on a pod-by-pod basis (or a container-by-container basis). We hope to relax this restriction in the future.
|
||||
|
||||
The **<code>kubelet</code>** flag to set one of these policies can be seen below:
|
||||
|
||||
|
||||
```
|
||||
--topology-manager-policy=
|
||||
[none | best-effort | restricted | single-numa-node]
|
||||
```
|
||||
|
||||
|
||||
Additionally, the **<code>TopologyManager</code>** is protected by a feature gate. This feature gate has been available since Kubernetes 1.16, but has only been enabled by default since 1.18.
|
||||
|
||||
The feature gate can be enabled or disabled as follows (as described in more detail [here](https://kubernetes.io/docs/reference/command-line-tools-reference/feature-gates/)):
|
||||
|
||||
|
||||
```
|
||||
--feature-gates="...,TopologyManager=<true|false>"
|
||||
```
|
||||
|
||||
|
||||
In order to trigger alignment according to the selected policy, a user must request CPUs and peripheral devices in their pod spec, according to a certain set of requirements.
|
||||
|
||||
For peripheral devices, this means requesting devices from the available resources provided by a device plugin (e.g. **<code>intel.com/sriov</code>**, **<code>nvidia.com/gpu</code>**, etc.). This will only work if the device plugin has been extended to integrate properly with the **<code>TopologyManager</code>**. Currently, the only plugins known to have this extension are the [Nvidia GPU device plugin](https://github.com/NVIDIA/k8s-device-plugin/blob/5cb45d52afdf5798a40f8d0de049bce77f689865/nvidia.go#L74), and the [Intel SRIOV network device plugin](https://github.com/intel/sriov-network-device-plugin/blob/30e33f1ce2fc7b45721b6de8c8207e65dbf2d508/pkg/resources/pciNetDevice.go#L80). Details on how to extend a device plugin to integrate with the **<code>TopologyManager</code>** can be found [here](https://kubernetes.io/docs/concepts/extend-kubernetes/compute-storage-net/device-plugins/#device-plugin-integration-with-the-topology-manager).
|
||||
|
||||
For CPUs, this requires that the **<code>CPUManager</code>** has been configured with its **<code>--static</code>** policy enabled and that the pod is running in the Guaranteed QoS class (i.e. all CPU and memory **<code>limits</code>** are equal to their respective CPU and memory **<code>requests</code>**). CPUs must also be requested in whole number values (e.g. **<code>1</code>**, **<code>2</code>**, **<code>1000m</code>**, etc). Details on how to set the **<code>CPUManager</code>** policy can be found [here](https://kubernetes.io/docs/tasks/administer-cluster/cpu-management-policies/#cpu-management-policies).
|
||||
|
||||
For example, assuming the **<code>CPUManager</code>** is running with its **<code>--static</code>** policy enabled and the device plugins for **<code>gpu-vendor.com</code>**, and **<code>nic-vendor.com</code>** have been extended to integrate with the **<code>TopologyManager</code>** properly, the pod spec below is sufficient to trigger the **<code>TopologyManager</code>** to run its selected policy:
|
||||
|
||||
```
|
||||
spec:
|
||||
containers:
|
||||
- name: numa-aligned-container
|
||||
image: alpine
|
||||
resources:
|
||||
limits:
|
||||
cpu: 2
|
||||
memory: 200Mi
|
||||
gpu-vendor.com/gpu: 1
|
||||
nic-vendor.com/nic: 1
|
||||
```
|
||||
|
||||
Following Figure 1 from the previous section, this would result in one of the following aligned allocations:
|
||||
|
||||
```
|
||||
{cpu: {0, 1}, gpu: 0, nic: 0}
|
||||
{cpu: {0, 2}, gpu: 0, nic: 0}
|
||||
{cpu: {0, 3}, gpu: 0, nic: 0}
|
||||
{cpu: {1, 2}, gpu: 0, nic: 0}
|
||||
{cpu: {1, 3}, gpu: 0, nic: 0}
|
||||
{cpu: {2, 3}, gpu: 0, nic: 0}
|
||||
|
||||
{cpu: {4, 5}, gpu: 1, nic: 1}
|
||||
{cpu: {4, 6}, gpu: 1, nic: 1}
|
||||
{cpu: {4, 7}, gpu: 1, nic: 1}
|
||||
{cpu: {5, 6}, gpu: 1, nic: 1}
|
||||
{cpu: {5, 7}, gpu: 1, nic: 1}
|
||||
{cpu: {6, 7}, gpu: 1, nic: 1}
|
||||
```
|
||||
|
||||
And that’s it! Just follow this pattern to have the **<code>TopologyManager</code>** ensure NUMA alignment across containers that request topology-aware devices and exclusive CPUs.
|
||||
|
||||
**NOTE:** if a pod is rejected by one of the **<code>TopologyManager</code>** policies, it will be placed in a **<code>Terminated</code>** state with a pod admission error and a reason of "**<code>TopologyAffinityError</code>**". Once a pod is in this state, the Kubernetes scheduler will not attempt to reschedule it. It is therefore recommended to use a [**<code>Deployment</code>**](https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#creating-a-deployment) with replicas to trigger a redeploy of the pod on such a failure. An [external control loop](https://kubernetes.io/docs/concepts/architecture/controller/) can also be implemented to trigger a redeployment of pods that have a **<code>TopologyAffinityError</code>**.

## This is great, so how does it work under the hood?

Pseudocode for the primary logic carried out by the **<code>TopologyManager</code>** can be seen below:

```
for container := range append(InitContainers, Containers...) {
    for provider := range HintProviders {
        hints += provider.GetTopologyHints(container)
    }

    bestHint := policy.Merge(hints)

    for provider := range HintProviders {
        provider.Allocate(container, bestHint)
    }
}
```

The following diagram summarizes the steps taken during this loop:

<p align="center">
  <img width="200" height="200" src="/images/blog/2020-03-25-kubernetes-1.18-release-announcement/numa-steps-during-loop.png">
</p>

The steps themselves are:

1. Loop over all containers in a pod.
1. For each container, gather "**<code>TopologyHints</code>**" from a set of "**<code>HintProviders</code>**" for each topology-aware resource type requested by the container (e.g. **<code>gpu-vendor.com/gpu</code>**, **<code>nic-vendor.com/nic</code>**, **<code>cpu</code>**, etc.).
1. Using the selected policy, merge the gathered **<code>TopologyHints</code>** to find the "best" hint that aligns resource allocations across all resource types.
1. Loop back over the set of hint providers, instructing them to allocate the resources they control using the merged hint as a guide.
1. This loop runs at pod admission time and will fail to admit the pod if any of these steps fail or alignment cannot be satisfied according to the selected policy. Any resources allocated before the failure are cleaned up accordingly.

The following sections go into more detail on the exact structure of **<code>TopologyHints</code>** and **<code>HintProviders</code>**, as well as some details on the merge strategies used by each policy.

### TopologyHints

A **<code>TopologyHint</code>** encodes a set of constraints from which a given resource request can be satisfied. At present, the only constraint we consider is NUMA alignment. It is defined as follows:

```
type TopologyHint struct {
    NUMANodeAffinity bitmask.BitMask
    Preferred bool
}
```

The **<code>NUMANodeAffinity</code>** field contains a bitmask of NUMA nodes where a resource request can be satisfied. For example, the possible masks on a system with 2 NUMA nodes include:

```
{00}, {01}, {10}, {11}
```
The **<code>Preferred</code>** field contains a boolean that encodes whether the given hint is "preferred" or not. With the **<code>best-effort</code>** policy, preferred hints will be given preference over non-preferred hints when generating a "best" hint. With the **<code>restricted</code>** and **<code>single-numa-node</code>** policies, non-preferred hints will be rejected.
In general, **<code>HintProviders</code>** generate **<code>TopologyHints</code>** by looking at the set of currently available resources that can satisfy a resource request. More specifically, they generate one **<code>TopologyHint</code>** for every possible mask of NUMA nodes where that resource request can be satisfied. If a mask cannot satisfy the request, it is omitted. For example, a **<code>HintProvider</code>** might provide the following hints on a system with 2 NUMA nodes when being asked to allocate 2 resources. These hints encode that both resources could either come from a single NUMA node (either 0 or 1), or they could each come from different NUMA nodes (but we prefer for them to come from just one).

```
{01: True}, {10: True}, {11: False}
```
At present, all **<code>HintProviders</code>** set the **<code>Preferred</code>** field to **<code>True</code>** if and only if the **<code>NUMANodeAffinity</code>** encodes a _minimal_ set of NUMA nodes that can satisfy the resource request. Normally, this will only be **<code>True</code>** for **<code>TopologyHints</code>** with a single NUMA node set in their bitmask. However, it may also be **<code>True</code>** if the only way to _ever_ satisfy the resource request is to span multiple NUMA nodes (e.g. 2 devices are requested and the only 2 devices on the system are on different NUMA nodes):

```
{0011: True}, {0111: False}, {1011: False}, {1111: False}
```

**NOTE:** Setting of the **<code>Preferred</code>** field in this way is _not_ based on the set of currently available resources. It is based on the ability to physically allocate the number of requested resources on some minimal set of NUMA nodes.

In this way, it is possible for a **<code>HintProvider</code>** to return a list of hints with _all_ **<code>Preferred</code>** fields set to **<code>False</code>** if an actual preferred allocation cannot be satisfied until other containers release their resources. For example, consider the following scenario from the system in Figure 1:

1. All but 2 CPUs are currently allocated to containers
1. The 2 remaining CPUs are on different NUMA nodes
1. A new container comes along asking for 2 CPUs

In this case, the only generated hint would be **<code>{11: False}</code>** and not **<code>{11: True}</code>**. This happens because it _is_ possible to allocate 2 CPUs from the same NUMA node on this system (just not right now, given the current allocation state). The idea is that it is better to fail pod admission and retry the deployment when the minimal alignment can be satisfied than to allow a pod to be scheduled with sub-optimal alignment.

### HintProviders

A **<code>HintProvider</code>** is a component internal to the **<code>kubelet</code>** that coordinates aligned resource allocations with the **<code>TopologyManager</code>**. At present, the only **<code>HintProviders</code>** in Kubernetes are the **<code>CPUManager</code>** and the **<code>DeviceManager</code>**. We plan to add support for **<code>HugePages</code>** soon.

As discussed previously, the **<code>TopologyManager</code>** both gathers **<code>TopologyHints</code>** from **<code>HintProviders</code>** and triggers aligned resource allocations on them using a merged "best" hint. As such, **<code>HintProviders</code>** implement the following interface:

```
type HintProvider interface {
    GetTopologyHints(*v1.Pod, *v1.Container) map[string][]TopologyHint
    Allocate(*v1.Pod, *v1.Container) error
}
```
Notice that the call to **<code>GetTopologyHints()</code>** returns a **<code>map[string][]TopologyHint</code>**. This allows a single **<code>HintProvider</code>** to provide hints for multiple resource types instead of just one. For example, the **<code>DeviceManager</code>** requires this in order to pass hints back for every resource type registered by its plugins.

As **<code>HintProviders</code>** generate their hints, they only consider how alignment could be satisfied for _currently_ available resources on the system. Any resources already allocated to other containers are not considered.

For example, consider the system in Figure 1, with the following two containers requesting resources from it:

<table>
<tr>
<td align="center"><strong><code>Container0</code></strong></td>
<td align="center"><strong><code>Container1</code></strong></td>
</tr>
<tr>
<td>
<pre>
spec:
  containers:
  - name: numa-aligned-container0
    image: alpine
    resources:
      limits:
        cpu: 2
        memory: 200Mi
        gpu-vendor.com/gpu: 1
        nic-vendor.com/nic: 1
</pre>
</td>
<td>
<pre>
spec:
  containers:
  - name: numa-aligned-container1
    image: alpine
    resources:
      limits:
        cpu: 2
        memory: 200Mi
        gpu-vendor.com/gpu: 1
        nic-vendor.com/nic: 1
</pre>
</td>
</tr>
</table>

If **<code>Container0</code>** is the first container considered for allocation on the system, the following set of hints will be generated for the three topology-aware resource types in the spec.

```
               cpu: {{01: True}, {10: True}, {11: False}}
gpu-vendor.com/gpu: {{01: True}, {10: True}}
nic-vendor.com/nic: {{01: True}, {10: True}}
```

With a resulting aligned allocation of:

```
{cpu: {0, 1}, gpu: 0, nic: 0}
```

<p align="center">
  <img height="300" src="/images/blog/2020-03-25-kubernetes-1.18-release-announcement/numa-hint-provider1.png">
</p>

When considering **<code>Container1</code>**, these resources are then presumed to be unavailable, and thus only the following set of hints will be generated:

```
               cpu: {{01: True}, {10: True}, {11: False}}
gpu-vendor.com/gpu: {{10: True}}
nic-vendor.com/nic: {{10: True}}
```

With a resulting aligned allocation of:

```
{cpu: {4, 5}, gpu: 1, nic: 1}
```

<p align="center">
  <img height="300" src="/images/blog/2020-03-25-kubernetes-1.18-release-announcement/numa-hint-provider2.png">
</p>
**NOTE:** Unlike the pseudocode provided at the beginning of this section, the call to **<code>Allocate()</code>** does not actually take a parameter for the merged "best" hint directly. Instead, the **<code>TopologyManager</code>** implements the following **<code>Store</code>** interface that **<code>HintProviders</code>** can query to retrieve the hint generated for a particular container once it has been generated:

```
type Store interface {
    GetAffinity(podUID string, containerName string) TopologyHint
}
```

Separating this out into its own API call allows one to access this hint outside of the pod admission loop. This is useful for debugging as well as for reporting generated hints in tools such as **<code>kubectl</code>** (not yet available).

### Policy.Merge

The merge strategy defined by a given policy dictates how it combines the set of **<code>TopologyHints</code>** generated by all **<code>HintProviders</code>** into a single **<code>TopologyHint</code>** that can be used to inform aligned resource allocations.

The general merge strategy for all supported policies begins the same:

1. Take the cross-product of **<code>TopologyHints</code>** generated for each resource type
1. For each entry in the cross-product, **<code>bitwise-and</code>** the NUMA affinities of each **<code>TopologyHint</code>** together. Set this as the NUMA affinity in a resulting "merged" hint.
1. If all of the hints in an entry have **<code>Preferred</code>** set to **<code>True</code>**, set **<code>Preferred</code>** to **<code>True</code>** in the resulting "merged" hint.
1. If even one of the hints in an entry has **<code>Preferred</code>** set to **<code>False</code>**, set **<code>Preferred</code>** to **<code>False</code>** in the resulting "merged" hint. Also set **<code>Preferred</code>** to **<code>False</code>** in the "merged" hint if its NUMA affinity contains all 0s.
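To make this merge step concrete, here is a small, self-contained Go sketch of merging a single cross-product entry. It is illustrative only: the names and types are simplified stand-ins (a plain `uint64` instead of the kubelet's `bitmask.BitMask`), not the actual **<code>TopologyManager</code>** implementation.

```
package main

import "fmt"

// Simplified stand-in for the kubelet's TopologyHint; bit i set in
// NUMANodeAffinity means NUMA node i can satisfy the request.
type TopologyHint struct {
	NUMANodeAffinity uint64
	Preferred        bool
}

// mergeEntry bitwise-ANDs the NUMA affinities of one cross-product entry and
// marks the result Preferred only if every input hint was Preferred and the
// combined affinity is not all 0s.
func mergeEntry(entry []TopologyHint) TopologyHint {
	merged := TopologyHint{NUMANodeAffinity: ^uint64(0), Preferred: true}
	for _, h := range entry {
		merged.NUMANodeAffinity &= h.NUMANodeAffinity
		merged.Preferred = merged.Preferred && h.Preferred
	}
	if merged.NUMANodeAffinity == 0 {
		merged.Preferred = false
	}
	return merged
}

func main() {
	// {cpu: {01: True}, gpu: {01: True}, nic: {10: True}} merges to {00: False}.
	entry := []TopologyHint{
		{NUMANodeAffinity: 0b01, Preferred: true},
		{NUMANodeAffinity: 0b01, Preferred: true},
		{NUMANodeAffinity: 0b10, Preferred: true},
	}
	fmt.Printf("%+v\n", mergeEntry(entry))
}
```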
Following the example from the previous section with hints for **<code>Container0</code>** generated as:

```
               cpu: {{01: True}, {10: True}, {11: False}}
gpu-vendor.com/gpu: {{01: True}, {10: True}}
nic-vendor.com/nic: {{01: True}, {10: True}}
```

The above algorithm results in the following set of cross-product entries and "merged" hints:

| cross-product entry **<code>{cpu, gpu-vendor.com/gpu, nic-vendor.com/nic}</code>** | "merged" hint |
|:---:|:---:|
| **<code>{{01: True}, {01: True}, {01: True}}</code>** | **<code>{01: True}</code>** |
| **<code>{{01: True}, {01: True}, {10: True}}</code>** | **<code>{00: False}</code>** |
| **<code>{{01: True}, {10: True}, {01: True}}</code>** | **<code>{00: False}</code>** |
| **<code>{{01: True}, {10: True}, {10: True}}</code>** | **<code>{00: False}</code>** |
| | |
| **<code>{{10: True}, {01: True}, {01: True}}</code>** | **<code>{00: False}</code>** |
| **<code>{{10: True}, {01: True}, {10: True}}</code>** | **<code>{00: False}</code>** |
| **<code>{{10: True}, {10: True}, {01: True}}</code>** | **<code>{00: False}</code>** |
| **<code>{{10: True}, {10: True}, {10: True}}</code>** | **<code>{10: True}</code>** |
| | |
| **<code>{{11: False}, {01: True}, {01: True}}</code>** | **<code>{01: False}</code>** |
| **<code>{{11: False}, {01: True}, {10: True}}</code>** | **<code>{00: False}</code>** |
| **<code>{{11: False}, {10: True}, {01: True}}</code>** | **<code>{00: False}</code>** |
| **<code>{{11: False}, {10: True}, {10: True}}</code>** | **<code>{10: False}</code>** |

Once this list of "merged" hints has been generated, it is the job of the specific **<code>TopologyManager</code>** policy in use to decide which one to consider as the "best" hint.

In general, this involves:

1. Sorting merged hints by their "narrowness". Narrowness is defined as the number of bits set in a hint’s NUMA affinity mask. The fewer bits set, the narrower the hint. For hints that have the same number of bits set in their NUMA affinity mask, the hint with the most low order bits set is considered narrower.
1. Sorting merged hints by their **<code>Preferred</code>** field. Hints that have **<code>Preferred</code>** set to **<code>True</code>** are considered more likely candidates than hints with **<code>Preferred</code>** set to **<code>False</code>**.
1. Selecting the narrowest hint with the best possible setting for **<code>Preferred</code>**.
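As a rough illustration of that ordering, reusing the simplified `TopologyHint` type from the merge sketch above (again, an assumption-laden sketch rather than the kubelet's actual code):

```
import "math/bits"

// hintIsBetter reports whether hint a should be chosen over hint b:
// preferred hints win first, then narrower affinities (fewer bits set),
// with ties broken toward masks that set more of the low-order bits
// (which makes them numerically smaller).
func hintIsBetter(a, b TopologyHint) bool {
	if a.Preferred != b.Preferred {
		return a.Preferred
	}
	na, nb := bits.OnesCount64(a.NUMANodeAffinity), bits.OnesCount64(b.NUMANodeAffinity)
	if na != nb {
		return na < nb
	}
	return a.NUMANodeAffinity < b.NUMANodeAffinity
}
```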
In the case of the **<code>best-effort</code>** policy this algorithm will always result in _some_ hint being selected as the "best" hint and the pod being admitted. This "best" hint is then made available to **<code>HintProviders</code>** so they can make their resource allocations based on it.

However, in the case of the **<code>restricted</code>** and **<code>single-numa-node</code>** policies, any selected hint with **<code>Preferred</code>** set to **<code>False</code>** will be rejected immediately, causing pod admission to fail and no resources to be allocated. Moreover, the **<code>single-numa-node</code>** policy will also reject a selected hint that has more than one NUMA node set in its affinity mask.

In the example above, the pod would be admitted by all policies with a hint of **<code>{01: True}</code>**.

## Upcoming enhancements

While the 1.18 release and promotion to Beta bring along some great enhancements and fixes, there are still a number of limitations, described [here](https://kubernetes.io/docs/tasks/administer-cluster/topology-manager/#known-limitations). We are already working to address these limitations and more.

This section walks through the set of enhancements we plan to implement for the **<code>TopologyManager</code>** in the near future. This list is not exhaustive, but it gives a good idea of the direction we are moving in. It is ordered by the timeframe in which we expect to see each enhancement completed.

If you would like to get involved in helping with any of these enhancements, please [join the weekly Kubernetes SIG-node meetings](https://github.com/kubernetes/community/tree/master/sig-node) to learn more and become part of the community effort!

### Supporting device-specific constraints
Currently, NUMA affinity is the only constraint considered by the **<code>TopologyManager</code>** for resource alignment. Moreover, the only scalable extensions that can be made to a **<code>TopologyHint</code>** involve _node-level_ constraints, such as PCIe bus alignment across device types. It would be intractable to try and add any _device-specific_ constraints to this struct (e.g. the internal NVLINK topology among a set of GPU devices).
As such, we propose an extension to the device plugin interface that will allow a plugin to state its topology-aware allocation preferences, without having to expose any device-specific topology information to the kubelet. In this way, the **<code>TopologyManager</code>** can be restricted to only deal with common node-level topology constraints, while still having a way of incorporating device-specific topology constraints into its allocation decisions.

Details of this proposal can be found [here](https://github.com/kubernetes/enhancements/pull/1121), and should be available as soon as Kubernetes 1.19.

### NUMA alignment for hugepages

As stated previously, the only two **<code>HintProviders</code>** currently available to the **<code>TopologyManager</code>** are the **<code>CPUManager</code>** and the **<code>DeviceManager</code>**. However, work is currently underway to add support for hugepages as well. With the completion of this work, the **<code>TopologyManager</code>** will finally be able to allocate memory, hugepages, CPUs and PCI devices all on the same NUMA node.

A [KEP](https://github.com/kubernetes/enhancements/blob/253f1e5bdd121872d2d0f7020a5ac0365b229e30/keps/sig-node/20200203-memory-manager.md) for this work is currently under review, and a prototype is underway to get this feature implemented very soon.

### Scheduler awareness

Currently, the **<code>TopologyManager</code>** acts as a Pod Admission controller. It is not directly involved in the scheduling decision of where a pod will be placed. Rather, when the Kubernetes scheduler (or whatever scheduler is running in the deployment) places a pod on a node to run, the **<code>TopologyManager</code>** will decide if the pod should be "admitted" or "rejected". If the pod is rejected due to lack of available NUMA aligned resources, things can get a little interesting. This Kubernetes [issue](https://github.com/kubernetes/kubernetes/issues/84869) highlights and discusses this situation well.
So how do we go about addressing this limitation? We have the [Kubernetes Scheduling Framework](https://github.com/kubernetes/enhancements/blob/master/keps/sig-scheduling/20180409-scheduling-framework.md) to the rescue! This framework provides a new set of plugin APIs that integrate with the existing Kubernetes Scheduler and allow scheduling features, such as NUMA alignment, to be implemented without having to resort to other, perhaps less appealing alternatives, including writing your own scheduler, or even worse, creating a fork to add your own scheduler secret sauce.

The details of how to implement these extensions for integration with the **<code>TopologyManager</code>** have not yet been worked out. We still need to answer questions like:

* Will we require duplicated logic to determine device affinity in the **<code>TopologyManager</code>** and the scheduler?
* Do we need a new API to get **<code>TopologyHints</code>** from the **<code>TopologyManager</code>** to the scheduler plugin?

Work on this feature should begin in the next couple of months, so stay tuned!

### Per-pod alignment policy

As stated previously, a single policy is applied to _all_ pods on a node via a global **<code>kubelet</code>** flag, rather than allowing users to select different policies on a pod-by-pod basis (or a container-by-container basis).

While we agree that this would be a great feature to have, there are quite a few hurdles that need to be overcome before it is achievable. The biggest hurdle is that this enhancement will require an API change to be able to express the desired alignment policy in either the Pod spec or its associated **<code>[RuntimeClass](https://kubernetes.io/docs/concepts/containers/runtime-class/)</code>**.

We are only now starting to have serious discussions around this feature, and it is still, at best, a few releases away from being available.

## Conclusion

With the promotion of the **<code>TopologyManager</code>** to Beta in 1.18, we encourage everyone to give it a try and look forward to any feedback you may have. Many fixes and enhancements have been worked on in the past several releases, greatly improving the functionality and reliability of the **<code>TopologyManager</code>** and its **<code>HintProviders</code>**. While there are still a number of limitations, we have a set of enhancements planned to address them, and look forward to providing you with a number of new features in upcoming releases.

If you have ideas for additional enhancements or a desire for certain features, don’t hesitate to let us know. The team is always open to suggestions to enhance and improve the **<code>TopologyManager</code>**.

We hope you have found this blog informative and useful! Let us know if you have any questions or comments. And, happy deploying…..Align Up!

@ -0,0 +1,51 @@
---
layout: blog
title: Kubernetes 1.18 Feature Server-side Apply Beta 2
date: 2020-04-01
slug: Kubernetes-1.18-Feature-Server-side-Apply-Beta-2
---

**Authors:** Antoine Pelisse (Google)

## What is Server-side Apply?
Server-side Apply is an important effort to migrate “kubectl apply” to the apiserver. It was started in 2018 by the Apply working group.

The use of kubectl to declaratively apply resources has exposed the following challenges:

- One needs to use the kubectl Go code, or shell out to kubectl.

- Strategic merge-patch, the patch format used by kubectl, grew organically and was challenging to fix while maintaining compatibility with various api-server versions.

- Some features are hard to implement directly on the client, for example, unions.

Server-side Apply is a new merging algorithm, as well as tracking of field ownership, running on the Kubernetes api-server. Server-side Apply enables new features like conflict detection, so the system knows when two actors are trying to edit the same field.

## How does it work, what’s managedFields?
Server-side Apply works by keeping track of which actor of the system has changed each field of an object. It does so by diffing all updates to objects, and recording all the fields that have changed as well as the time of the operation. All this information is stored in the managedFields in the metadata of objects. Since objects can have many fields, this field can be quite large.

When someone applies, we can then use the information stored within managedFields to report relevant conflicts and help the merge algorithm to do the right thing.

## Wasn’t it already Beta before 1.18?
Yes, Server-side Apply has been Beta since 1.16, but it didn’t track the owner for fields associated with objects that had not been applied. This means that most objects didn’t have the managedFields metadata stored, and conflicts for these objects could not be resolved. With Kubernetes 1.18, all new objects will have the managedFields attached to them and provide accurate information on conflicts.

## How do I use it?
The most common way to use this is through kubectl: `kubectl apply --server-side`. This is likely to show conflicts with other actors, including client-side apply. When that happens, conflicts can be forced by using the `--force-conflicts` flag, which will grab the ownership for the fields that have changed.

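For example, assuming a manifest file named `my-app.yaml` (the file name here is just a placeholder), the two invocations look like this:

```
# Apply with server-side apply; fields owned by another manager will
# cause the request to be rejected with a conflict.
kubectl apply --server-side -f my-app.yaml

# Re-run with --force-conflicts to take ownership of the conflicting fields.
kubectl apply --server-side --force-conflicts -f my-app.yaml
```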
## Current limitations
We have two important limitations right now, especially with sub-resources. The first is that if you apply with a status, the status is going to be ignored. We are still going to try and acquire the fields, which may lead to invalid conflicts. The other is that we do not update the managedFields on some sub-resources, including scale, so you may not see information about a horizontal pod autoscaler changing the number of replicas.

## What’s next?
We are working hard to improve the experience of using server-side apply with kubectl, and we are trying to make it the default. As part of that, we want to improve the migration from client-side to server-side.

## Can I help?
Of course! The Apply working group is available on Slack in #wg-apply and through the [mailing list](https://groups.google.com/forum/#!forum/kubernetes-wg-apply), and we also meet every other Tuesday at 9:30 PT on Zoom. We have lots of exciting features to build and can use all sorts of help.

We would also like to use the opportunity to thank the hard work of all the contributors involved in making this new beta possible:

* Daniel Smith
* Jenny Buckley
* Joe Betz
* Julian Modesto
* Kevin Wiesmüller
* Maria Ntalla

@ -0,0 +1,199 @@
---
layout: blog
title: 'Kong Ingress Controller and Service Mesh: Setting up Ingress to Istio on Kubernetes'
date: 2020-03-18
slug: kong-ingress-controller-and-istio-service-mesh
---

**Author:** Kevin Chen, Kong
Kubernetes has become the de facto way to orchestrate containers and the services within services. But how do we give services outside our cluster access to what is within? Kubernetes comes with the Ingress API object that manages external access to services within a cluster.
Ingress is a group of rules that will proxy inbound connections to endpoints defined by a backend. However, Kubernetes does not know what to do with Ingress resources without an Ingress controller, which is where an open source controller can come into play. In this post, we are going to use one option for this: the Kong Ingress Controller. The Kong Ingress Controller was open-sourced a year ago and recently reached one million downloads. In the recent 0.7 release, service mesh support was also added. Other features of this release include:

* **Built-In Kubernetes Admission Controller**, which validates Custom Resource Definitions (CRD) as they are created or updated and rejects any invalid configurations.
* **In-memory Mode** - Each pod’s controller actively configures the Kong container in its pod, which limits the blast radius of failure of a single container of Kong or controller container to that pod only.
* **Native gRPC Routing** - gRPC traffic can now be routed via Kong Ingress Controller natively with support for method-based routing.



If you would like a deeper dive into Kong Ingress Controller 0.7, please check out the [GitHub repository](https://github.com/Kong/kubernetes-ingress-controller).

But let’s get back to the service mesh support since that will be the main focal point of this blog post. Service mesh allows organizations to address microservices challenges related to security, reliability, and observability by abstracting inter-service communication into a mesh layer. But what if our mesh layer sits within Kubernetes and we still need to expose certain services beyond our cluster? Then you need an Ingress controller such as the Kong Ingress Controller. In this blog post, we’ll cover how to deploy Kong Ingress Controller as your Ingress layer to an Istio mesh. Let’s dive right in:



### Part 0: Set up Istio on Kubernetes

This blog will assume you have Istio set up on Kubernetes. If you need to catch up to this point, please check out the [Istio documentation](https://istio.io/docs/setup/). It will walk you through setting up Istio on Kubernetes.

### 1. Install the Bookinfo Application

First, we need to label the namespaces that will host our application and Kong proxy. To label our default namespace where the bookinfo app sits, run this command:

```
$ kubectl label namespace default istio-injection=enabled
namespace/default labeled
```

Then create a new namespace that will be hosting our Kong gateway and the Ingress controller:

```
$ kubectl create namespace kong
namespace/kong created
```

Because Kong will be sitting outside the default namespace, be sure you also label the Kong namespace with istio-injection enabled as well:

```
$ kubectl label namespace kong istio-injection=enabled
namespace/kong labeled
```

Having both namespaces labeled `istio-injection=enabled` is necessary; otherwise, the default configuration will not inject a sidecar container into the pods of your namespaces.
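To double-check the labels before deploying anything, you can list both namespaces with that label shown as a column; both should report `enabled` in the `ISTIO-INJECTION` column (a quick verification step, not part of the original walk-through):

```
$ kubectl get namespaces default kong -L istio-injection
```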
Now deploy your BookInfo application with the following command:

```
$ kubectl apply -f http://bit.ly/bookinfoapp
service/details created
serviceaccount/bookinfo-details created
deployment.apps/details-v1 created
service/ratings created
serviceaccount/bookinfo-ratings created
deployment.apps/ratings-v1 created
service/reviews created
serviceaccount/bookinfo-reviews created
deployment.apps/reviews-v1 created
deployment.apps/reviews-v2 created
deployment.apps/reviews-v3 created
service/productpage created
serviceaccount/bookinfo-productpage created
deployment.apps/productpage-v1 created
```

Let’s double-check our Services and Pods to make sure that we have it all set up correctly:

```
$ kubectl get services
NAME          TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
details       ClusterIP   10.97.125.254    <none>        9080/TCP   29s
kubernetes    ClusterIP   10.96.0.1        <none>        443/TCP    29h
productpage   ClusterIP   10.97.62.68      <none>        9080/TCP   28s
ratings       ClusterIP   10.96.15.180     <none>        9080/TCP   28s
reviews       ClusterIP   10.104.207.136   <none>        9080/TCP   28s
```
You should see four new services: details, productpage, ratings, and reviews. None of them have an external IP so we will use the [Kong gateway](https://github.com/Kong/kong) to expose the necessary services. And to check pods, run the following command:

```
$ kubectl get pods
NAME                              READY   STATUS    RESTARTS   AGE
details-v1-c5b5f496d-9wm29        2/2     Running   0          101s
productpage-v1-7d6cfb7dfd-5mc96   2/2     Running   0          100s
ratings-v1-f745cf57b-hmkwf        2/2     Running   0          101s
reviews-v1-85c474d9b8-kqcpt       2/2     Running   0          101s
reviews-v2-ccffdd984-9jnsj        2/2     Running   0          101s
reviews-v3-98dc67b68-nzw97        2/2     Running   0          101s
```
This command outputs useful data, so let’s take a second to understand it. If you examine the READY column, each pod has two containers running: the service and an Envoy sidecar injected alongside it. Another thing to highlight is that there are three review pods but only 1 review service. The Envoy sidecar will load balance the traffic to three different review pods that contain different versions, giving us the ability to A/B test our changes. With that said, you should now be able to access your product page!

```
$ kubectl exec -it $(kubectl get pod -l app=ratings -o jsonpath='{.items[0].metadata.name}') -c ratings -- curl productpage:9080/productpage | grep -o "<title>.*</title>"
<title>Simple Bookstore App</title>
```

### 2. Kong Kubernetes Ingress Controller Without Database
To expose your services to the world, we will deploy Kong as the north-south traffic gateway. [Kong 1.1](https://github.com/Kong/kong/releases/tag/1.1.2) released with declarative configuration and DB-less mode. Declarative configuration allows you to specify the desired system state through a YAML or JSON file instead of a sequence of API calls. Using declarative config provides several key benefits to reduce complexity, increase automation and enhance system performance. And with the Kong Ingress Controller, any Ingress rules you apply to the cluster will automatically be configured on the Kong proxy. Let’s set up the Kong Ingress Controller and the actual Kong proxy first like this:

```
$ kubectl apply -f https://bit.ly/k4k8s
namespace/kong configured
customresourcedefinition.apiextensions.k8s.io/kongconsumers.configuration.konghq.com created
customresourcedefinition.apiextensions.k8s.io/kongcredentials.configuration.konghq.com created
customresourcedefinition.apiextensions.k8s.io/kongingresses.configuration.konghq.com created
customresourcedefinition.apiextensions.k8s.io/kongplugins.configuration.konghq.com created
serviceaccount/kong-serviceaccount created
clusterrole.rbac.authorization.k8s.io/kong-ingress-clusterrole created
clusterrolebinding.rbac.authorization.k8s.io/kong-ingress-clusterrole-nisa-binding created
configmap/kong-server-blocks created
service/kong-proxy created
service/kong-validation-webhook created
deployment.apps/ingress-kong created
```

To check if the Kong pod is up and running, run:

```
$ kubectl get pods -n kong
NAME                           READY   STATUS    RESTARTS   AGE
pod/ingress-kong-8b44c9856-9s42v   3/3     Running   0          2m26s
```

There will be three containers within this pod. The first container is the Kong Gateway that will be the Ingress point to your cluster. The second container is the Ingress controller. It uses Ingress resources and updates the proxy to follow rules defined in the resource. And lastly, the third container is the Envoy proxy injected by Istio. Kong will route traffic through the Envoy sidecar proxy to the appropriate service. To send requests into the cluster via our newly deployed Kong Gateway, set up an environment variable with a URL based on the IP address at which Kong is accessible.

```
$ export PROXY_URL="$(minikube service -n kong kong-proxy --url | head -1)"
$ echo $PROXY_URL
http://192.168.99.100:32728
```

Next, we need to change some configuration so that the sidecar Envoy process can route the request correctly based on the host/authority header of the request. Run the following to stop the route from preserving host:

```
$ echo "
apiVersion: configuration.konghq.com/v1
kind: KongIngress
metadata:
  name: do-not-preserve-host
route:
  preserve_host: false
" | kubectl apply -f -
kongingress.configuration.konghq.com/do-not-preserve-host created
```

And annotate the existing productpage service to set service-upstream as true:

```
$ kubectl annotate svc productpage ingress.kubernetes.io/service-upstream="true"
service/productpage annotated
```
Now that we have everything set up, we can look at how to use the Ingress resource to help route external traffic to the services within your Istio mesh. We’ll create an Ingress rule that routes all traffic with the path of `/` to our productpage service:

```
$ echo "
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: productpage
  annotations:
    configuration.konghq.com: do-not-preserve-host
spec:
  rules:
  - http:
      paths:
      - path: /
        backend:
          serviceName: productpage
          servicePort: 9080
" | kubectl apply -f -
ingress.extensions/productpage created
```

And just like that, the Kong Ingress Controller understands the rules you defined in the Ingress resource and routes traffic to the productpage service! To view the product page service’s GUI, go to `$PROXY_URL/productpage` in your browser. Or to test it in your command line, try:

```
$ curl $PROXY_URL/productpage
```

That is all I have for this walk-through. If you enjoyed the technologies used in this post, please check out their repositories since they are all open source and would love to have more contributors! Here are their links for your convenience:

* Kong: [[GitHub](https://github.com/Kong/kubernetes-ingress-controller)] [[Twitter](https://twitter.com/thekonginc)]
* Kubernetes: [[GitHub](https://github.com/kubernetes/kubernetes)] [[Twitter](https://twitter.com/kubernetesio)]
* Istio: [[GitHub](https://github.com/istio/istio)] [[Twitter](https://twitter.com/IstioMesh)]
* Envoy: [[GitHub](https://github.com/envoyproxy/envoy)] [[Twitter](https://twitter.com/EnvoyProxy)]

Thank you for following along!

@ -6,8 +6,8 @@ cid: community
<div class="newcommunitywrapper">
<div class="banner1">
<img src="/images/community/kubernetes-community-final-02.jpg" alt="Kubernetes Conference Gallery" style="width:100%" class="desktop">
<img src="/images/community/kubernetes-community-02-mobile.jpg" alt="Kubernetes Conference Gallery" style="width:100%" class="mobile">
<img src="/images/community/kubernetes-community-final-02.jpg" alt="Kubernetes Conference Gallery" style="width:100%;padding-left:0px" class="desktop">
<img src="/images/community/kubernetes-community-02-mobile.jpg" alt="Kubernetes Conference Gallery" style="width:100%;padding-left:0px" class="mobile">
</div>

<div class="intro">

@ -24,9 +24,9 @@ Once you've set your desired state, the *Kubernetes Control Plane* makes the clu
* **[kubelet](/docs/admin/kubelet/)**, which communicates with the Kubernetes Master.
* **[kube-proxy](/docs/admin/kube-proxy/)**, a network proxy which reflects Kubernetes networking services on each node.

## Kubernetes Objects
## Kubernetes objects

Kubernetes contains a number of abstractions that represent the state of your system: deployed containerized applications and workloads, their associated network and disk resources, and other information about what your cluster is doing. These abstractions are represented by objects in the Kubernetes API. See [Understanding Kubernetes Objects](/docs/concepts/overview/working-with-objects/kubernetes-objects/) for more details.
Kubernetes contains a number of abstractions that represent the state of your system: deployed containerized applications and workloads, their associated network and disk resources, and other information about what your cluster is doing. These abstractions are represented by objects in the Kubernetes API. See [Understanding Kubernetes objects](/docs/concepts/overview/working-with-objects/kubernetes-objects/#kubernetes-objects) for more details.

The basic Kubernetes objects include:

@ -35,7 +35,7 @@ The basic Kubernetes objects include:
* [Volume](/docs/concepts/storage/volumes/)
* [Namespace](/docs/concepts/overview/working-with-objects/namespaces/)

Kubernetes also contains higher-level abstractions that rely on [Controllers](/docs/concepts/architecture/controller/) to build upon the basic objects, and provide additional functionality and convenience features. These include:
Kubernetes also contains higher-level abstractions that rely on [controllers](/docs/concepts/architecture/controller/) to build upon the basic objects, and provide additional functionality and convenience features. These include:

* [Deployment](/docs/concepts/workloads/controllers/deployment/)
* [DaemonSet](/docs/concepts/workloads/controllers/daemonset/)

@ -26,7 +26,7 @@ closer to the desired state, by turning equipment on or off.
## Controller pattern

A controller tracks at least one Kubernetes resource type.
These [objects](/docs/concepts/overview/working-with-objects/kubernetes-objects/)
These [objects](/docs/concepts/overview/working-with-objects/kubernetes-objects/#kubernetes-objects)
have a spec field that represents the desired state. The
controller(s) for that resource are responsible for making the current
state come closer to that desired state.

@ -113,17 +113,15 @@ useful changes, it doesn't matter if the overall state is or is not stable.
As a tenet of its design, Kubernetes uses lots of controllers that each manage
a particular aspect of cluster state. Most commonly, a particular control loop
(controller) uses one kind of resource as its desired state, and has a different
kind of resource that it manages to make that desired state happen.
kind of resource that it manages to make that desired state happen. For example,
a controller for Jobs tracks Job objects (to discover new work) and Pod objects
(to run the Jobs, and then to see when the work is finished). In this case
something else creates the Jobs, whereas the Job controller creates Pods.

It's useful to have simple controllers rather than one, monolithic set of control
loops that are interlinked. Controllers can fail, so Kubernetes is designed to
allow for that.

For example: a controller for Jobs tracks Job objects (to discover
new work) and Pod object (to run the Jobs, and then to see when the work is
finished). In this case something else creates the Jobs, whereas the Job
controller creates Pods.

{{< note >}}
There can be several controllers that create or update the same kind of object.
Behind the scenes, Kubernetes controllers make sure that they only pay attention

@ -30,7 +30,7 @@ A node's status contains the following information:
* [Capacity and Allocatable](#capacity)
* [Info](#info)

Node status and other details about a node can be displayed using below command:
Node status and other details about a node can be displayed using the following command:
```shell
kubectl describe node <insert-node-name-here>
```

@ -72,7 +72,7 @@ The node condition is represented as a JSON object. For example, the following r
]
```

If the Status of the Ready condition remains `Unknown` or `False` for longer than the `pod-eviction-timeout`, an argument is passed to the [kube-controller-manager](/docs/admin/kube-controller-manager/) and all the Pods on the node are scheduled for deletion by the Node Controller. The default eviction timeout duration is **five minutes**. In some cases when the node is unreachable, the apiserver is unable to communicate with the kubelet on the node. The decision to delete the pods cannot be communicated to the kubelet until communication with the apiserver is re-established. In the meantime, the pods that are scheduled for deletion may continue to run on the partitioned node.
If the Status of the Ready condition remains `Unknown` or `False` for longer than the `pod-eviction-timeout` (an argument passed to the [kube-controller-manager](/docs/admin/kube-controller-manager/)), all the Pods on the node are scheduled for deletion by the Node Controller. The default eviction timeout duration is **five minutes**. In some cases when the node is unreachable, the apiserver is unable to communicate with the kubelet on the node. The decision to delete the pods cannot be communicated to the kubelet until communication with the apiserver is re-established. In the meantime, the pods that are scheduled for deletion may continue to run on the partitioned node.

In versions of Kubernetes prior to 1.5, the node controller would [force delete](/docs/concepts/workloads/pods/pod/#force-deletion-of-pods)
these unreachable pods from the apiserver. However, in 1.5 and higher, the node controller does not force delete pods until it is

@ -83,8 +83,8 @@ Kubernetes causes all the Pod objects running on the node to be deleted from the

The node lifecycle controller automatically creates
[taints](/docs/concepts/configuration/taint-and-toleration/) that represent conditions.
When the scheduler is assigning a Pod to a Node, the scheduler takes the Node's taints
into account, except for any taints that the Pod tolerates.
The scheduler takes the Node's taints into consideration when assigning a Pod to a Node.
Pods can also have tolerations which let them tolerate a Node's taints.

### Capacity and Allocatable {#capacity}

@ -131,6 +131,8 @@ Kubernetes creates a node object internally (the representation), and
validates the node by health checking based on the `metadata.name` field. If the node is valid -- that is, if all necessary
services are running -- it is eligible to run a pod. Otherwise, it is
ignored for any cluster activity until it becomes valid.
The name of a Node object must be a valid
[DNS subdomain name](/docs/concepts/overview/working-with-objects/names#dns-subdomain-names).

{{< note >}}
Kubernetes keeps the object for the invalid node and keeps checking to see whether it becomes valid.

@ -157,7 +159,7 @@ controller deletes the node from its list of nodes.
The third is monitoring the nodes' health. The node controller is
responsible for updating the NodeReady condition of NodeStatus to
ConditionUnknown when a node becomes unreachable (i.e. the node controller stops
receiving heartbeats for some reason, e.g. due to the node being down), and then later evicting
receiving heartbeats for some reason, for example due to the node being down), and then later evicting
all the pods from the node (using graceful termination) if the node continues
to be unreachable. (The default timeouts are 40s to start reporting
ConditionUnknown and 5m after that to start evicting pods.) The node controller

@ -182,13 +184,13 @@ a Lease object.
timeout for unreachable nodes).
- The kubelet creates and then updates its Lease object every 10 seconds
(the default update interval). Lease updates occur independently from the
`NodeStatus` updates.
`NodeStatus` updates. If the Lease update fails, the kubelet retries with exponential backoff starting at 200 milliseconds and capped at 7 seconds.

#### Reliability

In Kubernetes 1.4, we updated the logic of the node controller to better handle
cases when a large number of nodes have problems with reaching the master
(e.g. because the master has networking problem). Starting with 1.4, the node
(e.g. because the master has networking problems). Starting with 1.4, the node
controller looks at the state of all nodes in the cluster when making a
decision about pod eviction.

@ -212,9 +214,9 @@ there is only one availability zone (the whole cluster).

A key reason for spreading your nodes across availability zones is so that the
workload can be shifted to healthy zones when one entire zone goes down.
Therefore, if all nodes in a zone are unhealthy then node controller evicts at
the normal rate `--node-eviction-rate`. The corner case is when all zones are
completely unhealthy (i.e. there are no healthy nodes in the cluster). In such
Therefore, if all nodes in a zone are unhealthy then the node controller evicts at
the normal rate of `--node-eviction-rate`. The corner case is when all zones are
completely unhealthy (i.e. there are no healthy nodes in the cluster). In such a
case, the node controller assumes that there's some problem with master
connectivity and stops all evictions until some connectivity is restored.

@ -275,6 +277,12 @@ and do not respect the unschedulable attribute on a node. This assumes that daem
the machine even if it is being drained of applications while it prepares for a reboot.
{{< /note >}}

{{< caution >}}
`kubectl cordon` marks a node as 'unschedulable', which has the side effect of the service
controller removing the node from any LoadBalancer node target lists it was previously
eligible for, effectively removing incoming load balancer traffic from the cordoned node(s).
{{< /caution >}}

### Node capacity

The capacity of the node (number of cpus and amount of memory) is part of the node object.

@ -28,7 +28,7 @@ Add-ons in each section are sorted alphabetically - the ordering does not imply
* [Contiv](http://contiv.github.io) provides configurable networking (native L3 using BGP, overlay using vxlan, classic L2, and Cisco-SDN/ACI) for various use cases and a rich policy framework. Contiv project is fully [open sourced](http://github.com/contiv). The [installer](http://github.com/contiv/install) provides both kubeadm and non-kubeadm based installation options.
* [Contrail](http://www.juniper.net/us/en/products-services/sdn/contrail/contrail-networking/), based on [Tungsten Fabric](https://tungsten.io), is an open source, multi-cloud network virtualization and policy management platform. Contrail and Tungsten Fabric are integrated with orchestration systems such as Kubernetes, OpenShift, OpenStack and Mesos, and provide isolation modes for virtual machines, containers/pods and bare metal workloads.
* [Flannel](https://github.com/coreos/flannel/blob/master/Documentation/kubernetes.md) is an overlay network provider that can be used with Kubernetes.
* [Knitter](https://github.com/ZTE/Knitter/) is a network solution supporting multiple networking in Kubernetes.
* [Knitter](https://github.com/ZTE/Knitter/) is a plugin to support multiple network interfaces in a Kubernetes pod.
* [Multus](https://github.com/Intel-Corp/multus-cni) is a Multi plugin for multiple network support in Kubernetes to support all CNI plugins (e.g. Calico, Cilium, Contiv, Flannel), in addition to SRIOV, DPDK, OVS-DPDK and VPP based workloads in Kubernetes.
* [NSX-T](https://docs.vmware.com/en/VMware-NSX-T/2.0/nsxt_20_ncp_kubernetes.pdf) Container Plug-in (NCP) provides integration between VMware NSX-T and container orchestrators such as Kubernetes, as well as integration between NSX-T and container-based CaaS/PaaS platforms such as Pivotal Container Service (PKS) and OpenShift.
* [Nuage](https://github.com/nuagenetworks/nuage-kubernetes/blob/v5.1.1-1/docs/kubernetes-1-installation.rst) is an SDN platform that provides policy-based networking between Kubernetes Pods and non-Kubernetes environments with visibility and security monitoring.
@ -46,7 +46,7 @@ Add-ons in each section are sorted alphabetically - the ordering does not imply

## Infrastructure

* [KubeVirt](https://kubevirt.io/user-guide/docs/latest/administration/intro.html#cluster-side-add-on-deployment) is an add-on to run virtual machines on Kubernetes. Usually run on bare-metal clusters.
* [KubeVirt](https://kubevirt.io/user-guide/#/installation/installation) is an add-on to run virtual machines on Kubernetes. Usually run on bare-metal clusters.

## Legacy Add-ons

@ -130,11 +130,11 @@ Finally, add the same parameters into the API server start parameters.
Note that you may need to adapt the sample commands based on the hardware
architecture and cfssl version you are using.

curl -L https://pkg.cfssl.org/R1.2/cfssl_linux-amd64 -o cfssl
curl -L https://github.com/cloudflare/cfssl/releases/download/v1.4.1/cfssl_1.4.1_linux_amd64 -o cfssl
chmod +x cfssl
curl -L https://pkg.cfssl.org/R1.2/cfssljson_linux-amd64 -o cfssljson
curl -L https://github.com/cloudflare/cfssl/releases/download/v1.4.1/cfssljson_1.4.1_linux_amd64 -o cfssljson
chmod +x cfssljson
curl -L https://pkg.cfssl.org/R1.2/cfssl-certinfo_linux-amd64 -o cfssl-certinfo
curl -L https://github.com/cloudflare/cfssl/releases/download/v1.4.1/cfssl-certinfo_1.4.1_linux_amd64 -o cfssl-certinfo
chmod +x cfssl-certinfo
1. Create a directory to hold the artifacts and initialize cfssl:

@ -94,7 +94,7 @@ Different settings can be applied to a load balancer service in AWS using _annot
* `service.beta.kubernetes.io/aws-load-balancer-access-log-s3-bucket-prefix`: Used to specify access log s3 bucket prefix.
* `service.beta.kubernetes.io/aws-load-balancer-additional-resource-tags`: Used on the service to specify a comma-separated list of key-value pairs which will be recorded as additional tags in the ELB. For example: `"Key1=Val1,Key2=Val2,KeyNoVal1=,KeyNoVal2"`.
* `service.beta.kubernetes.io/aws-load-balancer-backend-protocol`: Used on the service to specify the protocol spoken by the backend (pod) behind a listener. If `http` (default) or `https`, an HTTPS listener that terminates the connection and parses headers is created. If set to `ssl` or `tcp`, a "raw" SSL listener is used. If set to `http` and `aws-load-balancer-ssl-cert` is not used then a HTTP listener is used.
* `service.beta.kubernetes.io/aws-load-balancer-ssl-cert`: Used on the service to request a secure listener. Value is a valid certificate ARN. For more, see [ELB Listener Config](http://docs.aws.amazon.com/ElasticLoadBalancing/latest/DeveloperGuide/elb-listener-config.html) CertARN is an IAM or CM certificate ARN, e.g. `arn:aws:acm:us-east-1:123456789012:certificate/12345678-1234-1234-1234-123456789012`.
* `service.beta.kubernetes.io/aws-load-balancer-ssl-cert`: Used on the service to request a secure listener. Value is a valid certificate ARN. For more, see [ELB Listener Config](http://docs.aws.amazon.com/ElasticLoadBalancing/latest/DeveloperGuide/elb-listener-config.html) CertARN is an IAM or CM certificate ARN, for example `arn:aws:acm:us-east-1:123456789012:certificate/12345678-1234-1234-1234-123456789012`.
|
||||
* `service.beta.kubernetes.io/aws-load-balancer-connection-draining-enabled`: Used on the service to enable or disable connection draining.
|
||||
* `service.beta.kubernetes.io/aws-load-balancer-connection-draining-timeout`: Used on the service to specify a connection draining timeout.
|
||||
* `service.beta.kubernetes.io/aws-load-balancer-connection-idle-timeout`: Used on the service to specify the idle connection timeout.
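For illustration, here is a sketch of a Service manifest that combines a few of the annotations above (the Service name, selector, and ports are placeholders; the certificate ARN is the example value from above):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: example-elb-service            # placeholder name
  annotations:
    # Terminate TLS at the ELB and speak plain HTTP to the pods.
    service.beta.kubernetes.io/aws-load-balancer-backend-protocol: "http"
    service.beta.kubernetes.io/aws-load-balancer-ssl-cert: "arn:aws:acm:us-east-1:123456789012:certificate/12345678-1234-1234-1234-123456789012"
    # Close idle client connections after 60 seconds.
    service.beta.kubernetes.io/aws-load-balancer-connection-idle-timeout: "60"
spec:
  type: LoadBalancer
  selector:
    app: example                       # placeholder selector
  ports:
  - port: 443
    targetPort: 8080
```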
|
||||
|
|
|
@ -20,7 +20,6 @@ See the guides in [Setup](/docs/setup/) for examples of how to plan, set up, and
|
|||
Before choosing a guide, here are some considerations:
|
||||
|
||||
- Do you just want to try out Kubernetes on your computer, or do you want to build a high-availability, multi-node cluster? Choose distros best suited for your needs.
|
||||
- **If you are designing for high-availability**, learn about configuring [clusters in multiple zones](/docs/concepts/cluster-administration/federation/).
|
||||
- Will you be using **a hosted Kubernetes cluster**, such as [Google Kubernetes Engine](https://cloud.google.com/kubernetes-engine/), or **hosting your own cluster**?
|
||||
- Will your cluster be **on-premises**, or **in the cloud (IaaS)**? Kubernetes does not directly support hybrid clusters. Instead, you can set up multiple clusters.
|
||||
- **If you are configuring Kubernetes on-premises**, consider which [networking model](/docs/concepts/cluster-administration/networking/) fits best.
|
||||
|
@ -44,7 +43,7 @@ Note: Not all distros are actively maintained. Choose distros which have been te
|
|||
|
||||
* [Certificates](/docs/concepts/cluster-administration/certificates/) describes the steps to generate certificates using different tool chains.
|
||||
|
||||
* [Kubernetes Container Environment](/docs/concepts/containers/container-environment-variables/) describes the environment for Kubelet managed containers on a Kubernetes node.
|
||||
* [Kubernetes Container Environment](/docs/concepts/containers/container-environment/) describes the environment for Kubelet managed containers on a Kubernetes node.
|
||||
|
||||
* [Controlling Access to the Kubernetes API](/docs/reference/access-authn-authz/controlling-access/) describes how to set up permissions for users and service accounts.
|
||||
|
||||
|
|
|
@ -1,50 +0,0 @@
|
|||
---
|
||||
title: Controller manager metrics
|
||||
content_template: templates/concept
|
||||
weight: 100
|
||||
---
|
||||
|
||||
{{% capture overview %}}
|
||||
Controller manager metrics provide important insight into the performance and health of
|
||||
the controller manager.
|
||||
|
||||
{{% /capture %}}
|
||||
|
||||
{{% capture body %}}
|
||||
## What are controller manager metrics
|
||||
|
||||
Controller manager metrics provide important insight into the performance and health of the controller manager.
|
||||
These metrics include common Go language runtime metrics such as go_routine count and controller specific metrics such as
|
||||
etcd request latencies or Cloudprovider (AWS, GCE, OpenStack) API latencies that can be used
|
||||
to gauge the health of a cluster.
|
||||
|
||||
Starting from Kubernetes 1.7, detailed Cloudprovider metrics are available for storage operations for GCE, AWS, Vsphere and OpenStack.
|
||||
These metrics can be used to monitor health of persistent volume operations.
|
||||
|
||||
For example, for GCE these metrics are called:
|
||||
|
||||
```
|
||||
cloudprovider_gce_api_request_duration_seconds { request = "instance_list"}
|
||||
cloudprovider_gce_api_request_duration_seconds { request = "disk_insert"}
|
||||
cloudprovider_gce_api_request_duration_seconds { request = "disk_delete"}
|
||||
cloudprovider_gce_api_request_duration_seconds { request = "attach_disk"}
|
||||
cloudprovider_gce_api_request_duration_seconds { request = "detach_disk"}
|
||||
cloudprovider_gce_api_request_duration_seconds { request = "list_disk"}
|
||||
```
|
||||
|
||||
|
||||
|
||||
## Configuration
|
||||
|
||||
|
||||
In a cluster, controller-manager metrics are available from `http://localhost:10252/metrics`
|
||||
from the host where the controller-manager is running.
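For example, from that host you can fetch them with:

```shell
curl http://localhost:10252/metrics
```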
|
||||
|
||||
The metrics are emitted in [prometheus format](https://prometheus.io/docs/instrumenting/exposition_formats/) and are human readable.
|
||||
|
||||
In a production environment you may want to configure prometheus or some other metrics scraper
|
||||
to periodically gather these metrics and make them available in some kind of time series database.
|
||||
|
||||
{{% /capture %}}
|
||||
|
||||
|
|
@ -1,186 +0,0 @@
|
|||
---
|
||||
title: Federation
|
||||
content_template: templates/concept
|
||||
weight: 80
|
||||
---
|
||||
|
||||
{{% capture overview %}}
|
||||
|
||||
{{< deprecationfilewarning >}}
|
||||
{{< include "federation-deprecation-warning-note.md" >}}
|
||||
{{< /deprecationfilewarning >}}
|
||||
|
||||
This page explains why and how to manage multiple Kubernetes clusters using
|
||||
federation.
|
||||
{{% /capture %}}
|
||||
|
||||
{{% capture body %}}
|
||||
## Why federation
|
||||
|
||||
Federation makes it easy to manage multiple clusters. It does so by providing 2
|
||||
major building blocks:
|
||||
|
||||
* Sync resources across clusters: Federation provides the ability to keep
|
||||
resources in multiple clusters in sync. For example, you can ensure that the same deployment exists in multiple clusters.
|
||||
* Cross cluster discovery: Federation provides the ability to auto-configure DNS servers and load balancers with backends from all clusters. For example, you can ensure that a global VIP or DNS record can be used to access backends from multiple clusters.
|
||||
|
||||
Some other use cases that federation enables are:
|
||||
|
||||
* High Availability: By spreading load across clusters and auto configuring DNS
|
||||
servers and load balancers, federation minimises the impact of cluster
|
||||
failure.
|
||||
* Avoiding provider lock-in: By making it easier to migrate applications across
|
||||
clusters, federation prevents cluster provider lock-in.
|
||||
|
||||
|
||||
Federation is not helpful unless you have multiple clusters. Some of the reasons
|
||||
why you might want multiple clusters are:
|
||||
|
||||
* Low latency: Having clusters in multiple regions minimises latency by serving
|
||||
users from the cluster that is closest to them.
|
||||
* Fault isolation: It might be better to have multiple small clusters rather
|
||||
than a single large cluster for fault isolation (for example: multiple
|
||||
clusters in different availability zones of a cloud provider).
|
||||
* Scalability: There are scalability limits to a single kubernetes cluster (this
|
||||
should not be the case for most users. For more details:
|
||||
[Kubernetes Scaling and Performance Goals](https://git.k8s.io/community/sig-scalability/goals.md)).
|
||||
* [Hybrid cloud](#hybrid-cloud-capabilities): You can have multiple clusters on different cloud providers or
|
||||
on-premises data centers.
|
||||
|
||||
### Caveats
|
||||
|
||||
While there are a lot of attractive use cases for federation, there are also
|
||||
some caveats:
|
||||
|
||||
* Increased network bandwidth and cost: The federation control plane watches all
|
||||
clusters to ensure that the current state is as expected. This can lead to
|
||||
significant network cost if the clusters are running in different regions on
|
||||
a cloud provider or on different cloud providers.
|
||||
* Reduced cross cluster isolation: A bug in the federation control plane can
|
||||
impact all clusters. This is mitigated by keeping the logic in federation
|
||||
control plane to a minimum. It mostly delegates to the control plane in
|
||||
kubernetes clusters whenever it can. The design and implementation also errs
|
||||
on the side of safety and avoiding multi-cluster outage.
|
||||
* Maturity: The federation project is relatively new and is not very mature.
|
||||
Not all resources are available and many are still alpha. [Issue
|
||||
88](https://github.com/kubernetes/federation/issues/88) enumerates
|
||||
known issues with the system that the team is busy solving.
|
||||
|
||||
### Hybrid cloud capabilities
|
||||
|
||||
Federations of Kubernetes Clusters can include clusters running in
|
||||
different cloud providers (e.g. Google Cloud, AWS), and on-premises
|
||||
(e.g. on OpenStack). [Kubefed](/docs/tasks/federation/set-up-cluster-federation-kubefed/) is the recommended way to deploy federated clusters.
|
||||
|
||||
Thereafter, your [API resources](#api-resources) can span different clusters
|
||||
and cloud providers.
|
||||
|
||||
## Setting up federation
|
||||
|
||||
To be able to federate multiple clusters, you first need to set up a federation
|
||||
control plane.
|
||||
Follow the [setup guide](/docs/tutorials/federation/set-up-cluster-federation-kubefed/) to set up the
|
||||
federation control plane.
|
||||
|
||||
## API resources
|
||||
|
||||
Once you have the control plane set up, you can start creating federation API
|
||||
resources.
|
||||
The following guides explain some of the resources in detail:
|
||||
|
||||
* [Cluster](/docs/tasks/federation/administer-federation/cluster/)
|
||||
* [ConfigMap](/docs/tasks/federation/administer-federation/configmap/)
|
||||
* [DaemonSets](/docs/tasks/federation/administer-federation/daemonset/)
|
||||
* [Deployment](/docs/tasks/federation/administer-federation/deployment/)
|
||||
* [Events](/docs/tasks/federation/administer-federation/events/)
|
||||
* [Hpa](/docs/tasks/federation/administer-federation/hpa/)
|
||||
* [Ingress](/docs/tasks/federation/administer-federation/ingress/)
|
||||
* [Jobs](/docs/tasks/federation/administer-federation/job/)
|
||||
* [Namespaces](/docs/tasks/federation/administer-federation/namespaces/)
|
||||
* [ReplicaSets](/docs/tasks/federation/administer-federation/replicaset/)
|
||||
* [Secrets](/docs/tasks/federation/administer-federation/secret/)
|
||||
* [Services](/docs/concepts/cluster-administration/federation-service-discovery/)
|
||||
|
||||
|
||||
The [API reference docs](/docs/reference/federation/) list all the
|
||||
resources supported by federation apiserver.
|
||||
|
||||
## Cascading deletion
|
||||
|
||||
Kubernetes version 1.6 includes support for cascading deletion of federated
|
||||
resources. With cascading deletion, when you delete a resource from the
|
||||
federation control plane, you also delete the corresponding resources in all underlying clusters.
|
||||
|
||||
Cascading deletion is not enabled by default when using the REST API. To enable
|
||||
it, set the option `DeleteOptions.orphanDependents=false` when you delete a
|
||||
resource from the federation control plane using the REST API. Using `kubectl
|
||||
delete`
|
||||
enables cascading deletion by default. You can disable it by running `kubectl
|
||||
delete --cascade=false`
|
||||
|
||||
Note: Kubernetes version 1.5 included cascading deletion support for a subset of
|
||||
federation resources.
|
||||
|
||||
## Scope of a single cluster
|
||||
|
||||
On IaaS providers such as Google Compute Engine or Amazon Web Services, a VM exists in a
|
||||
[zone](https://cloud.google.com/compute/docs/zones) or [availability
|
||||
zone](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-regions-availability-zones.html).
|
||||
We suggest that all the VMs in a Kubernetes cluster should be in the same availability zone, because:
|
||||
|
||||
- compared to having a single global Kubernetes cluster, there are fewer single-points of failure.
|
||||
- compared to a cluster that spans availability zones, it is easier to reason about the availability properties of a
|
||||
single-zone cluster.
|
||||
- when the Kubernetes developers are designing the system (e.g. making assumptions about latency, bandwidth, or
|
||||
correlated failures) they are assuming all the machines are in a single data center, or otherwise closely connected.
|
||||
|
||||
It is recommended to run fewer clusters with more VMs per availability zone; but it is possible to run multiple clusters per availability zone.
|
||||
|
||||
Reasons to prefer fewer clusters per availability zone are:
|
||||
|
||||
- improved bin packing of Pods in some cases with more nodes in one cluster (less resource fragmentation).
|
||||
- reduced operational overhead (though the advantage is diminished as ops tooling and processes mature).
|
||||
- reduced per-cluster fixed resource costs, e.g. apiserver VMs (but small as a percentage
|
||||
of overall cluster cost for medium to large clusters).
|
||||
|
||||
Reasons to have multiple clusters include:
|
||||
|
||||
- strict security policies requiring isolation of one class of work from another (but, see Partitioning Clusters
|
||||
below).
|
||||
- test clusters to canary new Kubernetes releases or other cluster software.
|
||||
|
||||
## Selecting the right number of clusters
|
||||
|
||||
The selection of the number of Kubernetes clusters may be a relatively static choice, only revisited occasionally.
|
||||
By contrast, the number of nodes in a cluster and the number of pods in a service may change frequently according to
|
||||
load and growth.
|
||||
|
||||
To pick the number of clusters, first, decide which regions you need to be in to have adequate latency to all your end users, for services that will run
|
||||
on Kubernetes (if you use a Content Distribution Network, the latency requirements for the CDN-hosted content need not
|
||||
be considered). Legal issues might influence this as well. For example, a company with a global customer base might decide to have clusters in US, EU, AP, and SA regions.
|
||||
Call the number of regions to be in `R`.
|
||||
|
||||
Second, decide how many clusters should be able to be unavailable at the same time, while still being available. Call
|
||||
the number that can be unavailable `U`. If you are not sure, then 1 is a fine choice.
|
||||
|
||||
If it is allowable for load-balancing to direct traffic to any region in the event of a cluster failure, then
|
||||
you need at least the larger of `R` or `U + 1` clusters. If it is not (e.g. you want to ensure low latency for all
|
||||
users in the event of a cluster failure), then you need to have `R * (U + 1)` clusters
|
||||
(`U + 1` in each of `R` regions). In any case, try to put each cluster in a different zone.
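For example, with the four regions mentioned above (`R = 4`) and a tolerance for one unavailable cluster (`U = 1`): if traffic can fail over to any region you need the larger of `R` or `U + 1`, that is 4 clusters; if each region must keep serving its own users even during a cluster failure, you need `R * (U + 1) = 8` clusters, two per region.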
|
||||
|
||||
Finally, if any of your clusters would need more than the maximum recommended number of nodes for a Kubernetes cluster, then
|
||||
you may need even more clusters. Kubernetes v1.3 supports clusters up to 1000 nodes in size. Kubernetes v1.8 supports
|
||||
clusters up to 5000 nodes. See [Building Large Clusters](/docs/setup/best-practices/cluster-large/) for more guidance.
|
||||
|
||||
{{% /capture %}}
|
||||
|
||||
{{% capture whatsnext %}}
|
||||
* Learn more about the [Federation
|
||||
proposal](https://github.com/kubernetes/community/blob/{{< param "githubbranch" >}}/contributors/design-proposals/multicluster/federation.md).
|
||||
* See this [setup guide](/docs/tutorials/federation/set-up-cluster-federation-kubefed/) for cluster federation.
|
||||
* See this [Kubecon2016 talk on federation](https://www.youtube.com/watch?v=pq9lbkmxpS8)
|
||||
* See this [Kubecon2017 Europe update on federation](https://www.youtube.com/watch?v=kwOvOLnFYck)
|
||||
* See this [Kubecon2018 Europe update on sig-multicluster](https://www.youtube.com/watch?v=vGZo5DaThQU)
|
||||
* See this [Kubecon2018 Europe Federation-v2 prototype presentation](https://youtu.be/q27rbaX5Jis?t=7m20s)
|
||||
* See this [Federation-v2 Userguide](https://github.com/kubernetes-sigs/federation-v2/blob/master/docs/userguide.md)
|
||||
{{% /capture %}}
|
|
@ -0,0 +1,377 @@
|
|||
---
|
||||
title: API Priority and Fairness
|
||||
content_template: templates/concept
|
||||
min-kubernetes-server-version: v1.18
|
||||
---
|
||||
|
||||
{{% capture overview %}}
|
||||
|
||||
{{< feature-state state="alpha" for_k8s_version="v1.18" >}}
|
||||
|
||||
Controlling the behavior of the Kubernetes API server in an overload situation
|
||||
is a key task for cluster administrators. The {{< glossary_tooltip
|
||||
term_id="kube-apiserver" text="kube-apiserver" >}} has some controls available
|
||||
(i.e. the `--max-requests-inflight` and `--max-mutating-requests-inflight`
|
||||
command-line flags) to limit the amount of outstanding work that will be
|
||||
accepted, preventing a flood of inbound requests from overloading and
|
||||
potentially crashing the API server, but these flags are not enough to ensure
|
||||
that the most important requests get through in a period of high traffic.
|
||||
|
||||
The API Priority and Fairness feature (APF) is an alternative that improves upon
|
||||
the aforementioned max-inflight limitations. APF classifies
|
||||
and isolates requests in a more fine-grained way. It also introduces
|
||||
a limited amount of queuing, so that no requests are rejected in cases
|
||||
of very brief bursts. Requests are dispatched from queues using a
|
||||
fair queuing technique so that, for example, a poorly-behaved {{<
|
||||
glossary_tooltip text="controller" term_id="controller" >}}) need not
|
||||
starve others (even at the same priority level).
|
||||
|
||||
{{< caution >}}
|
||||
Requests classified as "long-running" — primarily watches — are not
|
||||
subject to the API Priority and Fairness filter. This is also true for
|
||||
the `--max-requests-inflight` flag without the API Priority and
|
||||
Fairness feature enabled.
|
||||
{{< /caution >}}
|
||||
|
||||
{{% /capture %}}
|
||||
|
||||
{{% capture body %}}
|
||||
|
||||
## Enabling API Priority and Fairness
|
||||
|
||||
The API Priority and Fairness feature is controlled by a feature gate
|
||||
and is not enabled by default. See
|
||||
[Feature Gates](/docs/reference/command-line-tools-reference/feature-gates/)
|
||||
for a general explanation of feature gates and how to enable and disable them. The
|
||||
name of the feature gate for APF is "APIPriorityAndFairness". This
|
||||
feature also involves an {{< glossary_tooltip term_id="api-group"
|
||||
text="API Group" >}} that must be enabled. You can do these
|
||||
things by adding the following command-line flags to your
|
||||
`kube-apiserver` invocation:
|
||||
|
||||
```shell
|
||||
kube-apiserver \
|
||||
--feature-gates=APIPriorityAndFairness=true \
|
||||
--runtime-config=flowcontrol.apiserver.k8s.io/v1alpha1=true \
|
||||
# …and other flags as usual
|
||||
```
|
||||
|
||||
The command-line flag `--enable-priority-and-fairness=false` will disable the
|
||||
API Priority and Fairness feature, even if other flags have enabled it.
|
||||
|
||||
## Concepts
|
||||
There are several distinct features involved in the API Priority and Fairness
|
||||
feature. Incoming requests are classified by attributes of the request using
|
||||
_FlowSchemas_, and assigned to priority levels. Priority levels add a degree of
|
||||
isolation by maintaining separate concurrency limits, so that requests assigned
|
||||
to different priority levels cannot starve each other. Within a priority level,
|
||||
a fair-queuing algorithm prevents requests from different _flows_ from starving
|
||||
each other, and allows for requests to be queued to prevent bursty traffic from
|
||||
causing failed requests when the average load is acceptably low.
|
||||
|
||||
### Priority Levels
|
||||
Without APF enabled, overall concurrency in
|
||||
the API server is limited by the `kube-apiserver` flags
|
||||
`--max-requests-inflight` and `--max-mutating-requests-inflight`. With APF
|
||||
enabled, the concurrency limits defined by these flags are summed and then the sum is divided up
|
||||
among a configurable set of _priority levels_. Each incoming request is assigned
|
||||
to a single priority level, and each priority level will only dispatch as many
|
||||
concurrent requests as its configuration allows.
|
||||
|
||||
The default configuration, for example, includes separate priority levels for
|
||||
leader-election requests, requests from built-in controllers, and requests from
|
||||
Pods. This means that an ill-behaved Pod that floods the API server with
|
||||
requests cannot prevent leader election or actions by the built-in controllers
|
||||
from succeeding.
|
||||
|
||||
### Queuing
|
||||
Even within a priority level there may be a large number of distinct sources of
|
||||
traffic. In an overload situation, it is valuable to prevent one stream of
|
||||
requests from starving others (in particular, in the relatively common case of a
|
||||
single buggy client flooding the kube-apiserver with requests, that buggy client
|
||||
would ideally not have much measurable impact on other clients at all). This is
|
||||
handled by use of a fair-queuing algorithm to process requests that are assigned
|
||||
the same priority level. Each request is assigned to a _flow_, identified by the
|
||||
name of the matching FlowSchema plus a _flow distinguisher_ — which
|
||||
is either the requesting user, the target resource's namespace, or nothing — and the
|
||||
system attempts to give approximately equal weight to requests in different
|
||||
flows of the same priority level.
|
||||
|
||||
After classifying a request into a flow, the API Priority and Fairness
|
||||
feature then may assign the request to a queue. This assignment uses
|
||||
a technique known as {{< glossary_tooltip term_id="shuffle-sharding"
|
||||
text="shuffle sharding" >}}, which makes relatively efficient use of
|
||||
queues to insulate low-intensity flows from high-intensity flows.
|
||||
|
||||
The details of the queuing algorithm are tunable for each priority level, and
|
||||
allow administrators to trade off memory use, fairness (the property that
|
||||
independent flows will all make progress when total traffic exceeds capacity),
|
||||
tolerance for bursty traffic, and the added latency induced by queuing.
|
||||
|
||||
### Exempt requests
|
||||
Some requests are considered sufficiently important that they are not subject to
|
||||
any of the limitations imposed by this feature. These exemptions prevent an
|
||||
improperly-configured flow control configuration from totally disabling an API
|
||||
server.
|
||||
|
||||
## Defaults
|
||||
The Priority and Fairness feature ships with a suggested configuration that
|
||||
should suffice for experimentation; if your cluster is likely to
|
||||
experience heavy load then you should consider what configuration will work best. The suggested configuration groups requests into five priority
|
||||
classes:
|
||||
|
||||
* The `system` priority level is for requests from the `system:nodes` group,
|
||||
i.e. Kubelets, which must be able to contact the API server in order for
|
||||
workloads to be able to schedule on them.
|
||||
|
||||
* The `leader-election` priority level is for leader election requests from
|
||||
built-in controllers (in particular, requests for `endpoints`, `configmaps`,
|
||||
or `leases` coming from the `system:kube-controller-manager` or
|
||||
`system:kube-scheduler` users and service accounts in the `kube-system`
|
||||
namespace). These are important to isolate from other traffic because failures
|
||||
in leader election cause their controllers to fail and restart, which in turn
|
||||
causes more expensive traffic as the new controllers sync their informers.
|
||||
|
||||
* The `workload-high` priority level is for other requests from built-in
|
||||
controllers.
|
||||
|
||||
* The `workload-low` priority level is for requests from any other service
|
||||
account, which will typically include all requests from controllers running in
|
||||
Pods.
|
||||
|
||||
* The `global-default` priority level handles all other traffic, e.g.
|
||||
interactive `kubectl` commands run by nonprivileged users.
|
||||
|
||||
Additionally, there are two PriorityLevelConfigurations and two FlowSchemas that
|
||||
are built in and may not be overwritten:
|
||||
|
||||
* The special `exempt` priority level is used for requests that are not subject
|
||||
to flow control at all: they will always be dispatched immediately. The
|
||||
special `exempt` FlowSchema classifies all requests from the `system:masters`
|
||||
group into this priority level. You may define other FlowSchemas that direct
|
||||
other requests to this priority level, if appropriate.
|
||||
|
||||
* The special `catch-all` priority level is used in combination with the special
|
||||
`catch-all` FlowSchema to make sure that every request gets some kind of
|
||||
classification. Typically you should not rely on this catch-all configuration,
|
||||
and should create your own catch-all FlowSchema and PriorityLevelConfiguration
|
||||
(or use the `global-default` configuration that is installed by default) as
|
||||
appropriate. To help catch configuration errors that miss classifying some
|
||||
requests, the mandatory `catch-all` priority level only allows one concurrency
|
||||
share and does not queue requests, making it relatively likely that traffic
|
||||
that only matches the `catch-all` FlowSchema will be rejected with an HTTP 429
|
||||
error.
|
||||
|
||||
## Resources
|
||||
The flow control API involves two kinds of resources.
|
||||
[PriorityLevelConfigurations](/docs/reference/generated/kubernetes-api/{{< param "version" >}}/#prioritylevelconfiguration-v1alpha1-flowcontrol)
|
||||
define the available isolation classes, the share of the available concurrency
|
||||
budget that each can handle, and allow for fine-tuning queuing behavior.
|
||||
[FlowSchemas](/docs/reference/generated/kubernetes-api/{{< param "version" >}}/#flowschema-v1alpha1-flowcontrol)
|
||||
are used to classify individual inbound requests, matching each to a single
|
||||
PriorityLevelConfiguration.
|
||||
|
||||
### PriorityLevelConfiguration
|
||||
A PriorityLevelConfiguration represents a single isolation class. Each
|
||||
PriorityLevelConfiguration has an independent limit on the number of outstanding
|
||||
requests, and limitations on the number of queued requests.
|
||||
|
||||
Concurrency limits for PriorityLevelConfigurations are not specified in absolute
|
||||
number of requests, but rather in "concurrency shares." The total concurrency
|
||||
limit for the API Server is distributed among the existing
|
||||
PriorityLevelConfigurations in proportion with these shares. This allows a
|
||||
cluster administrator to scale up or down the total amount of traffic to a
|
||||
server by restarting `kube-apiserver` with a different value for
|
||||
`--max-requests-inflight` (or `--max-mutating-requests-inflight`), and all
|
||||
PriorityLevelConfigurations will see their maximum allowed concurrency go up (or
|
||||
down) by the same fraction.
|
||||
{{< caution >}}
|
||||
With the Priority and Fairness feature enabled, the total concurrency limit for
|
||||
the server is set to the sum of `--max-requests-inflight` and
|
||||
`--max-mutating-requests-inflight`. There is no longer any distinction made
|
||||
between mutating and non-mutating requests; if you want to treat them
|
||||
separately for a given resource, make separate FlowSchemas that match the
|
||||
mutating and non-mutating verbs respectively.
|
||||
{{< /caution >}}
|
||||
|
||||
When the volume of inbound requests assigned to a single
|
||||
PriorityLevelConfiguration is more than its permitted concurrency level, the
|
||||
`type` field of its specification determines what will happen to extra requests.
|
||||
A type of `Reject` means that excess traffic will immediately be rejected with
|
||||
an HTTP 429 (Too Many Requests) error. A type of `Queue` means that requests
|
||||
above the threshold will be queued, with the shuffle sharding and fair queuing techniques used
|
||||
to balance progress between request flows.
|
||||
|
||||
The queuing configuration allows tuning the fair queuing algorithm for a
|
||||
priority level. Details of the algorithm can be read in the [enhancement
|
||||
proposal](#what-s-next), but in short:
|
||||
|
||||
* Increasing `queues` reduces the rate of collisions between different flows, at
|
||||
the cost of increased memory usage. A value of 1 here effectively disables the
|
||||
fair-queuing logic, but still allows requests to be queued.
|
||||
|
||||
* Increasing `queueLengthLimit` allows larger bursts of traffic to be
|
||||
sustained without dropping any requests, at the cost of increased
|
||||
latency and memory usage.
|
||||
|
||||
* Changing `handSize` allows you to adjust the probability of collisions between
|
||||
different flows and the overall concurrency available to a single flow in an
|
||||
overload situation.
|
||||
{{< note >}}
|
||||
A larger `handSize` makes it less likely for two individual flows to collide
|
||||
(and therefore for one to be able to starve the other), but more likely that
|
||||
a small number of flows can dominate the apiserver. A larger `handSize` also
|
||||
potentially increases the amount of latency that a single high-traffic flow
|
||||
can cause. The maximum number of queued requests possible from a
|
||||
single flow is `handSize * queueLengthLimit`.
|
||||
{{< /note >}}
|
||||
|
||||
|
||||
Following is a table showing an interesting collection of shuffle
|
||||
sharding configurations, showing for each the probability that a
|
||||
given mouse (low-intensity flow) is squished by the elephants (high-intensity flows) for
|
||||
an illustrative collection of numbers of elephants. See
|
||||
https://play.golang.org/p/Gi0PLgVHiUg, which computes this table.
|
||||
|
||||
{{< table caption="Example Shuffle Sharding Configurations" >}}
|
||||
|HandSize| Queues| 1 elephant| 4 elephants| 16 elephants|
|
||||
|--------|-----------|------------|----------------|--------------------|
|
||||
| 12| 32| 4.428838398950118e-09| 0.11431348830099144| 0.9935089607656024|
|
||||
| 10| 32| 1.550093439632541e-08| 0.0626479840223545| 0.9753101519027554|
|
||||
| 10| 64| 6.601827268370426e-12| 0.00045571320990370776| 0.49999929150089345|
|
||||
| 9| 64| 3.6310049976037345e-11| 0.00045501212304112273| 0.4282314876454858|
|
||||
| 8| 64| 2.25929199850899e-10| 0.0004886697053040446| 0.35935114681123076|
|
||||
| 8| 128| 6.994461389026097e-13| 3.4055790161620863e-06| 0.02746173137155063|
|
||||
| 7| 128| 1.0579122850901972e-11| 6.960839379258192e-06| 0.02406157386340147|
|
||||
| 7| 256| 7.597695465552631e-14| 6.728547142019406e-08| 0.0006709661542533682|
|
||||
| 6| 256| 2.7134626662687968e-12| 2.9516464018476436e-07| 0.0008895654642000348|
|
||||
| 6| 512| 4.116062922897309e-14| 4.982983350480894e-09| 2.26025764343413e-05|
|
||||
| 6| 1024| 6.337324016514285e-16| 8.09060164312957e-11| 4.517408062903668e-07|
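As a sketch of how these knobs fit together (the name and numbers below are illustrative, not recommendations; the field names follow the `flowcontrol.apiserver.k8s.io/v1alpha1` API referenced on this page), a queuing PriorityLevelConfiguration could look like:

```yaml
apiVersion: flowcontrol.apiserver.k8s.io/v1alpha1
kind: PriorityLevelConfiguration
metadata:
  name: example-priority-level         # illustrative name
spec:
  type: Limited                        # subject to concurrency limits (as opposed to Exempt)
  limited:
    assuredConcurrencyShares: 10       # this level's share of the server's total concurrency
    limitResponse:
      type: Queue                      # queue excess requests rather than rejecting them
      queuing:
        queues: 64
        handSize: 6
        queueLengthLimit: 50
```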
|
||||
|
||||
### FlowSchema
|
||||
|
||||
A FlowSchema matches some inbound requests and assigns them to a
|
||||
priority level. Every inbound request is tested against every
|
||||
FlowSchema in turn, starting with those with numerically lowest ---
|
||||
which we take to be the logically highest --- `matchingPrecedence` and
|
||||
working onward. The first match wins.
|
||||
|
||||
{{< caution >}}
|
||||
Only the first matching FlowSchema for a given request matters. If multiple
|
||||
FlowSchemas match a single inbound request, it will be assigned based on the one
|
||||
with the highest `matchingPrecedence`. If multiple FlowSchemas with equal
|
||||
`matchingPrecedence` match the same request, the one with lexicographically
|
||||
smaller `name` will win, but it's better not to rely on this, and instead to
|
||||
ensure that no two FlowSchemas have the same `matchingPrecedence`.
|
||||
{{< /caution >}}
|
||||
|
||||
A FlowSchema matches a given request if at least one of its `rules`
|
||||
matches. A rule matches if at least one of its `subjects` *and* at least
|
||||
one of its `resourceRules` or `nonResourceRules` (depending on whether the
|
||||
incoming request is for a resource or non-resource URL) matches the request.
|
||||
|
||||
For the `name` field in subjects, and the `verbs`, `apiGroups`, `resources`,
|
||||
`namespaces`, and `nonResourceURLs` fields of resource and non-resource rules,
|
||||
the wildcard `*` may be specified to match all values for the given field,
|
||||
effectively removing it from consideration.
|
||||
|
||||
A FlowSchema's `distinguisherMethod.type` determines how requests matching that
|
||||
schema will be separated into flows. It may be
|
||||
either `ByUser`, in which case one requesting user will not be able to starve
|
||||
other users of capacity, or `ByNamespace`, in which case requests for resources
|
||||
in one namespace will not be able to starve requests for resources in other
|
||||
namespaces of capacity, or it may be blank (or `distinguisherMethod` may be
|
||||
omitted entirely), in which case all requests matched by this FlowSchema will be
|
||||
considered part of a single flow. The correct choice for a given FlowSchema
|
||||
depends on the resource and your particular environment.
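To make this concrete, here is an illustrative FlowSchema (all names are made up; the fields follow the same `flowcontrol.apiserver.k8s.io/v1alpha1` API) that routes every request from one service account to the priority level sketched above, with one flow per requesting user:

```yaml
apiVersion: flowcontrol.apiserver.k8s.io/v1alpha1
kind: FlowSchema
metadata:
  name: example-flowschema             # illustrative name
spec:
  matchingPrecedence: 1000             # lower values are evaluated first
  priorityLevelConfiguration:
    name: example-priority-level       # must name an existing PriorityLevelConfiguration
  distinguisherMethod:
    type: ByUser                       # each requesting user gets its own flow
  rules:
  - subjects:
    - kind: ServiceAccount
      serviceAccount:
        name: example-sa               # illustrative subject
        namespace: example-ns
    resourceRules:
    - verbs: ["*"]
      apiGroups: ["*"]
      resources: ["*"]
      namespaces: ["*"]
```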
|
||||
|
||||
## Diagnostics
|
||||
Every HTTP response from an API server with the priority and fairness feature
|
||||
enabled has two extra headers: `X-Kubernetes-PF-FlowSchema-UID` and
|
||||
`X-Kubernetes-PF-PriorityLevel-UID`, noting the flow schema that matched the request
|
||||
and the priority level to which it was assigned, respectively. The API objects'
|
||||
names are not included in these headers in case the requesting user does not
|
||||
have permission to view them, so when debugging you can use a command like
|
||||
|
||||
```shell
|
||||
kubectl get flowschemas -o custom-columns="uid:{metadata.uid},name:{metadata.name}"
|
||||
kubectl get prioritylevelconfigurations -o custom-columns="uid:{metadata.uid},name:{metadata.name}"
|
||||
```
|
||||
|
||||
to get a mapping of UIDs to names for both FlowSchemas and
|
||||
PriorityLevelConfigurations.
|
||||
|
||||
## Observability
|
||||
When you enable the API Priority and Fairness feature, the kube-apiserver
|
||||
exports additional metrics. Monitoring these can help you determine whether your
|
||||
configuration is inappropriately throttling important traffic, or find
|
||||
poorly-behaved workloads that may be harming system health.
|
||||
|
||||
* `apiserver_flowcontrol_rejected_requests_total` counts requests that
|
||||
were rejected, grouped by the name of the assigned priority level,
|
||||
the name of the assigned FlowSchema, and the reason for rejection.
|
||||
The reason will be one of the following:
|
||||
* `queue-full`, indicating that too many requests were already
|
||||
queued,
|
||||
* `concurrency-limit`, indicating that the
|
||||
PriorityLevelConfiguration is configured to reject rather than
|
||||
queue excess requests, or
|
||||
* `time-out`, indicating that the request was still in the queue
|
||||
when its queuing time limit expired.
|
||||
|
||||
* `apiserver_flowcontrol_dispatched_requests_total` counts requests
|
||||
that began executing, grouped by the name of the assigned priority
|
||||
level and the name of the assigned FlowSchema.
|
||||
|
||||
* `apiserver_flowcontrol_current_inqueue_requests` gives the
|
||||
instantaneous total number of queued (not executing) requests,
|
||||
grouped by priority level and FlowSchema.
|
||||
|
||||
* `apiserver_flowcontrol_current_executing_requests` gives the instantaneous
|
||||
total number of executing requests, grouped by priority level and FlowSchema.
|
||||
|
||||
* `apiserver_flowcontrol_request_queue_length_after_enqueue` gives a
|
||||
histogram of queue lengths for the queues, grouped by priority level
|
||||
and FlowSchema, as sampled by the enqueued requests. Each request
|
||||
that gets queued contributes one sample to its histogram, reporting
|
||||
the length of the queue just after the request was added. Note that
|
||||
this produces different statistics than an unbiased survey would.
|
||||
{{< note >}}
|
||||
An outlier value in a histogram here means it is likely that a single flow
|
||||
(i.e., requests by one user or for one namespace, depending on
|
||||
configuration) is flooding the API server, and being throttled. By contrast,
|
||||
if one priority level's histogram shows that all queues for that priority
|
||||
level are longer than those for other priority levels, it may be appropriate
|
||||
to increase that PriorityLevelConfiguration's concurrency shares.
|
||||
{{< /note >}}
|
||||
|
||||
* `apiserver_flowcontrol_request_concurrency_limit` gives the computed
|
||||
concurrency limit (based on the API server's total concurrency limit and PriorityLevelConfigurations'
|
||||
concurrency shares) for each PriorityLevelConfiguration.
|
||||
|
||||
* `apiserver_flowcontrol_request_wait_duration_seconds` gives a histogram of how
|
||||
long requests spent queued, grouped by the FlowSchema that matched the
|
||||
request, the PriorityLevel to which it was assigned, and whether or not the
|
||||
request successfully executed.
|
||||
{{< note >}}
|
||||
Since each FlowSchema always assigns requests to a single
|
||||
PriorityLevelConfiguration, you can add the histograms for all the
|
||||
FlowSchemas for one priority level to get the effective histogram for
|
||||
requests assigned to that priority level.
|
||||
{{< /note >}}
|
||||
|
||||
* `apiserver_flowcontrol_request_execution_seconds` gives a histogram of how
|
||||
long requests took to actually execute, grouped by the FlowSchema that matched the
|
||||
request and the PriorityLevel to which it was assigned.
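Assuming your credentials are authorized to read the API server's `/metrics` endpoint, one quick way to eyeball these metrics is to query it directly, for example:

```shell
kubectl get --raw /metrics | grep apiserver_flowcontrol
```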
|
||||
|
||||
|
||||
{{% /capture %}}
|
||||
|
||||
{{% capture whatsnext %}}
|
||||
|
||||
For background information on design details for API priority and fairness, see
|
||||
the [enhancement proposal](https://github.com/kubernetes/enhancements/blob/master/keps/sig-api-machinery/20190228-priority-and-fairness.md).
|
||||
You can make suggestions and feature requests via [SIG API
|
||||
Machinery](https://github.com/kubernetes/community/tree/master/sig-api-machinery).
|
||||
|
||||
{{% /capture %}}
|
|
@ -76,7 +76,7 @@ should set up a solution to address that.
|
|||
For example, in Kubernetes clusters, deployed by the `kube-up.sh` script,
|
||||
there is a [`logrotate`](https://linux.die.net/man/8/logrotate)
|
||||
tool configured to run each hour. You can also set up a container runtime to
|
||||
rotate application's logs automatically, e.g. by using Docker's `log-opt`.
|
||||
rotate application's logs automatically, for example by using Docker's `log-opt`.
|
||||
In the `kube-up.sh` script, the latter approach is used for COS image on GCP,
|
||||
and the former approach is used in any other environment. In both cases, by
|
||||
default, rotation is configured to take place when a log file exceeds 10MB.
|
||||
|
|
|
@ -424,16 +424,16 @@ At some point, you'll eventually need to update your deployed application, typic
|
|||
|
||||
We'll guide you through how to create and update applications with Deployments.
|
||||
|
||||
Let's say you were running version 1.7.9 of nginx:
|
||||
Let's say you were running version 1.14.2 of nginx:
|
||||
|
||||
```shell
|
||||
kubectl run my-nginx --image=nginx:1.7.9 --replicas=3
|
||||
kubectl run my-nginx --image=nginx:1.14.2 --replicas=3
|
||||
```
|
||||
```shell
|
||||
deployment.apps/my-nginx created
|
||||
```
|
||||
|
||||
To update to version 1.9.1, simply change `.spec.template.spec.containers[0].image` from `nginx:1.7.9` to `nginx:1.9.1`, with the kubectl commands we learned above.
|
||||
To update to version 1.16.1, simply change `.spec.template.spec.containers[0].image` from `nginx:1.14.2` to `nginx:1.16.1`, with the kubectl commands we learned above.
|
||||
|
||||
```shell
|
||||
kubectl edit deployment/my-nginx
|
||||
|
|
|
@ -0,0 +1,132 @@
|
|||
---
|
||||
title: Metrics For The Kubernetes Control Plane
|
||||
reviewers:
|
||||
- brancz
|
||||
- logicalhan
|
||||
- RainbowMango
|
||||
content_template: templates/concept
|
||||
weight: 60
|
||||
aliases:
|
||||
- controller-metrics.md
|
||||
---
|
||||
|
||||
{{% capture overview %}}
|
||||
|
||||
System component metrics can give a better look into what is happening inside them. Metrics are particularly useful for building dashboards and alerts.
|
||||
|
||||
Metrics in the Kubernetes control plane are emitted in [Prometheus format](https://prometheus.io/docs/instrumenting/exposition_formats/) and are human readable.
|
||||
|
||||
{{% /capture %}}
|
||||
|
||||
{{% capture body %}}
|
||||
|
||||
## Metrics in Kubernetes
|
||||
|
||||
In most cases metrics are available on the `/metrics` endpoint of the HTTP server. For components that don't expose an endpoint by default, it can be enabled using the `--bind-address` flag.
|
||||
|
||||
Examples of those components:
|
||||
* {{< glossary_tooltip term_id="kube-controller-manager" text="kube-controller-manager" >}}
|
||||
* {{< glossary_tooltip term_id="kube-proxy" text="kube-proxy" >}}
|
||||
* {{< glossary_tooltip term_id="kube-apiserver" text="kube-apiserver" >}}
|
||||
* {{< glossary_tooltip term_id="kube-scheduler" text="kube-scheduler" >}}
|
||||
* {{< glossary_tooltip term_id="kubelet" text="kubelet" >}}
|
||||
|
||||
In a production environment you may want to configure [Prometheus Server](https://prometheus.io/) or some other metrics scraper
|
||||
to periodically gather these metrics and make them available in some kind of time series database.
|
||||
|
||||
Note that {{< glossary_tooltip term_id="kubelet" text="kubelet" >}} also exposes metrics in `/metrics/cadvisor`, `/metrics/resource` and `/metrics/probes` endpoints. Those metrics do not have the same lifecycle.
|
||||
|
||||
If your cluster uses {{< glossary_tooltip term_id="rbac" text="RBAC" >}}, reading metrics requires authorization via a user, group or ServiceAccount with a ClusterRole that allows accessing `/metrics`.
|
||||
For example:
|
||||
```
|
||||
apiVersion: rbac.authorization.k8s.io/v1
|
||||
kind: ClusterRole
|
||||
metadata:
|
||||
name: prometheus
|
||||
rules:
|
||||
- nonResourceURLs:
|
||||
- "/metrics"
|
||||
verbs:
|
||||
- get
|
||||
```
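With such authorization in place (the exact setup depends on your cluster), you can fetch a component's metrics ad hoc; for example, to read the API server's own metrics through `kubectl`:

```shell
kubectl get --raw /metrics | head
```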
|
||||
|
||||
## Metric lifecycle
|
||||
|
||||
Alpha metric → Stable metric → Deprecated metric → Hidden metric → Deletion
|
||||
|
||||
Alpha metrics have no stability guarantees; as such they can be modified or deleted at any time.
|
||||
|
||||
Stable metrics are guaranteed not to change. Specifically, stability means:
|
||||
|
||||
* the metric itself will not be deleted (or renamed)
|
||||
* the type of metric will not be modified
|
||||
|
||||
Deprecated metrics signal that the metric will eventually be deleted; to find out in which version, you need to check the metric's annotation, which indicates the Kubernetes version from which the metric is considered deprecated.
|
||||
|
||||
Before deprecation:
|
||||
|
||||
```
|
||||
# HELP some_counter this counts things
|
||||
# TYPE some_counter counter
|
||||
some_counter 0
|
||||
```
|
||||
|
||||
After deprecation:
|
||||
|
||||
```
|
||||
# HELP some_counter (Deprecated since 1.15.0) this counts things
|
||||
# TYPE some_counter counter
|
||||
some_counter 0
|
||||
```
|
||||
|
||||
Once a metric is hidden, then by default it is not published for scraping. To use a hidden metric, you need to override the configuration for the relevant cluster component.
|
||||
|
||||
Once a metric is deleted, the metric is not published. You cannot change this using an override.
|
||||
|
||||
|
||||
## Show Hidden Metrics
|
||||
|
||||
As described above, admins can enable hidden metrics through a command-line flag on a specific binary. This is intended as an escape hatch for admins who missed migrating off the metrics deprecated in the last release.
|
||||
|
||||
The flag `show-hidden-metrics-for-version` takes the version for which you want to show metrics deprecated in that release. The version is expressed as `x.y`, where `x` is the major version and `y` is the minor version. The patch version is not needed because, even though a metric can be deprecated in a patch release, the metrics deprecation policy runs against minor releases.
|
||||
|
||||
The flag can only take the previous minor version as its value. All metrics hidden in the previous minor release will be emitted if admins set the previous version in `show-hidden-metrics-for-version`. Older versions are not allowed, because that would violate the metrics deprecation policy.
|
||||
|
||||
Take metric `A` as an example, and assume that `A` is deprecated in release `1.n`. According to the metrics deprecation policy, we can reach the following conclusions:
|
||||
|
||||
* In release `1.n`, the metric is deprecated, and it can be emitted by default.
|
||||
* In release `1.n+1`, the metric is hidden by default, and it can be emitted by setting the command-line flag `show-hidden-metrics-for-version=1.n`.
|
||||
* In release `1.n+2`, the metric should be removed from the codebase. No escape hatch anymore.
|
||||
|
||||
If you're upgrading from release `1.12` to `1.13`, but still depend on a metric `A` deprecated in `1.12`, you should set the hidden metrics via the command line: `--show-hidden-metrics-for-version=1.12`, and remember to remove this metric dependency before upgrading to `1.14`.
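For instance, a sketch of what enabling this could look like for the API server (the version value is illustrative; check that your component version exposes the flag described above):

```shell
kube-apiserver \
--show-hidden-metrics-for-version=1.16 \
# …and other flags as usual
```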
|
||||
|
||||
## Component metrics
|
||||
|
||||
### kube-controller-manager metrics
|
||||
|
||||
Controller manager metrics provide important insight into the performance and health of the controller manager.
|
||||
These metrics include common Go language runtime metrics such as goroutine count, and controller-specific metrics such as
|
||||
etcd request latencies or Cloudprovider (AWS, GCE, OpenStack) API latencies that can be used
|
||||
to gauge the health of a cluster.
|
||||
|
||||
Starting from Kubernetes 1.7, detailed Cloudprovider metrics are available for storage operations for GCE, AWS, vSphere and OpenStack.
|
||||
These metrics can be used to monitor health of persistent volume operations.
|
||||
|
||||
For example, for GCE these metrics are called:
|
||||
|
||||
```
|
||||
cloudprovider_gce_api_request_duration_seconds { request = "instance_list"}
|
||||
cloudprovider_gce_api_request_duration_seconds { request = "disk_insert"}
|
||||
cloudprovider_gce_api_request_duration_seconds { request = "disk_delete"}
|
||||
cloudprovider_gce_api_request_duration_seconds { request = "attach_disk"}
|
||||
cloudprovider_gce_api_request_duration_seconds { request = "detach_disk"}
|
||||
cloudprovider_gce_api_request_duration_seconds { request = "list_disk"}
|
||||
```
|
||||
|
||||
{{% /capture %}}
|
||||
|
||||
{{% capture whatsnext %}}
|
||||
* Read about the [Prometheus text format](https://github.com/prometheus/docs/blob/master/content/docs/instrumenting/exposition_formats.md#text-based-format) for metrics
|
||||
* See the list of [stable Kubernetes metrics](https://github.com/kubernetes/kubernetes/blob/master/test/instrumentation/testdata/stable-metrics-list.yaml)
|
||||
* Read about the [Kubernetes deprecation policy](https://kubernetes.io/docs/reference/using-api/deprecation-policy/#deprecating-a-feature-or-behavior )
|
||||
{{% /capture %}}
|
|
@ -17,7 +17,7 @@ There are several ways to do this, and the recommended approaches all use
|
|||
[label selectors](/docs/concepts/overview/working-with-objects/labels/) to make the selection.
|
||||
Generally such constraints are unnecessary, as the scheduler will automatically do a reasonable placement
|
||||
(e.g. spread your pods across nodes, not place the pod on a node with insufficient free resources, etc.)
|
||||
but there are some circumstances where you may want more control on a node where a pod lands, e.g. to ensure
|
||||
but there are some circumstances where you may want more control on a node where a pod lands, for example to ensure
|
||||
that a pod ends up on a machine with an SSD attached to it, or to co-locate pods from two different
|
||||
services that communicate a lot into the same availability zone.
|
||||
|
||||
|
@ -111,9 +111,10 @@ For example, `example.com.node-restriction.kubernetes.io/fips=true` or `example.
|
|||
`nodeSelector` provides a very simple way to constrain pods to nodes with particular labels. The affinity/anti-affinity
|
||||
feature greatly expands the types of constraints you can express. The key enhancements are:
|
||||
|
||||
1. the language is more expressive (not just "AND or exact match")
|
||||
1. The affinity/anti-affinity language is more expressive. The language offers more matching rules
|
||||
besides exact matches created with a logical AND operation;
|
||||
2. you can indicate that the rule is "soft"/"preference" rather than a hard requirement, so if the scheduler
|
||||
can't satisfy it, the pod will still be scheduled
|
||||
can't satisfy it, the pod will still be scheduled;
|
||||
3. you can constrain against labels on other pods running on the node (or other topological domain),
|
||||
rather than against labels on the node itself, which allows rules about which pods can and cannot be co-located
|
||||
|
||||
|
@ -159,9 +160,9 @@ You can use `NotIn` and `DoesNotExist` to achieve node anti-affinity behavior, o
|
|||
If you specify both `nodeSelector` and `nodeAffinity`, *both* must be satisfied for the pod
|
||||
to be scheduled onto a candidate node.
|
||||
|
||||
If you specify multiple `nodeSelectorTerms` associated with `nodeAffinity` types, then the pod can be scheduled onto a node **if one of** the `nodeSelectorTerms` is satisfied.
|
||||
If you specify multiple `nodeSelectorTerms` associated with `nodeAffinity` types, then the pod can be scheduled onto a node **only if all** `nodeSelectorTerms` can be satisfied.
|
||||
|
||||
If you specify multiple `matchExpressions` associated with `nodeSelectorTerms`, then the pod can be scheduled onto a node **only if all** `matchExpressions` can be satisfied.
|
||||
If you specify multiple `matchExpressions` associated with `nodeSelectorTerms`, then the pod can be scheduled onto a node **if one of** the `matchExpressions` is satisfied.
|
||||
|
||||
If you remove or change the label of the node where the pod is scheduled, the pod won't be removed. In other words, the affinity selection works only at the time of scheduling the pod.
|
||||
|
||||
|
@ -176,7 +177,7 @@ Y is expressed as a LabelSelector with an optional associated list of namespaces
|
|||
(and therefore the labels on pods are implicitly namespaced),
|
||||
a label selector over pod labels must specify which namespaces the selector should apply to. Conceptually X is a topology domain
|
||||
like node, rack, cloud provider zone, cloud provider region, etc. You express it using a `topologyKey` which is the
|
||||
key for the node label that the system uses to denote such a topology domain, e.g. see the label keys listed above
|
||||
key for the node label that the system uses to denote such a topology domain; for example, see the label keys listed above
|
||||
in the section [Interlude: built-in node labels](#built-in-node-labels).
|
||||
|
||||
{{< note >}}
|
||||
|
@ -186,7 +187,7 @@ not recommend using them in clusters larger than several hundred nodes.
|
|||
{{< /note >}}
|
||||
|
||||
{{< note >}}
|
||||
Pod anti-affinity requires nodes to be consistently labelled, i.e. every node in the cluster must have an appropriate label matching `topologyKey`. If some or all nodes are missing the specified `topologyKey` label, it can lead to unintended behavior.
|
||||
Pod anti-affinity requires nodes to be consistently labelled, in other words every node in the cluster must have an appropriate label matching `topologyKey`. If some or all nodes are missing the specified `topologyKey` label, it can lead to unintended behavior.
|
||||
{{< /note >}}
|
||||
|
||||
As with node affinity, there are currently two types of pod affinity and anti-affinity, called `requiredDuringSchedulingIgnoredDuringExecution` and
|
||||
|
@ -228,7 +229,7 @@ for performance and security reasons, there are some constraints on topologyKey:
|
|||
1. For affinity and for `requiredDuringSchedulingIgnoredDuringExecution` pod anti-affinity,
|
||||
empty `topologyKey` is not allowed.
|
||||
2. For `requiredDuringSchedulingIgnoredDuringExecution` pod anti-affinity, the admission controller `LimitPodHardAntiAffinityTopology` was introduced to limit `topologyKey` to `kubernetes.io/hostname`. If you want to make it available for custom topologies, you may modify the admission controller, or simply disable it.
|
||||
3. For `preferredDuringSchedulingIgnoredDuringExecution` pod anti-affinity, empty `topologyKey` is interpreted as "all topologies" ("all topologies" here is now limited to the combination of `kubernetes.io/hostname`, `failure-domain.beta.kubernetes.io/zone` and `failure-domain.beta.kubernetes.io/region`).
|
||||
3. For `preferredDuringSchedulingIgnoredDuringExecution` pod anti-affinity, empty `topologyKey` is not allowed.
|
||||
4. Except for the above cases, the `topologyKey` can be any legal label-key.
|
||||
|
||||
In addition to `labelSelector` and `topologyKey`, you can optionally specify a list `namespaces`
|
||||
|
@ -318,7 +319,7 @@ spec:
|
|||
topologyKey: "kubernetes.io/hostname"
|
||||
containers:
|
||||
- name: web-app
|
||||
image: nginx:1.12-alpine
|
||||
image: nginx:1.16-alpine
|
||||
```
|
||||
|
||||
If we create the above two deployments, our three node cluster should look like below.
|
||||
|
@ -366,7 +367,7 @@ Some of the limitations of using `nodeName` to select nodes are:
|
|||
some cases may be automatically deleted.
|
||||
- If the named node does not have the resources to accommodate the
|
||||
pod, the pod will fail and its reason will indicate why,
|
||||
e.g. OutOfmemory or OutOfcpu.
|
||||
for example OutOfmemory or OutOfcpu.
|
||||
- Node names in cloud environments are not always predictable or
|
||||
stable.
|
||||
|
||||
|
|
|
@ -68,13 +68,7 @@ resource requests/limits of that type for each Container in the Pod.
|
|||
## Meaning of CPU
|
||||
|
||||
Limits and requests for CPU resources are measured in *cpu* units.
|
||||
One cpu, in Kubernetes, is equivalent to:
|
||||
|
||||
- 1 AWS vCPU
|
||||
- 1 GCP Core
|
||||
- 1 Azure vCore
|
||||
- 1 IBM vCPU
|
||||
- 1 *Hyperthread* on a bare-metal Intel processor with Hyperthreading
|
||||
One cpu, in Kubernetes, is equivalent to **1 vCPU/Core** for cloud providers and **1 hyperthread** on bare-metal Intel processors.
|
||||
|
||||
Fractional requests are allowed. A Container with
|
||||
`spec.containers[].resources.requests.cpu` of `0.5` is guaranteed half as much
|
||||
|
@ -191,9 +185,10 @@ resource limits, see the
|
|||
|
||||
The resource usage of a Pod is reported as part of the Pod status.
|
||||
|
||||
If [optional monitoring](http://releases.k8s.io/{{< param "githubbranch" >}}/cluster/addons/cluster-monitoring/README.md)
|
||||
is configured for your cluster, then Pod resource usage can be retrieved from
|
||||
the monitoring system.
|
||||
If optional [tools for monitoring](/docs/tasks/debug-application-cluster/resource-usage-monitoring/)
|
||||
are available in your cluster, then Pod resource usage can be retrieved either
|
||||
from the [Metrics API](/docs/tasks/debug-application-cluster/resource-metrics-pipeline/#the-metrics-api)
|
||||
directly or from your monitoring tools.
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
|
@ -391,7 +386,7 @@ spec:
|
|||
### How Pods with ephemeral-storage requests are scheduled
|
||||
|
||||
When you create a Pod, the Kubernetes scheduler selects a node for the Pod to
|
||||
run on. Each node has a maximum amount of local ephemeral storage it can provide for Pods. For more information, see ["Node Allocatable"](/docs/tasks/administer-cluster/reserve-compute-resources/#node-allocatable).
|
||||
run on. Each node has a maximum amount of local ephemeral storage it can provide for Pods. For more information, see ["Node Allocatable"](/docs/tasks/administer-cluster/reserve-compute-resources/#node-allocatable).
|
||||
|
||||
The scheduler ensures that the sum of the resource requests of the scheduled Containers is less than the capacity of the node.
|
||||
|
||||
|
|
|
@ -30,7 +30,7 @@ This is a living document. If you think of something that is not on this list bu
|
|||
- Put object descriptions in annotations, to allow better introspection.
|
||||
|
||||
|
||||
## "Naked" Pods vs ReplicaSets, Deployments, and Jobs
|
||||
## "Naked" Pods versus ReplicaSets, Deployments, and Jobs {#naked-pods-vs-replicasets-deployments-and-jobs}
|
||||
|
||||
- Don't use naked Pods (that is, Pods not bound to a [ReplicaSet](/docs/concepts/workloads/controllers/replicaset/) or [Deployment](/docs/concepts/workloads/controllers/deployment/)) if you can avoid it. Naked Pods will not be rescheduled in the event of a node failure.
|
||||
|
||||
|
@ -87,7 +87,7 @@ The [imagePullPolicy](/docs/concepts/containers/images/#updating-images) and the
|
|||
- `imagePullPolicy: Never`: the image is assumed to exist locally. No attempt is made to pull the image.
|
||||
|
||||
{{< note >}}
|
||||
To make sure the container always uses the same version of the image, you can specify its [digest](https://docs.docker.com/engine/reference/commandline/pull/#pull-an-image-by-digest-immutable-identifier), for example `sha256:45b23dee08af5e43a7fea6c4cf9c25ccf269ee113168c19722f87876677c5cb2`. The digest uniquely identifies a specific version of the image, so it is never updated by Kubernetes unless you change the digest value.
|
||||
To make sure the container always uses the same version of the image, you can specify its [digest](https://docs.docker.com/engine/reference/commandline/pull/#pull-an-image-by-digest-immutable-identifier); replace `<image-name>:<tag>` with `<image-name>@<digest>` (for example, `image@sha256:45b23dee08af5e43a7fea6c4cf9c25ccf269ee113168c19722f87876677c5cb2`). The digest uniquely identifies a specific version of the image, so it is never updated by Kubernetes unless you change the digest value.
|
||||
{{< /note >}}
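As a small sketch of what pinning by digest looks like in practice (illustrative only; the image name is hypothetical and the digest value is the example from the note above):

```yaml
# Illustrative sketch: pinning a container image by digest instead of by tag.
apiVersion: v1
kind: Pod
metadata:
  name: digest-pinned-pod   # hypothetical name
spec:
  containers:
  - name: app
    # <image-name>@<digest> form; digest value reused from the note above
    image: registry.example/app@sha256:45b23dee08af5e43a7fea6c4cf9c25ccf269ee113168c19722f87876677c5cb2
```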
|
||||
|
||||
{{< note >}}
|
||||
|
@ -108,4 +108,3 @@ The caching semantics of the underlying image provider make even `imagePullPolic
|
|||
|
||||
{{% /capture %}}
|
||||
|
||||
|
||||
|
|
|
@ -10,12 +10,12 @@ weight: 20
|
|||
|
||||
{{% capture overview %}}
|
||||
|
||||
{{< feature-state for_k8s_version="v1.16" state="alpha" >}}
|
||||
{{< feature-state for_k8s_version="v1.18" state="beta" >}}
|
||||
|
||||
|
||||
When you run a Pod on a Node, the Pod itself takes an amount of system resources. These
|
||||
resources are additional to the resources needed to run the container(s) inside the Pod.
|
||||
_Pod Overhead_ is a feature for accounting for the resources consumed by the pod infrastructure
|
||||
_Pod Overhead_ is a feature for accounting for the resources consumed by the Pod infrastructure
|
||||
on top of the container requests & limits.
|
||||
|
||||
|
||||
|
@ -24,33 +24,169 @@ on top of the container requests & limits.
|
|||
|
||||
{{% capture body %}}
|
||||
|
||||
## Pod Overhead
|
||||
|
||||
In Kubernetes, the pod's overhead is set at
|
||||
In Kubernetes, the Pod's overhead is set at
|
||||
[admission](/docs/reference/access-authn-authz/extensible-admission-controllers/#what-are-admission-webhooks)
|
||||
time according to the overhead associated with the pod's
|
||||
time according to the overhead associated with the Pod's
|
||||
[RuntimeClass](/docs/concepts/containers/runtime-class/).
|
||||
|
||||
When Pod Overhead is enabled, the overhead is considered in addition to the sum of container
|
||||
resource requests when scheduling a pod. Similarly, Kubelet will include the pod overhead when sizing
|
||||
the pod cgroup, and when carrying out pod eviction ranking.
|
||||
resource requests when scheduling a Pod. Similarly, Kubelet will include the Pod overhead when sizing
|
||||
the Pod cgroup, and when carrying out Pod eviction ranking.
|
||||
|
||||
### Set Up
|
||||
## Enabling Pod Overhead {#set-up}
|
||||
|
||||
You need to make sure that the `PodOverhead`
|
||||
[feature gate](/docs/reference/command-line-tools-reference/feature-gates/) is enabled (it is off by default)
|
||||
across your cluster. This means:
|
||||
[feature gate](/docs/reference/command-line-tools-reference/feature-gates/) is enabled (it is on by default as of 1.18)
|
||||
across your cluster, and that a `RuntimeClass` defining the `overhead` field is used.

|
||||
|
||||
- in {{< glossary_tooltip text="kube-scheduler" term_id="kube-scheduler" >}}
|
||||
- in {{< glossary_tooltip text="kube-apiserver" term_id="kube-apiserver" >}}
|
||||
- in the {{< glossary_tooltip text="kubelet" term_id="kubelet" >}} on each Node
|
||||
- in any custom API servers that use feature gates
|
||||
## Usage example
|
||||
|
||||
{{< note >}}
|
||||
Users who can write to RuntimeClass resources are able to have cluster-wide impact on
|
||||
workload performance. You can limit access to this ability using Kubernetes access controls.
|
||||
See [Authorization Overview](/docs/reference/access-authn-authz/authorization/) for more details.
|
||||
{{< /note >}}
|
||||
To use the PodOverhead feature, you need a RuntimeClass that defines the `overhead` field. As
|
||||
an example, you could use the following RuntimeClass definition with a virtualizing container runtime
|
||||
that uses around 120MiB per Pod for the virtual machine and the guest OS:
|
||||
|
||||
```yaml
|
||||
---
|
||||
kind: RuntimeClass
|
||||
apiVersion: node.k8s.io/v1beta1
|
||||
metadata:
|
||||
name: kata-fc
|
||||
handler: kata-fc
|
||||
overhead:
|
||||
podFixed:
|
||||
memory: "120Mi"
|
||||
cpu: "250m"
|
||||
```
|
||||
|
||||
Workloads that specify the `kata-fc` RuntimeClass handler will take the memory and
|
||||
CPU overheads into account for resource quota calculations, node scheduling, and Pod cgroup sizing.
|
||||
|
||||
Consider running the following example workload, test-pod:
|
||||
|
||||
```yaml
|
||||
apiVersion: v1
|
||||
kind: Pod
|
||||
metadata:
|
||||
name: test-pod
|
||||
spec:
|
||||
runtimeClassName: kata-fc
|
||||
containers:
|
||||
- name: busybox-ctr
|
||||
image: busybox
|
||||
stdin: true
|
||||
tty: true
|
||||
resources:
|
||||
limits:
|
||||
cpu: 500m
|
||||
memory: 100Mi
|
||||
- name: nginx-ctr
|
||||
image: nginx
|
||||
resources:
|
||||
limits:
|
||||
cpu: 1500m
|
||||
memory: 100Mi
|
||||
```
|
||||
|
||||
At admission time the RuntimeClass [admission controller](/docs/reference/access-authn-authz/admission-controllers/)
|
||||
updates the workload's PodSpec to include the `overhead` as described in the RuntimeClass. If the PodSpec already has this field defined,
|
||||
the Pod will be rejected. In the given example, since only the RuntimeClass name is specified, the admission controller mutates the Pod
|
||||
to include an `overhead`.
|
||||
|
||||
After the RuntimeClass admission controller, you can check the updated PodSpec:
|
||||
|
||||
```bash
|
||||
kubectl get pod test-pod -o jsonpath='{.spec.overhead}'
|
||||
```
|
||||
|
||||
The output is:
|
||||
```
|
||||
map[cpu:250m memory:120Mi]
|
||||
```
|
||||
|
||||
If a ResourceQuota is defined, the sum of container requests as well as the
|
||||
`overhead` field are counted.
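For instance (a hedged sketch, not from the original page; the name and totals are hypothetical), a quota limiting total requests in the namespace would count the test-pod above as 2250m CPU and 320Mi memory, because the RuntimeClass overhead is added to the container requests:

```yaml
# Illustrative sketch: a ResourceQuota in the Pod's namespace.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-quota      # hypothetical name
spec:
  hard:
    requests.cpu: "4"
    requests.memory: 2Gi
```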
|
||||
|
||||
When the kube-scheduler is deciding which node should run a new Pod, the scheduler considers that Pod's
|
||||
`overhead` as well as the sum of container requests for that Pod. For this example, the scheduler adds the
|
||||
requests and the overhead, then looks for a node that has 2.25 CPU and 320 MiB of memory available.
|
||||
|
||||
Once a Pod is scheduled to a node, the kubelet on that node creates a new {{< glossary_tooltip text="cgroup" term_id="cgroup" >}}
|
||||
for the Pod. It is within this cgroup that the underlying container runtime will create containers.
|
||||
|
||||
If a resource has a limit defined for each container (Guaranteed QoS, or Burstable QoS with limits defined),
|
||||
the kubelet will set an upper limit for the pod cgroup associated with that resource (`cpu.cfs_quota_us` for CPU
|
||||
and `memory.limit_in_bytes` for memory). This upper limit is based on the sum of the container limits plus the `overhead`
|
||||
defined in the PodSpec.
|
||||
|
||||
For CPU, if the Pod is Guaranteed or Burstable QoS, the kubelet will set `cpu.shares` based on the sum of container
|
||||
requests plus the `overhead` defined in the PodSpec.
|
||||
|
||||
Looking at our example, verify the container requests for the workload:
|
||||
```bash
|
||||
kubectl get pod test-pod -o jsonpath='{.spec.containers[*].resources.limits}'
|
||||
```
|
||||
|
||||
The total container requests are 2000m CPU and 200MiB of memory:
|
||||
```
|
||||
map[cpu:500m memory:100Mi] map[cpu:1500m memory:100Mi]
|
||||
```
|
||||
|
||||
Check this against what is observed by the node:
|
||||
```bash
|
||||
kubectl describe node | grep test-pod -B2
|
||||
```
|
||||
|
||||
The output shows 2250m CPU and 320MiB of memory are requested, which includes PodOverhead:
|
||||
```
|
||||
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits AGE
|
||||
--------- ---- ------------ ---------- --------------- ------------- ---
|
||||
default test-pod 2250m (56%) 2250m (56%) 320Mi (1%) 320Mi (1%) 36m
|
||||
```
|
||||
|
||||
## Verify Pod cgroup limits
|
||||
|
||||
Check the Pod's memory cgroups on the node where the workload is running. In the following example, [`crictl`](https://github.com/kubernetes-sigs/cri-tools/blob/master/docs/crictl.md)
|
||||
is used on the node, which provides a CLI for CRI-compatible container runtimes. This is an
|
||||
advanced example to show PodOverhead behavior, and it is not expected that users should need to check
|
||||
cgroups directly on the node.
|
||||
|
||||
First, on the particular node, determine the Pod identifier:
|
||||
|
||||
```bash
|
||||
# Run this on the node where the Pod is scheduled
|
||||
POD_ID="$(sudo crictl pods --name test-pod -q)"
|
||||
```
|
||||
|
||||
From this, you can determine the cgroup path for the Pod:
|
||||
```bash
|
||||
# Run this on the node where the Pod is scheduled
|
||||
sudo crictl inspectp -o=json $POD_ID | grep cgroupsPath
|
||||
```
|
||||
|
||||
The resulting cgroup path includes the Pod's `pause` container. The Pod level cgroup is one directory above.
|
||||
```
|
||||
"cgroupsPath": "/kubepods/podd7f4b509-cf94-4951-9417-d1087c92a5b2/7ccf55aee35dd16aca4189c952d83487297f3cd760f1bbf09620e206e7d0c27a"
|
||||
```
|
||||
|
||||
In this specific case, the pod cgroup path is `kubepods/podd7f4b509-cf94-4951-9417-d1087c92a5b2`. Verify the Pod level cgroup setting for memory:
|
||||
```bash
|
||||
# Run this on the node where the Pod is scheduled.
|
||||
# Also, change the name of the cgroup to match the cgroup allocated for your pod.
|
||||
cat /sys/fs/cgroup/memory/kubepods/podd7f4b509-cf94-4951-9417-d1087c92a5b2/memory.limit_in_bytes
|
||||
```
|
||||
|
||||
This is 320 MiB, as expected:
|
||||
```
|
||||
335544320
|
||||
```
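As a quick check of the arithmetic: the two container limits (100 MiB + 100 MiB) plus the 120 MiB `overhead` give 320 MiB, and 320 × 1024 × 1024 = 335544320 bytes, matching the value read from `memory.limit_in_bytes`.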
|
||||
|
||||
### Observability
|
||||
|
||||
A `kube_pod_overhead` metric is available in [kube-state-metrics](https://github.com/kubernetes/kube-state-metrics)
|
||||
to help identify when PodOverhead is being utilized and to help observe stability of workloads
|
||||
running with a defined Overhead. This functionality is not available in the 1.9 release of
|
||||
kube-state-metrics, but is expected in a following release. Users will need to build kube-state-metrics
|
||||
from source in the meantime.
|
||||
|
||||
{{% /capture %}}
|
||||
|
||||
|
|
|
@ -16,42 +16,25 @@ importance of a Pod relative to other Pods. If a Pod cannot be scheduled, the
|
|||
scheduler tries to preempt (evict) lower priority Pods to make scheduling of the
|
||||
pending Pod possible.
|
||||
|
||||
In Kubernetes 1.9 and later, Priority also affects scheduling order of Pods and
|
||||
out-of-resource eviction ordering on the Node.
|
||||
|
||||
Pod priority and preemption graduated to beta in Kubernetes 1.11 and to GA in
|
||||
Kubernetes 1.14. They have been enabled by default since 1.11.
|
||||
|
||||
In Kubernetes versions where Pod priority and preemption is still an alpha-level
|
||||
feature, you need to explicitly enable it. To use these features in the older
|
||||
versions of Kubernetes, follow the instructions in the documentation for your
|
||||
Kubernetes version, by going to the documentation archive version for your
|
||||
Kubernetes version.
|
||||
|
||||
Kubernetes Version | Priority and Preemption State | Enabled by default
|
||||
------------------ | :---------------------------: | :----------------:
|
||||
1.8 | alpha | no
|
||||
1.9 | alpha | no
|
||||
1.10 | alpha | no
|
||||
1.11 | beta | yes
|
||||
1.14 | stable | yes
|
||||
|
||||
{{< warning >}}In a cluster where not all users are trusted, a
|
||||
malicious user could create pods at the highest possible priorities, causing
|
||||
other pods to be evicted/not get scheduled. To resolve this issue,
|
||||
[ResourceQuota](/docs/concepts/policy/resource-quotas/) is
|
||||
augmented to support Pod priority. An admin can create ResourceQuota for users
|
||||
at specific priority levels, preventing them from creating pods at high
|
||||
priorities. This feature is in beta since Kubernetes 1.12.
|
||||
{{< /warning >}}
|
||||
|
||||
{{% /capture %}}
|
||||
|
||||
{{% capture body %}}
|
||||
|
||||
|
||||
{{< warning >}}
|
||||
In a cluster where not all users are trusted, a malicious user could create Pods
|
||||
at the highest possible priorities, causing other Pods to be evicted/not get
|
||||
scheduled.
|
||||
An administrator can use ResourceQuota to prevent users from creating pods at
|
||||
high priorities.
|
||||
|
||||
See [limit Priority Class consumption by default](/docs/concepts/policy/resource-quotas/#limit-priority-class-consumption-by-default)
|
||||
for details.
|
||||
{{< /warning >}}
|
||||
|
||||
## How to use priority and preemption
|
||||
|
||||
To use priority and preemption in Kubernetes 1.11 and later, follow these steps:
|
||||
To use priority and preemption:
|
||||
|
||||
1. Add one or more [PriorityClasses](#priorityclass).
|
||||
|
||||
|
@ -62,6 +45,12 @@ To use priority and preemption in Kubernetes 1.11 and later, follow these steps:
|
|||
|
||||
Keep reading for more information about these steps.
|
||||
|
||||
{{< note >}}
|
||||
Kubernetes already ships with two PriorityClasses:
|
||||
`system-cluster-critical` and `system-node-critical`.
|
||||
These are common classes and are used to [ensure that critical components are always scheduled first](/docs/tasks/administer-cluster/guaranteed-scheduling-critical-addon-pods/).
|
||||
{{< /note >}}
|
||||
|
||||
If you try the feature and then decide to disable it, you must remove the
|
||||
PodPriority command-line flag or set it to `false`, and then restart the API
|
||||
server and scheduler. After the feature is disabled, the existing Pods keep
|
||||
|
@ -71,21 +60,20 @@ Pods.
|
|||
|
||||
## How to disable preemption
|
||||
|
||||
{{< note >}}
|
||||
In Kubernetes 1.12+, critical pods rely on scheduler preemption to be scheduled
|
||||
when a cluster is under resource pressure. For this reason, it is not
|
||||
recommended to disable preemption.
|
||||
{{< /note >}}
|
||||
{{< caution >}}
|
||||
Critical pods rely on scheduler preemption to be scheduled when a cluster
|
||||
is under resource pressure. For this reason, it is not recommended to
|
||||
disable preemption.
|
||||
{{< /caution >}}
|
||||
|
||||
{{< note >}}
|
||||
In Kubernetes 1.15 and later,
|
||||
if the feature `NonPreemptingPriority` is enabled,
|
||||
In Kubernetes 1.15 and later, if the feature `NonPreemptingPriority` is enabled,
|
||||
PriorityClasses have the option to set `preemptionPolicy: Never`.
|
||||
This will prevent pods of that PriorityClass from preempting other pods.
|
||||
{{< /note >}}
|
||||
|
||||
In Kubernetes 1.11 and later, preemption is controlled by a kube-scheduler flag
|
||||
`disablePreemption`, which is set to `false` by default.
|
||||
Preemption is controlled by a kube-scheduler flag `disablePreemption`, which is
|
||||
set to `false` by default.
|
||||
If you want to disable preemption despite the above note, you can set
|
||||
`disablePreemption` to `true`.
|
||||
|
||||
|
@ -111,6 +99,9 @@ priority class name to the integer value of the priority. The name is specified
|
|||
in the `name` field of the PriorityClass object's metadata. The value is
|
||||
specified in the required `value` field. The higher the value, the higher the
|
||||
priority.
|
||||
The name of a PriorityClass object must be a valid
|
||||
[DNS subdomain name](/docs/concepts/overview/working-with-objects/names#dns-subdomain-names),
|
||||
and it cannot be prefixed with `system-`.
|
||||
|
||||
A PriorityClass object can have any 32-bit integer value smaller than or equal
|
||||
to 1 billion. Larger numbers are reserved for critical system Pods that should
|
||||
|
@ -152,12 +143,9 @@ globalDefault: false
|
|||
description: "This priority class should be used for XYZ service pods only."
|
||||
```
|
||||
|
||||
### Non-preempting PriorityClasses (alpha) {#non-preempting-priority-class}
|
||||
## Non-preempting PriorityClass {#non-preempting-priority-class}
|
||||
|
||||
1.15 adds the `PreemptionPolicy` field as an alpha feature.
|
||||
It is disabled by default in 1.15,
|
||||
and requires the `NonPreemptingPriority`[feature gate](/docs/reference/command-line-tools-reference/feature-gates/
|
||||
) to be enabled.
|
||||
{{< feature-state for_k8s_version="1.15" state="alpha" >}}
|
||||
|
||||
Pods with `PreemptionPolicy: Never` will be placed in the scheduling queue
|
||||
ahead of lower-priority pods,
|
||||
|
@ -181,6 +169,10 @@ which will allow pods of that PriorityClass to preempt lower-priority pods
|
|||
If `PreemptionPolicy` is set to `Never`,
|
||||
pods in that PriorityClass will be non-preempting.
|
||||
|
||||
The use of the `PreemptionPolicy` field requires the `NonPreemptingPriority`
|
||||
[feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
|
||||
to be enabled.
|
||||
|
||||
An example use case is for data science workloads.
|
||||
A user may submit a job that they want to be prioritized above other workloads,
|
||||
but do not wish to discard existing work by preempting running pods.
|
||||
|
@ -188,7 +180,7 @@ The high priority job with `PreemptionPolicy: Never` will be scheduled
|
|||
ahead of other queued pods,
|
||||
as soon as sufficient cluster resources "naturally" become free.
|
||||
|
||||
#### Example Non-preempting PriorityClass
|
||||
### Example Non-preempting PriorityClass
|
||||
|
||||
```yaml
|
||||
apiVersion: scheduling.k8s.io/v1
|
||||
|
@ -230,12 +222,12 @@ spec:
|
|||
|
||||
### Effect of Pod priority on scheduling order
|
||||
|
||||
In Kubernetes 1.9 and later, when Pod priority is enabled, scheduler orders
|
||||
pending Pods by their priority and a pending Pod is placed ahead of other
|
||||
pending Pods with lower priority in the scheduling queue. As a result, the
|
||||
higher priority Pod may be scheduled sooner than Pods with lower priority if its
|
||||
scheduling requirements are met. If such Pod cannot be scheduled, scheduler will
|
||||
continue and tries to schedule other lower priority Pods.
|
||||
When Pod priority is enabled, the scheduler orders pending Pods by
|
||||
their priority and a pending Pod is placed ahead of other pending Pods
|
||||
with lower priority in the scheduling queue. As a result, the higher
|
||||
priority Pod may be scheduled sooner than Pods with lower priority if
|
||||
its scheduling requirements are met. If such a Pod cannot be scheduled, the
|
||||
scheduler will continue and try to schedule other lower priority Pods.
|
||||
|
||||
## Preemption
|
||||
|
||||
|
@ -281,12 +273,12 @@ point that scheduler preempts victims and the time that Pod P is scheduled. In
|
|||
order to minimize this gap, one can set graceful termination period of lower
|
||||
priority Pods to zero or a small number.
|
||||
|
||||
#### PodDisruptionBudget is supported, but not guaranteed!
|
||||
#### PodDisruptionBudget is supported, but not guaranteed
|
||||
|
||||
A [Pod Disruption Budget (PDB)](/docs/concepts/workloads/pods/disruptions/)
|
||||
allows application owners to limit the number of Pods of a replicated application
|
||||
that are down simultaneously from voluntary disruptions. Kubernetes 1.9 supports
|
||||
PDB when preempting Pods, but respecting PDB is best effort. The Scheduler tries
|
||||
that are down simultaneously from voluntary disruptions. Kubernetes supports
|
||||
PDB when preempting Pods, but respecting PDB is best effort. The scheduler tries
|
||||
to find victims whose PDB are not violated by preemption, but if no such victims
|
||||
are found, preemption will still happen, and lower priority Pods will be removed
|
||||
despite their PDBs being violated.
|
||||
|
@ -337,28 +329,23 @@ gone, and Pod P could possibly be scheduled on Node N.
|
|||
We may consider adding cross Node preemption in future versions if there is
|
||||
enough demand and if we find an algorithm with reasonable performance.
|
||||
|
||||
## Debugging Pod Priority and Preemption
|
||||
## Troubleshooting
|
||||
|
||||
Pod Priority and Preemption is a major feature that could potentially disrupt
|
||||
Pod scheduling if it has bugs.
|
||||
Pod priority and preemption can have unwanted side effects. Here are some
|
||||
examples of potential problems and ways to deal with them.
|
||||
|
||||
### Potential problems caused by Priority and Preemption
|
||||
|
||||
The followings are some of the potential problems that could be caused by bugs
|
||||
in the implementation of the feature. This list is not exhaustive.
|
||||
|
||||
#### Pods are preempted unnecessarily
|
||||
### Pods are preempted unnecessarily
|
||||
|
||||
Preemption removes existing Pods from a cluster under resource pressure to make
|
||||
room for higher priority pending Pods. If a user gives high priorities to
|
||||
certain Pods by mistake, these unintentional high priority Pods may cause
|
||||
preemption in the cluster. As mentioned above, Pod priority is specified by
|
||||
setting the `priorityClassName` field of `podSpec`. The integer value of
|
||||
room for higher priority pending Pods. If you give high priorities to
|
||||
certain Pods by mistake, these unintentionally high priority Pods may cause
|
||||
preemption in your cluster. Pod priority is specified by setting the
|
||||
`priorityClassName` field in the Pod's specification. The integer value for
|
||||
priority is then resolved and populated to the `priority` field of `podSpec`.
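For reference, a Pod that uses a priority class looks something like the following minimal sketch (illustrative only; `high-priority` is assumed to be a PriorityClass that already exists in your cluster):

```yaml
# Illustrative sketch: a Pod that requests an existing PriorityClass.
apiVersion: v1
kind: Pod
metadata:
  name: nginx-priority-demo        # hypothetical name
spec:
  priorityClassName: high-priority # assumed to be an existing PriorityClass
  containers:
  - name: nginx
    image: nginx
```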
|
||||
|
||||
To resolve the problem, `priorityClassName` of the Pods must be changed to use
|
||||
lower priority classes or should be left empty. Empty `priorityClassName` is
|
||||
resolved to zero by default.
|
||||
To address the problem, you can change the `priorityClassName` for those Pods
|
||||
to use lower priority classes, or leave that field empty. An empty
|
||||
`priorityClassName` is resolved to zero by default.
|
||||
|
||||
When a Pod is preempted, there will be events recorded for the preempted Pod.
|
||||
Preemption should happen only when a cluster does not have enough resources for
|
||||
|
@ -367,29 +354,31 @@ Pod (preemptor) is higher than the victim Pods. Preemption must not happen when
|
|||
there is no pending Pod, or when the pending Pods have equal or lower priority
|
||||
than the victims. If preemption happens in such scenarios, please file an issue.
|
||||
|
||||
#### Pods are preempted, but the preemptor is not scheduled
|
||||
### Pods are preempted, but the preemptor is not scheduled
|
||||
|
||||
When pods are preempted, they receive their requested graceful termination
|
||||
period, which is by default 30 seconds, but it can be any different value as
|
||||
specified in the PodSpec. If the victim Pods do not terminate within this period,
|
||||
they are force-terminated. Once all the victims go away, the preemptor Pod can
|
||||
be scheduled.
|
||||
period, which is by default 30 seconds. If the victim Pods do not terminate within
|
||||
this period, they are forcibly terminated. Once all the victims go away, the
|
||||
preemptor Pod can be scheduled.
|
||||
|
||||
While the preemptor Pod is waiting for the victims to go away, a higher priority
|
||||
Pod may be created that fits on the same node. In this case, the scheduler will
|
||||
Pod may be created that fits on the same Node. In this case, the scheduler will
|
||||
schedule the higher priority Pod instead of the preemptor.
|
||||
|
||||
In the absence of such a higher priority Pod, we expect the preemptor Pod to be
|
||||
scheduled after the graceful termination period of the victims is over.
|
||||
This is expected behavior: the Pod with the higher priority should take the place
|
||||
of a Pod with a lower priority. Other controller actions, such as
|
||||
[cluster autoscaling](/docs/tasks/administer-cluster/cluster-management/#cluster-autoscaling),
|
||||
may eventually provide capacity to schedule the pending Pods.
|
||||
|
||||
#### Higher priority Pods are preempted before lower priority pods
|
||||
### Higher priority Pods are preempted before lower priority pods
|
||||
|
||||
The scheduler tries to find nodes that can run a pending Pod and if no node is
|
||||
found, it tries to remove Pods with lower priority from one node to make room
|
||||
for the pending pod. If a node with low priority Pods is not feasible to run the
|
||||
pending Pod, the scheduler may choose another node with higher priority Pods
|
||||
(compared to the Pods on the other node) for preemption. The victims must still
|
||||
have lower priority than the preemptor Pod.
|
||||
The scheduler tries to find nodes that can run a pending Pod. If no node is
|
||||
found, the scheduler tries to remove Pods with lower priority from an arbitrary
|
||||
node in order to make room for the pending pod.
|
||||
If a node with low priority Pods is not feasible to run the pending Pod, the scheduler
|
||||
may choose another node with higher priority Pods (compared to the Pods on the
|
||||
other node) for preemption. The victims must still have lower priority than the
|
||||
preemptor Pod.
|
||||
|
||||
When there are multiple nodes available for preemption, the scheduler tries to
|
||||
choose the node with a set of Pods with lowest priority. However, if such Pods
|
||||
|
@ -397,13 +386,11 @@ have PodDisruptionBudget that would be violated if they are preempted then the
|
|||
scheduler may choose another node with higher priority Pods.
|
||||
|
||||
When multiple nodes exist for preemption and none of the above scenarios apply,
|
||||
we expect the scheduler to choose a node with the lowest priority. If that is
|
||||
not the case, it may indicate a bug in the scheduler.
|
||||
the scheduler chooses a node with the lowest priority.
|
||||
|
||||
## Interactions of Pod priority and QoS
|
||||
## Interactions between Pod priority and quality of service {#interactions-of-pod-priority-and-qos}
|
||||
|
||||
Pod priority and
|
||||
[QoS](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/node/resource-qos.md)
|
||||
Pod priority and {{< glossary_tooltip text="QoS class" term_id="qos-class" >}}
|
||||
are two orthogonal features with few interactions and no default restrictions on
|
||||
setting the priority of a Pod based on its QoS classes. The scheduler's
|
||||
preemption logic does not consider QoS when choosing preemption targets.
|
||||
|
@ -414,15 +401,20 @@ to schedule the preemptor Pod, or if the lowest priority Pods are protected by
|
|||
`PodDisruptionBudget`.
|
||||
|
||||
The only component that considers both QoS and Pod priority is
|
||||
[Kubelet out-of-resource eviction](/docs/tasks/administer-cluster/out-of-resource/).
|
||||
[kubelet out-of-resource eviction](/docs/tasks/administer-cluster/out-of-resource/).
|
||||
The kubelet ranks Pods for eviction first by whether or not their usage of the
|
||||
starved resource exceeds requests, then by Priority, and then by the consumption
|
||||
of the starved compute resource relative to the Pods’ scheduling requests.
|
||||
See
|
||||
[Evicting end-user pods](/docs/tasks/administer-cluster/out-of-resource/#evicting-end-user-pods)
|
||||
for more details. Kubelet out-of-resource eviction does not evict Pods whose
|
||||
[evicting end-user pods](/docs/tasks/administer-cluster/out-of-resource/#evicting-end-user-pods)
|
||||
for more details.
|
||||
|
||||
kubelet out-of-resource eviction does not evict Pods when their
|
||||
usage does not exceed their requests. If a Pod with lower priority is not
|
||||
exceeding its requests, it won't be evicted. Another Pod with higher priority
|
||||
that exceeds its requests may be evicted.
|
||||
|
||||
{{% /capture %}}
|
||||
{{% capture whatsnext %}}
|
||||
* Read about using ResourceQuotas in connection with PriorityClasses: [limit Priority Class consumption by default](/docs/concepts/policy/resource-quotas/#limit-priority-class-consumption-by-default)
|
||||
{{% /capture %}}
|
||||
|
|
File diff suppressed because it is too large
Load Diff
|
@ -197,11 +197,13 @@ on the special hardware nodes. This will make sure that these special hardware
|
|||
nodes are dedicated for pods requesting such hardware and you don't have to
|
||||
manually add tolerations to your pods.
|
||||
|
||||
* **Taint based Evictions (beta feature)**: A per-pod-configurable eviction behavior
|
||||
* **Taint based Evictions**: A per-pod-configurable eviction behavior
|
||||
when there are node problems, which is described in the next section.
|
||||
|
||||
## Taint based Evictions
|
||||
|
||||
{{< feature-state for_k8s_version="1.18" state="stable" >}}
|
||||
|
||||
Earlier we mentioned the `NoExecute` taint effect, which affects pods that are already
|
||||
running on the node as follows
|
||||
|
||||
|
@ -229,9 +231,9 @@ certain condition is true. The following taints are built in:
|
|||
as unusable. After a controller from the cloud-controller-manager initializes
|
||||
this node, the kubelet removes this taint.
|
||||
|
||||
In version 1.13, the `TaintBasedEvictions` feature is promoted to beta and enabled by default, hence the taints are automatically
|
||||
added by the NodeController (or kubelet) and the normal logic for evicting pods from nodes
|
||||
based on the Ready NodeCondition is disabled.
|
||||
In case a node is to be evicted, the node controller or the kubelet adds relevant taints
|
||||
with `NoExecute` effect. If the fault condition returns to normal the kubelet or node
|
||||
controller can remove the relevant taint(s).
|
||||
|
||||
{{< note >}}
|
||||
To maintain the existing [rate limiting](/docs/concepts/architecture/nodes/)
|
||||
|
@ -240,7 +242,7 @@ in a rate-limited way. This prevents massive pod evictions in scenarios such
|
|||
as the master becoming partitioned from the nodes.
|
||||
{{< /note >}}
|
||||
|
||||
This beta feature, in combination with `tolerationSeconds`, allows a pod
|
||||
The feature, in combination with `tolerationSeconds`, allows a pod
|
||||
to specify how long it should stay bound to a node that has one or both of these problems.
|
||||
|
||||
For example, an application with a lot of local state might want to stay
|
||||
|
@ -277,15 +279,13 @@ admission controller](https://git.k8s.io/kubernetes/plugin/pkg/admission/default
|
|||
* `node.kubernetes.io/unreachable`
|
||||
* `node.kubernetes.io/not-ready`
|
||||
|
||||
This ensures that DaemonSet pods are never evicted due to these problems,
|
||||
which matches the behavior when this feature is disabled.
|
||||
This ensures that DaemonSet pods are never evicted due to these problems.
|
||||
|
||||
## Taint Nodes by Condition
|
||||
|
||||
The node lifecycle controller automatically creates taints corresponding to
|
||||
Node conditions.
|
||||
Node conditions with `NoSchedule` effect.
|
||||
Similarly, the scheduler does not check Node conditions; instead the scheduler checks taints. This ensures that Node conditions don't affect what's scheduled onto the Node. The user can choose to ignore some of the Node's problems (represented as Node conditions) by adding appropriate Pod tolerations.
|
||||
Note that `TaintNodesByCondition` only taints nodes with `NoSchedule` effect. The `NoExecute` effect is controlled by `TaintBasedEvictions`, which is a beta feature and enabled by default since version 1.13.
|
||||
|
||||
Starting in Kubernetes 1.8, the DaemonSet controller automatically adds the
|
||||
following `NoSchedule` tolerations to all daemons, to prevent DaemonSets from
|
||||
|
|
|
@ -2,7 +2,7 @@
|
|||
reviewers:
|
||||
- mikedanese
|
||||
- thockin
|
||||
title: Container Environment Variables
|
||||
title: Container Environment
|
||||
content_template: templates/concept
|
||||
weight: 20
|
||||
---
|
|
@ -116,7 +116,7 @@ Events:
|
|||
|
||||
{{% capture whatsnext %}}
|
||||
|
||||
* Learn more about the [Container environment](/docs/concepts/containers/container-environment-variables/).
|
||||
* Learn more about the [Container environment](/docs/concepts/containers/container-environment/).
|
||||
* Get hands-on experience
|
||||
[attaching handlers to Container lifecycle events](/docs/tasks/configure-pod-container/attach-handler-lifecycle-event/).
|
||||
|
||||
|
|
|
@ -67,6 +67,7 @@ Credentials can be provided in several ways:
|
|||
- use IAM roles and policies to control access to OCIR repositories
|
||||
- Using Azure Container Registry (ACR)
|
||||
- Using IBM Cloud Container Registry
|
||||
- use IAM roles and policies to grant access to IBM Cloud Container Registry
|
||||
- Configuring Nodes to Authenticate to a Private Registry
|
||||
- all pods can read any configured private registries
|
||||
- requires node configuration by cluster administrator
|
||||
|
@ -148,11 +149,11 @@ Once you have those variables filled in you can
|
|||
[configure a Kubernetes Secret and use it to deploy a Pod](/docs/concepts/containers/images/#specifying-imagepullsecrets-on-a-pod).
|
||||
|
||||
### Using IBM Cloud Container Registry
|
||||
IBM Cloud Container Registry provides a multi-tenant private image registry that you can use to safely store and share your Docker images. By default, images in your private registry are scanned by the integrated Vulnerability Advisor to detect security issues and potential vulnerabilities. Users in your IBM Cloud account can access your images, or you can create a token to grant access to registry namespaces.
|
||||
IBM Cloud Container Registry provides a multi-tenant private image registry that you can use to safely store and share your images. By default, images in your private registry are scanned by the integrated Vulnerability Advisor to detect security issues and potential vulnerabilities. Users in your IBM Cloud account can access your images, or you can use IAM roles and policies to grant access to IBM Cloud Container Registry namespaces.
|
||||
|
||||
To install the IBM Cloud Container Registry CLI plug-in and create a namespace for your images, see [Getting started with IBM Cloud Container Registry](https://cloud.ibm.com/docs/services/Registry?topic=registry-getting-started).
|
||||
To install the IBM Cloud Container Registry CLI plug-in and create a namespace for your images, see [Getting started with IBM Cloud Container Registry](https://cloud.ibm.com/docs/Registry?topic=registry-getting-started).
|
||||
|
||||
You can use the IBM Cloud Container Registry to deploy containers from [IBM Cloud public images](https://cloud.ibm.com/docs/services/Registry?topic=registry-public_images) and your private images into the `default` namespace of your IBM Cloud Kubernetes Service cluster. To deploy a container into other namespaces, or to use an image from a different IBM Cloud Container Registry region or IBM Cloud account, create a Kubernetes `imagePullSecret`. For more information, see [Building containers from images](https://cloud.ibm.com/docs/containers?topic=containers-images).
|
||||
If you are using the same account and region, you can deploy images that are stored in IBM Cloud Container Registry into the default namespace of your IBM Cloud Kubernetes Service cluster without any additional configuration, see [Building containers from images](https://cloud.ibm.com/docs/containers?topic=containers-images). For other configuration options, see [Understanding how to authorize your cluster to pull images from a registry](https://cloud.ibm.com/docs/containers?topic=containers-registry#cluster_registry_auth).
|
||||
|
||||
### Configuring Nodes to Authenticate to a Private Registry
|
||||
|
||||
|
|
|
@ -0,0 +1,45 @@
|
|||
---
|
||||
reviewers:
|
||||
- erictune
|
||||
- thockin
|
||||
title: Containers overview
|
||||
content_template: templates/concept
|
||||
weight: 1
|
||||
---
|
||||
|
||||
{{% capture overview %}}
|
||||
|
||||
Containers are a technology for packaging the (compiled) code for an
|
||||
application along with the dependencies it needs at run time. Each
|
||||
container that you run is repeatable; the standardization from having
|
||||
dependencies included means that you get the same behavior wherever you
|
||||
run it.
|
||||
|
||||
Containers decouple applications from underlying host infrastructure.
|
||||
This makes deployment easier in different cloud or OS environments.
|
||||
|
||||
{{% /capture %}}
|
||||
|
||||
|
||||
{{% capture body %}}
|
||||
|
||||
## Container images
|
||||
A [container image](/docs/concepts/containers/images/) is a ready-to-run
|
||||
software package, containing everything needed to run an application:
|
||||
the code and any runtime it requires, application and system libraries,
|
||||
and default values for any essential settings.
|
||||
|
||||
By design, a container is immutable: you cannot change the code of a
|
||||
container that is already running. If you have a containerized application
|
||||
and want to make changes, you need to build a new container that includes
|
||||
the change, then recreate the container to start from the updated image.
|
||||
|
||||
## Container runtimes
|
||||
|
||||
{{< glossary_definition term_id="container-runtime" length="all" >}}
|
||||
|
||||
{{% /capture %}}
|
||||
{{% capture whatsnext %}}
|
||||
* Read about [container images](/docs/concepts/containers/images/)
|
||||
* Read about [Pods](/docs/concepts/workloads/pods/)
|
||||
{{% /capture %}}
|
|
@ -13,22 +13,14 @@ weight: 20
|
|||
|
||||
This page describes the RuntimeClass resource and runtime selection mechanism.
|
||||
|
||||
{{< warning >}}
|
||||
RuntimeClass includes *breaking* changes in the beta upgrade in v1.14. If you were using
|
||||
RuntimeClass prior to v1.14, see [Upgrading RuntimeClass from Alpha to
|
||||
Beta](#upgrading-runtimeclass-from-alpha-to-beta).
|
||||
{{< /warning >}}
|
||||
RuntimeClass is a feature for selecting the container runtime configuration. The container runtime
|
||||
configuration is used to run a Pod's containers.
|
||||
|
||||
{{% /capture %}}
|
||||
|
||||
|
||||
{{% capture body %}}
|
||||
|
||||
## Runtime Class
|
||||
|
||||
RuntimeClass is a feature for selecting the container runtime configuration. The container runtime
|
||||
configuration is used to run a Pod's containers.
|
||||
|
||||
## Motivation
|
||||
|
||||
You can set a different RuntimeClass between different Pods to provide a balance of
|
||||
|
@ -41,7 +33,7 @@ additional overhead.
|
|||
You can also use RuntimeClass to run different Pods with the same container runtime
|
||||
but with different settings.
|
||||
|
||||
### Set Up
|
||||
## Setup
|
||||
|
||||
Ensure the RuntimeClass feature gate is enabled (it is by default). See [Feature
|
||||
Gates](/docs/reference/command-line-tools-reference/feature-gates/) for an explanation of enabling
|
||||
|
@ -50,7 +42,7 @@ feature gates. The `RuntimeClass` feature gate must be enabled on apiservers _an
|
|||
1. Configure the CRI implementation on nodes (runtime dependent)
|
||||
2. Create the corresponding RuntimeClass resources
|
||||
|
||||
#### 1. Configure the CRI implementation on nodes
|
||||
### 1. Configure the CRI implementation on nodes
|
||||
|
||||
The configurations available through RuntimeClass are Container Runtime Interface (CRI)
|
||||
implementation dependent. See the corresponding documentation ([below](#cri-configuration)) for your
|
||||
|
@ -65,7 +57,7 @@ heterogenous node configurations, see [Scheduling](#scheduling) below.
|
|||
The configurations have a corresponding `handler` name, referenced by the RuntimeClass. The
|
||||
handler must be a valid DNS 1123 label (alpha-numeric + `-` characters).
|
||||
|
||||
#### 2. Create the corresponding RuntimeClass resources
|
||||
### 2. Create the corresponding RuntimeClass resources
|
||||
|
||||
The configurations set up in step 1 should each have an associated `handler` name, which identifies
|
||||
the configuration. For each handler, create a corresponding RuntimeClass object.
|
||||
|
@ -82,13 +74,16 @@ metadata:
|
|||
handler: myconfiguration # The name of the corresponding CRI configuration
|
||||
```
|
||||
|
||||
The name of a RuntimeClass object must be a valid
|
||||
[DNS subdomain name](/docs/concepts/overview/working-with-objects/names#dns-subdomain-names).
|
||||
|
||||
{{< note >}}
|
||||
It is recommended that RuntimeClass write operations (create/update/patch/delete) be
|
||||
restricted to the cluster administrator. This is typically the default. See [Authorization
|
||||
Overview](/docs/reference/access-authn-authz/authorization/) for more details.
|
||||
{{< /note >}}
|
||||
|
||||
### Usage
|
||||
## Usage
|
||||
|
||||
Once RuntimeClasses are configured for the cluster, using them is very simple. Specify a
|
||||
`runtimeClassName` in the Pod spec. For example:
|
||||
|
@ -147,14 +142,14 @@ See CRI-O's [config documentation][100] for more details.
|
|||
|
||||
[100]: https://raw.githubusercontent.com/cri-o/cri-o/9f11d1d/docs/crio.conf.5.md
|
||||
|
||||
### Scheduling
|
||||
## Scheduling
|
||||
|
||||
{{< feature-state for_k8s_version="v1.16" state="beta" >}}
|
||||
|
||||
As of Kubernetes v1.16, RuntimeClass includes support for heterogeneous clusters through its
|
||||
`scheduling` fields. Through the use of these fields, you can ensure that pods running with this
|
||||
RuntimeClass are scheduled to nodes that support it. To use the scheduling support, you must have
|
||||
the RuntimeClass [admission controller][] enabled (the default, as of 1.16).
|
||||
the [RuntimeClass admission controller][] enabled (the default, as of 1.16).
|
||||
|
||||
To ensure pods land on nodes supporting a specific RuntimeClass, that set of nodes should have a
|
||||
common label which is then selected by the `runtimeclass.scheduling.nodeSelector` field. The
|
||||
|
@ -170,50 +165,23 @@ by each.
|
|||
To learn more about configuring the node selector and tolerations, see [Assigning Pods to
|
||||
Nodes](/docs/concepts/configuration/assign-pod-node/).
|
||||
|
||||
[admission controller]: /docs/reference/access-authn-authz/admission-controllers/
|
||||
[RuntimeClass admission controller]: /docs/reference/access-authn-authz/admission-controllers/#runtimeclass
|
||||
|
||||
### Pod Overhead
|
||||
|
||||
{{< feature-state for_k8s_version="v1.16" state="alpha" >}}
|
||||
{{< feature-state for_k8s_version="v1.18" state="beta" >}}
|
||||
|
||||
As of Kubernetes v1.16, RuntimeClass includes support for specifying overhead associated with
|
||||
running a pod, as part of the [`PodOverhead`](/docs/concepts/configuration/pod-overhead/) feature.
|
||||
To use `PodOverhead`, you must have the PodOverhead [feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
|
||||
enabled (it is off by default).
|
||||
You can specify _overhead_ resources that are associated with running a Pod. Declaring overhead allows
|
||||
the cluster (including the scheduler) to account for it when making decisions about Pods and resources.
|
||||
To use Pod overhead, you must have the PodOverhead [feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
|
||||
enabled (it is on by default).
|
||||
|
||||
|
||||
Pod overhead is defined in RuntimeClass through the `Overhead` fields. Through the use of these fields,
|
||||
Pod overhead is defined in RuntimeClass through the `overhead` fields. Through the use of these fields,
|
||||
you can specify the overhead of running pods utilizing this RuntimeClass and ensure these overheads
|
||||
are accounted for in Kubernetes.
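As a brief sketch (the name, handler, and values are hypothetical and chosen for illustration only), the `overhead` block sits directly on the RuntimeClass object:

```yaml
# Illustrative sketch: a RuntimeClass declaring per-Pod overhead.
apiVersion: node.k8s.io/v1beta1
kind: RuntimeClass
metadata:
  name: sandboxed           # hypothetical name
handler: mysandbox          # hypothetical CRI handler configured on the nodes
overhead:
  podFixed:
    memory: "64Mi"
    cpu: "100m"
```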
|
||||
|
||||
### Upgrading RuntimeClass from Alpha to Beta
|
||||
|
||||
The RuntimeClass Beta feature includes the following changes:
|
||||
|
||||
- The `node.k8s.io` API group and `runtimeclasses.node.k8s.io` resource have been migrated to a
|
||||
built-in API from a CustomResourceDefinition.
|
||||
- The `spec` has been inlined in the RuntimeClass definition (i.e. there is no more
|
||||
RuntimeClassSpec).
|
||||
- The `runtimeHandler` field has been renamed `handler`.
|
||||
- The `handler` field is now required in all API versions. This means the `runtimeHandler` field in
|
||||
the Alpha API is also required.
|
||||
- The `handler` field must be a valid DNS label ([RFC 1123](https://tools.ietf.org/html/rfc1123)),
|
||||
meaning it can no longer contain `.` characters (in all versions). Valid handlers match the
|
||||
following regular expression: `^[a-z0-9]([-a-z0-9]*[a-z0-9])?$`.
|
||||
|
||||
**Action Required:** The following actions are required to upgrade from the alpha version of the
|
||||
RuntimeClass feature to the beta version:
|
||||
|
||||
- RuntimeClass resources must be recreated *after* upgrading to v1.14, and the
|
||||
`runtimeclasses.node.k8s.io` CRD should be manually deleted:
|
||||
```
|
||||
kubectl delete customresourcedefinitions.apiextensions.k8s.io runtimeclasses.node.k8s.io
|
||||
```
|
||||
- Alpha RuntimeClasses with an unspecified or empty `runtimeHandler` or those using a `.` character
|
||||
in the handler are no longer valid, and must be migrated to a valid handler configuration (see
|
||||
above).
|
||||
|
||||
### Further Reading
|
||||
{{% /capture %}}
|
||||
{{% capture whatsnext %}}
|
||||
|
||||
- [RuntimeClass Design](https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/runtime-class.md)
|
||||
- [RuntimeClass Scheduling Design](https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/runtime-class-scheduling.md)
|
||||
|
|
|
@ -5,30 +5,34 @@ reviewers:
|
|||
- cheftako
|
||||
- chenopis
|
||||
content_template: templates/concept
|
||||
weight: 10
|
||||
weight: 20
|
||||
---
|
||||
|
||||
{{% capture overview %}}
|
||||
|
||||
The aggregation layer allows Kubernetes to be extended with additional APIs, beyond what is offered by the core Kubernetes APIs.
|
||||
The aggregation layer allows Kubernetes to be extended with additional APIs, beyond what is offered by the core Kubernetes APIs.
|
||||
The additional APIs can either be ready-made solutions such as [service-catalog](/docs/concepts/extend-kubernetes/service-catalog/), or APIs that you develop yourself.
|
||||
|
||||
The aggregation layer is different from [Custom Resources](/docs/concepts/extend-kubernetes/api-extension/custom-resources/), which are a way to make the {{< glossary_tooltip term_id="kube-apiserver" text="kube-apiserver" >}} recognise new kinds of object.
|
||||
|
||||
{{% /capture %}}
|
||||
|
||||
{{% capture body %}}
|
||||
|
||||
## Overview
|
||||
## Aggregation layer
|
||||
|
||||
The aggregation layer enables installing additional Kubernetes-style APIs in your cluster. These can either be pre-built, existing 3rd party solutions, such as [service-catalog](https://github.com/kubernetes-incubator/service-catalog/blob/master/README.md), or user-created APIs like [apiserver-builder](https://github.com/kubernetes-incubator/apiserver-builder/blob/master/README.md), which can get you started.
|
||||
The aggregation layer runs in-process with the kube-apiserver. Until an extension resource is registered, the aggregation layer will do nothing. To register an API, you add an _APIService_ object, which "claims" the URL path in the Kubernetes API. At that point, the aggregation layer will proxy anything sent to that API path (e.g. `/apis/myextension.mycompany.io/v1/…`) to the registered APIService.
|
||||
|
||||
The aggregation layer runs in-process with the kube-apiserver. Until an extension resource is registered, the aggregation layer will do nothing. To register an API, users must add an APIService object, which "claims" the URL path in the Kubernetes API. At that point, the aggregation layer will proxy anything sent to that API path (e.g. /apis/myextension.mycompany.io/v1/…) to the registered APIService.
|
||||
The most common way to implement the APIService is to run an *extension API server* in Pod(s) that run in your cluster. If you're using the extension API server to manage resources in your cluster, the extension API server (also written as "extension-apiserver") is typically paired with one or more {{< glossary_tooltip text="controllers" term_id="controller" >}}. The apiserver-builder library provides a skeleton for both extension API servers and the associated controller(s).
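Putting these pieces together, an APIService registration might look roughly like the following hedged sketch (the group, Service name, and namespace are hypothetical):

```yaml
# Illustrative sketch: an APIService that claims /apis/myextension.mycompany.io/v1
# and proxies requests for it to an extension API server exposed by a Service.
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  name: v1.myextension.mycompany.io
spec:
  group: myextension.mycompany.io
  version: v1
  insecureSkipTLSVerify: true      # illustration only; supply caBundle in a real deployment
  service:
    name: my-extension-apiserver   # hypothetical Service in front of the extension API server
    namespace: my-extension        # hypothetical namespace
  groupPriorityMinimum: 1000
  versionPriority: 15
```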
|
||||
|
||||
Ordinarily, the APIService will be implemented by an *extension-apiserver* in a pod running in the cluster. This extension-apiserver will normally need to be paired with one or more controllers if active management of the added resources is needed. As a result, the apiserver-builder will actually provide a skeleton for both. As another example, when the service-catalog is installed, it provides both the extension-apiserver and controller for the services it provides.
|
||||
### Response latency
|
||||
|
||||
Extension-apiservers should have low latency connections to and from the kube-apiserver.
|
||||
In particular, discovery requests are required to round-trip from the kube-apiserver in five seconds or less.
|
||||
If your deployment cannot achieve this, you should consider how to change it. For now, setting the
|
||||
`EnableAggregatedDiscoveryTimeout=false` feature gate on the kube-apiserver
|
||||
will disable the timeout restriction. It will be removed in a future release.
|
||||
Extension API servers should have low latency networking to and from the kube-apiserver.
|
||||
Discovery requests are required to round-trip from the kube-apiserver in five seconds or less.
|
||||
|
||||
If your extension API server cannot achieve that latency requirement, consider making changes that let you meet it. You can also set the
|
||||
`EnableAggregatedDiscoveryTimeout=false` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/) on the kube-apiserver
|
||||
to disable the timeout restriction. This deprecated feature gate will be removed in a future release.
|
||||
|
||||
{{% /capture %}}
|
||||
|
||||
|
@ -37,7 +41,6 @@ will disable the timeout restriction. It will be removed in a future release.
|
|||
* To get the aggregator working in your environment, [configure the aggregation layer](/docs/tasks/access-kubernetes-api/configure-aggregation-layer/).
|
||||
* Then, [setup an extension api-server](/docs/tasks/access-kubernetes-api/setup-extension-api-server/) to work with the aggregation layer.
|
||||
* Also, learn how to [extend the Kubernetes API using Custom Resource Definitions](/docs/tasks/access-kubernetes-api/extend-api-custom-resource-definitions/).
|
||||
* Read the specification for [APIService](/docs/reference/generated/kubernetes-api/{{< param "version" >}}/#apiservice-v1-apiregistration-k8s-io)
|
||||
|
||||
{{% /capture %}}
|
||||
|
||||
|
||||
|
|
|
@ -4,7 +4,7 @@ reviewers:
|
|||
- enisoc
|
||||
- deads2k
|
||||
content_template: templates/concept
|
||||
weight: 20
|
||||
weight: 10
|
||||
---
|
||||
|
||||
{{% capture overview %}}
|
||||
|
@ -37,7 +37,7 @@ On their own, custom resources simply let you store and retrieve structured data
|
|||
When you combine a custom resource with a *custom controller*, custom resources
|
||||
provide a true _declarative API_.
|
||||
|
||||
A [declarative API](/docs/concepts/overview/working-with-objects/kubernetes-objects/#understanding-kubernetes-objects)
|
||||
A [declarative API](/docs/concepts/overview/kubernetes-api/)
|
||||
allows you to _declare_ or specify the desired state of your resource and tries to
|
||||
keep the current state of Kubernetes objects in sync with the desired state.
|
||||
The controller interprets the structured data as a record of the user's
|
||||
|
@ -128,7 +128,12 @@ Regardless of how they are installed, the new resources are referred to as Custo
|
|||
|
||||
## CustomResourceDefinitions
|
||||
|
||||
The [CustomResourceDefinition](/docs/tasks/access-kubernetes-api/custom-resources/custom-resource-definitions/) API resource allows you to define custom resources. Defining a CRD object creates a new custom resource with a name and schema that you specify. The Kubernetes API serves and handles the storage of your custom resource.
|
||||
The [CustomResourceDefinition](/docs/tasks/access-kubernetes-api/custom-resources/custom-resource-definitions/)
|
||||
API resource allows you to define custom resources.
|
||||
Defining a CRD object creates a new custom resource with a name and schema that you specify.
|
||||
The Kubernetes API serves and handles the storage of your custom resource.
|
||||
The name of a CRD object must be a valid
|
||||
[DNS subdomain name](/docs/concepts/overview/working-with-objects/names#dns-subdomain-names).
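To make that concrete, here is a minimal hedged sketch of a CRD (the group and kind are hypothetical, and only a small illustrative schema is shown):

```yaml
# Illustrative sketch: a minimal CustomResourceDefinition.
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  # must be <plural>.<group>, and a valid DNS subdomain name
  name: crontabs.stable.example.com
spec:
  group: stable.example.com
  scope: Namespaced
  names:
    plural: crontabs
    singular: crontab
    kind: CronTab
  versions:
  - name: v1
    served: true
    storage: true
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            properties:
              cronSpec:
                type: string
              replicas:
                type: integer
```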
|
||||
|
||||
This frees you from writing your own API server to handle the custom resource,
|
||||
but the generic nature of the implementation means you have less flexibility than with
|
||||
|
@ -162,7 +167,7 @@ CRDs are easier to create than Aggregated APIs.
|
|||
|
||||
| CRDs | Aggregated API |
|
||||
| --------------------------- | -------------- |
|
||||
| Do not require programming. Users can choose any language for a CRD controller. | Requires programming in Go and building binary and image. Users can choose any language for a CRD controller. |
|
||||
| Do not require programming. Users can choose any language for a CRD controller. | Requires programming in Go and building binary and image. |
|
||||
| No additional service to run; CRs are handled by API Server. | An additional service to create and that could fail. |
|
||||
| No ongoing support once the CRD is created. Any bug fixes are picked up as part of normal Kubernetes Master upgrades. | May need to periodically pickup bug fixes from upstream and rebuild and update the Aggregated APIserver. |
|
||||
| No need to handle multiple versions of your API. For example: when you control the client for this resource, you can upgrade it in sync with the API. | You need to handle multiple versions of your API, for example: when developing an extension to share with the world. |
|
||||
|
@ -179,7 +184,7 @@ Aggregated APIs offer more advanced API features and customization of other feat
|
|||
| Custom Storage | If you need storage with a different performance mode (for example, time-series database instead of key-value store) or isolation for security (for example, encryption secrets or different | No | Yes |
|
||||
| Custom Business Logic | Perform arbitrary checks or actions when creating, reading, updating or deleting an object | Yes, using [Webhooks](/docs/reference/access-authn-authz/extensible-admission-controllers/#admission-webhooks). | Yes |
|
||||
| Scale Subresource | Allows systems like HorizontalPodAutoscaler and PodDisruptionBudget interact with your new resource | [Yes](/docs/tasks/access-kubernetes-api/extend-api-custom-resource-definitions/#scale-subresource) | Yes |
|
||||
| Status Subresource | <ul><li>Finer-grained access control: user writes spec section, controller writes status section.</li><li>Allows incrementing object Generation on custom resource data mutation (requires separate spec and status sections in the resource)</li></ul> | [Yes](/docs/tasks/access-kubernetes-api/extend-api-custom-resource-definitions/#status-subresource) | Yes |
|
||||
| Status Subresource | Allows fine-grained access control where the user writes the spec section and the controller writes the status section. Allows incrementing object Generation on custom resource data mutation (requires separate spec and status sections in the resource) | [Yes](/docs/tasks/access-kubernetes-api/extend-api-custom-resource-definitions/#status-subresource) | Yes |
|
||||
| Other Subresources | Add operations other than CRUD, such as "logs" or "exec". | No | Yes |
|
||||
| strategic-merge-patch | The new endpoints support PATCH with `Content-Type: application/strategic-merge-patch+json`. Useful for updating objects that may be modified both locally, and by the server. For more information, see ["Update API Objects in Place Using kubectl patch"](/docs/tasks/run-application/update-api-object-kubectl-patch/) | No | Yes |
|
||||
| Protocol Buffers | The new resource supports clients that want to use Protocol Buffers | No | Yes |
|
||||
|
@ -202,7 +207,7 @@ When you create a custom resource, either via a CRDs or an AA, you get many feat
|
|||
| Finalizers | Block deletion of extension resources until external cleanup happens. |
|
||||
| Admission Webhooks | Set default values and validate extension resources during any create/update/delete operation. |
|
||||
| UI/CLI Display | Kubectl, dashboard can display extension resources. |
|
||||
| Unset vs Empty | Clients can distinguish unset fields from zero-valued fields. |
|
||||
| Unset versus Empty | Clients can distinguish unset fields from zero-valued fields. |
|
||||
| Client Libraries Generation | Kubernetes provides generic client libraries, as well as tools to generate type-specific client libraries. |
|
||||
| Labels and annotations | Common metadata across objects that tools know how to edit for core and custom resources. |
|
||||
|
||||
|
|
|
@ -12,7 +12,7 @@ weight: 10
|
|||
{{% capture overview %}}
|
||||
|
||||
{{< feature-state state="alpha" >}}
|
||||
{{< warning >}}Alpha features change rapidly. {{< /warning >}}
|
||||
{{< caution >}}Alpha features can change rapidly. {{< /caution >}}
|
||||
|
||||
Network plugins in Kubernetes come in a few flavors:
|
||||
|
||||
|
@ -154,7 +154,7 @@ most network plugins.
|
|||
|
||||
Where needed, you can specify the MTU explicitly with the `network-plugin-mtu` kubelet option. For example,
|
||||
on AWS the `eth0` MTU is typically 9001, so you might specify `--network-plugin-mtu=9001`. If you're using IPSEC you
|
||||
might reduce it to allow for encapsulation overhead e.g. `--network-plugin-mtu=8873`.
|
||||
might reduce it to allow for encapsulation overhead; for example: `--network-plugin-mtu=8873`.
|
||||
|
||||
This option is provided to the network-plugin; currently **only kubenet supports `network-plugin-mtu`**.
|
||||
|
||||
|
|
|
@ -1,117 +1,111 @@
|
|||
---
|
||||
title: Poseidon-Firmament - An alternate scheduler
|
||||
title: Poseidon-Firmament Scheduler
|
||||
content_template: templates/concept
|
||||
weight: 80
|
||||
---
|
||||
|
||||
{{% capture overview %}}
|
||||
|
||||
**Current release of Poseidon-Firmament scheduler is an <code> alpha </code> release.**
|
||||
{{< feature-state for_k8s_version="v1.6" state="alpha" >}}
|
||||
|
||||
Poseidon-Firmament scheduler is an alternate scheduler that can be deployed alongside the default Kubernetes scheduler.
|
||||
The Poseidon-Firmament scheduler is an alternate scheduler that can be deployed alongside the default Kubernetes scheduler.
|
||||
|
||||
{{% /capture %}}
|
||||
|
||||
{{% capture body %}}
|
||||
|
||||
|
||||
## Introduction
|
||||
## Introduction
|
||||
|
||||
Poseidon is a service that acts as the integration glue for the [Firmament scheduler](https://github.com/Huawei-PaaS/firmament) with Kubernetes. Poseidon-Firmament scheduler augments the current Kubernetes scheduling capabilities. It incorporates novel flow network graph based scheduling capabilities alongside the default Kubernetes Scheduler. Firmament scheduler models workloads and clusters as flow networks and runs min-cost flow optimizations over these networks to make scheduling decisions.
|
||||
Poseidon is a service that acts as the integration glue between the [Firmament scheduler](https://github.com/Huawei-PaaS/firmament) and Kubernetes. Poseidon-Firmament augments the current Kubernetes scheduling capabilities. It incorporates novel flow network graph based scheduling capabilities alongside the default Kubernetes scheduler. The Firmament scheduler models workloads and clusters as flow networks and runs min-cost flow optimizations over these networks to make scheduling decisions.
|
||||
|
||||
It models the scheduling problem as a constraint-based optimization over a flow network graph. This is achieved by reducing scheduling to a min-cost max-flow optimization problem. The Poseidon-Firmament scheduler dynamically refines the workload placements.
|
||||
Firmament models the scheduling problem as a constraint-based optimization over a flow network graph. This is achieved by reducing scheduling to a min-cost max-flow optimization problem. The Poseidon-Firmament scheduler dynamically refines the workload placements.
|
||||
|
||||
Poseidon-Firmament scheduler runs alongside the default Kubernetes Scheduler as an alternate scheduler, so multiple schedulers run simultaneously.
|
||||
Poseidon-Firmament scheduler runs alongside the default Kubernetes scheduler as an alternate scheduler. You can simultaneously run multiple, different schedulers.
|
||||
|
||||
## Key Advantages
|
||||
Flow graph scheduling with the Poseidon-Firmament scheduler provides the following advantages:
|
||||
|
||||
### Flow graph scheduling based Poseidon-Firmament scheduler provides the following key advantages:
|
||||
- Workloads (pods) are bulk scheduled to enable scheduling at massive scale.
|
||||
- Based on the extensive performance test results, Poseidon-Firmament scales much better than the Kubernetes default scheduler as the number of nodes increase in a cluster. This is due to the fact that Poseidon-Firmament is able to amortize more and more work across workloads.
|
||||
- Poseidon-Firmament Scheduler outperforms the Kubernetes default scheduler by a wide margin when it comes to throughput performance numbers for scenarios where compute resource requirements are somewhat uniform across jobs (Replicasets/Deployments/Jobs). Poseidon-Firmament scheduler end-to-end throughput performance numbers, including bind time, consistently get better as the number of nodes in a cluster increase. For example, for a 2,700 node cluster (shown in the graphs [here](https://github.com/kubernetes-sigs/poseidon/blob/master/docs/benchmark/README.md)), Poseidon-Firmament scheduler achieves a 7X or greater end-to-end throughput than the Kubernetes default scheduler, which includes bind time.
|
||||
- Workloads (Pods) are bulk scheduled to enable scheduling at massive scale.
|
||||
The Poseidon-Firmament scheduler outperforms the Kubernetes default scheduler by a wide margin when it comes to throughput performance for scenarios where compute resource requirements are somewhat uniform across your workload (Deployments, ReplicaSets, Jobs).
|
||||
- The Poseidon-Firmament scheduler's end-to-end throughput performance and bind time improve as the number of nodes in a cluster increases. As you scale out, the Poseidon-Firmament scheduler is able to amortize more and more work across workloads.
|
||||
- Scheduling in Poseidon-Firmament is dynamic; it keeps cluster resources in a global optimal state during every scheduling run.
|
||||
- The Poseidon-Firmament scheduler supports scheduling complex rule constraints.
|
||||
|
||||
- Availability of complex rule constraints.
|
||||
- Scheduling in Poseidon-Firmament is dynamic; it keeps cluster resources in a global optimal state during every scheduling run.
|
||||
- Highly efficient resource utilizations.
|
||||
## How the Poseidon-Firmament scheduler works
|
||||
|
||||
## Poseidon-Firmament Scheduler - How it works
|
||||
Kubernetes supports [using multiple schedulers](/docs/tasks/administer-cluster/configure-multiple-schedulers/). You can specify, for a particular Pod, that it is scheduled by a custom scheduler (“poseidon” in this case) by setting the `schedulerName` field in the PodSpec at the time of Pod creation. The default scheduler will ignore that Pod and allow the Poseidon-Firmament scheduler to schedule the Pod on a relevant node.
|
||||
|
||||
As part of the Kubernetes multiple schedulers support, each new pod is typically scheduled by the default scheduler. Kubernetes can be instructed to use another scheduler by specifying the name of another custom scheduler (“poseidon” in our case) in the **schedulerName** field of the PodSpec at the time of pod creation. In this case, the default scheduler will ignore that Pod and allow Poseidon scheduler to schedule the Pod on a relevant node.
|
||||
For example:
|
||||
|
||||
```yaml
|
||||
apiVersion: v1
|
||||
kind: Pod
|
||||
|
||||
...
|
||||
spec:
|
||||
  schedulerName: poseidon
|
||||
```
|
||||
  schedulerName: poseidon
|
||||
...
|
||||
```
|
||||
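For context, a complete Pod manifest using this field might look like the hedged sketch below; the Pod name, image, and port are illustrative assumptions rather than values from this page.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-poseidon            # hypothetical Pod name
spec:
  # ask the Poseidon-Firmament scheduler, not the default scheduler,
  # to place this Pod
  schedulerName: poseidon
  containers:
  - name: nginx
    image: nginx:1.14.2
    ports:
    - containerPort: 80
```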
|
||||
|
||||
{{< note >}}
|
||||
For details about the design of this project see the [design document](https://github.com/kubernetes-sigs/poseidon/blob/master/docs/design/README.md).
|
||||
{{< /note >}}
|
||||
|
||||
## Possible Use Case Scenarios - When to use it
|
||||
## Batch scheduling
|
||||
|
||||
As mentioned earlier, the Poseidon-Firmament scheduler enables an extremely high throughput scheduling environment at scale due to its bulk scheduling approach, versus the Kubernetes pod-at-a-time approach. In our extensive tests, we have observed substantial throughput benefits as long as resource requirements (CPU/Memory) for incoming Pods are uniform across jobs (ReplicaSets/Deployments/Jobs), mainly due to efficient amortization of work across jobs.
|
||||
|
||||
Although the Poseidon-Firmament scheduler is capable of scheduling various types of workloads, such as service and batch workloads, the following are a few use cases where it excels the most:
|
||||
|
||||
1. For “Big Data/AI” jobs consisting of a large number of tasks, throughput benefits are tremendous.
|
||||
2. Service or batch jobs where workload resource requirements are uniform across jobs (Replicasets/Deployments/Jobs).
|
||||
1. For “Big Data/AI” jobs consisting of a large number of tasks, throughput benefits are tremendous.
|
||||
2. Service or batch jobs where workload resource requirements are uniform across jobs (Replicasets/Deployments/Jobs).
|
||||
|
||||
## Current Project Stage
|
||||
## Feature state
|
||||
|
||||
- **Alpha Release - Incubation repo.** at https://github.com/kubernetes-sigs/poseidon.
|
||||
- Currently, Poseidon-Firmament scheduler **does not provide support for high availability**, our implementation assumes that the scheduler cannot fail. The [design document](https://github.com/kubernetes-sigs/poseidon/blob/master/docs/design/README.md) describes possible ways to enable high availability, but we leave this to future work.
|
||||
- We are **not aware of any production deployment** of Poseidon-Firmament scheduler at this time.
|
||||
- Poseidon-Firmament is supported from Kubernetes release 1.6 and works with all subsequent releases.
|
||||
- Release process for Poseidon and Firmament repos are in lock step. The current Poseidon release can be found [here](https://github.com/kubernetes-sigs/poseidon/releases) and the corresponding Firmament release can be found [here](https://github.com/Huawei-PaaS/firmament/releases).
|
||||
Poseidon-Firmament is designed to work with Kubernetes release 1.6 and all subsequent releases.
|
||||
|
||||
## Features Comparison Matrix
|
||||
{{< caution >}}
|
||||
Poseidon-Firmament scheduler does not provide support for high availability; its implementation assumes that the scheduler cannot fail.
|
||||
{{< /caution >}}
|
||||
|
||||
## Feature comparison {#feature-comparison-matrix}
|
||||
|
||||
{{< table caption="Feature comparison of Kubernetes and Poseidon-Firmament schedulers." >}}
|
||||
|Feature|Kubernetes Default Scheduler|Poseidon-Firmament Scheduler|Notes|
|
||||
|--- |--- |--- |--- |
|
||||
|Node Affinity/Anti-Affinity|Y|Y||
|
||||
|Pod Affinity/Anti-Affinity - including support for pod anti-affinity symmetry|Y|Y|Currently, the default scheduler outperforms the Poseidon-Firmament scheduler pod affinity/anti-affinity functionality. We are working towards resolving this.|
|
||||
|Pod Affinity/Anti-Affinity - including support for pod anti-affinity symmetry|Y|Y|The default scheduler outperforms the Poseidon-Firmament scheduler's pod affinity/anti-affinity functionality.|
|
||||
|Taints & Tolerations|Y|Y||
|
||||
|Baseline Scheduling capability in accordance to available compute resources (CPU & Memory) on a node|Y|Y**|Not all Predicates & Priorities are supported at this time.|
|
||||
|Extreme Throughput at scale|Y**|Y|Bulk scheduling approach scales or increases workload placement. Substantial throughput benefits using Firmament scheduler as long as resource requirements (CPU/Memory) for incoming Pods is uniform across Replicasets/Deployments/Jobs. This is mainly due to efficient amortization of work across Replicasets/Deployments/Jobs . 1) For “Big Data/AI” jobs consisting of large no. of tasks, throughput benefits are tremendous. 2) Substantial throughput benefits also for service or batch job scenarios where workload resource requirements are uniform across Replicasets/Deployments/Jobs.|
|
||||
|Optimal Scheduling|Pod-by-Pod scheduler, processes one pod at a time (may result in sub-optimal scheduling)|Bulk Scheduling (Optimal scheduling)|The Pod-by-Pod Kubernetes default scheduler may assign tasks to a sub-optimal machine. By contrast, Firmament considers all unscheduled tasks at the same time together with their soft and hard constraints.|
|
||||
|Colocation Interference Avoidance|N|N**|Planned in Poseidon-Firmament.|
|
||||
|Priority Pre-emption|Y|N**|Partially exists in Poseidon-Firmament versus extensive support in Kubernetes default scheduler.|
|
||||
|Inherent Re-Scheduling|N|Y**|Poseidon-Firmament scheduler supports workload re-scheduling. In each scheduling run it considers all the pods, including running pods, and as a result can migrate or evict pods – a globally optimal scheduling environment.|
|
||||
|Baseline Scheduling capability in accordance with available compute resources (CPU & Memory) on a node|Y|Y†|**†** Not all Predicates & Priorities are supported with Poseidon-Firmament.|
|
||||
|Extreme Throughput at scale|Y†|Y|**†** The bulk scheduling approach scales well as workload placement increases. The Firmament scheduler offers high throughput when resource requirements (CPU/Memory) for incoming Pods are uniform across ReplicaSets/Deployments/Jobs.|
|
||||
|Colocation Interference Avoidance|N|N||
|
||||
|Priority Preemption|Y|N†|**†** Partially exists in Poseidon-Firmament versus extensive support in Kubernetes default scheduler.|
|
||||
|Inherent Rescheduling|N|Y†|**†** Poseidon-Firmament scheduler supports workload re-scheduling. In each scheduling run, Poseidon-Firmament considers all Pods, including running Pods, and as a result can migrate or evict Pods – a globally optimal scheduling environment.|
|
||||
|Gang Scheduling|N|Y||
|
||||
|Support for Pre-bound Persistence Volume Scheduling|Y|Y||
|
||||
|Support for Local Volume & Dynamic Persistence Volume Binding Scheduling|Y|N**|Planned.|
|
||||
|High Availability|Y|N**|Planned.|
|
||||
|Real-time metrics based scheduling|N|Y**|Initially supported using Heapster (now deprecated) for placing pods using actual cluster utilization statistics rather than reservations. Plans to switch over to "metric server".|
|
||||
|Support for Local Volume & Dynamic Persistence Volume Binding Scheduling|Y|N||
|
||||
|High Availability|Y|N||
|
||||
|Real-time metrics based scheduling|N|Y†|**†** Partially supported in Poseidon-Firmament using Heapster (now deprecated) for placing Pods using actual cluster utilization statistics rather than reservations.|
|
||||
|Support for Max-Pod per node|Y|Y|Poseidon-Firmament scheduler seamlessly co-exists with Kubernetes default scheduler.|
|
||||
|Support for Ephemeral Storage, in addition to CPU/Memory|Y|Y||
|
||||
{{< /table >}}
|
||||
|
||||
## Installation
|
||||
|
||||
## Installation
|
||||
The [Poseidon-Firmament installation guide](https://github.com/kubernetes-sigs/poseidon/blob/master/docs/install/README.md#Installation) explains how to deploy Poseidon-Firmament to your cluster.
|
||||
|
||||
For in-cluster installation of Poseidon, please start at the [Installation instructions](https://github.com/kubernetes-sigs/poseidon/blob/master/docs/install/README.md).
|
||||
|
||||
|
||||
## Development
|
||||
|
||||
For developers, please refer to the [Developer Setup instructions](https://github.com/kubernetes-sigs/poseidon/blob/master/docs/devel/README.md).
|
||||
|
||||
## Latest Throughput Performance Testing Results
|
||||
|
||||
Pod-by-pod schedulers, such as the Kubernetes default scheduler, typically process one pod at a time. These schedulers have the following crucial drawbacks:
|
||||
|
||||
1. The scheduler commits to a pod placement early and restricts the choices for other pods that wait to be placed.
|
||||
2. There are limited opportunities for amortizing work across pods because they are considered for placement individually.
|
||||
|
||||
These downsides of pod-by-pod schedulers are addressed by batching or bulk scheduling in the Poseidon-Firmament scheduler. Processing several pods in a batch allows the scheduler to jointly consider their placement, and thus to find the best trade-off for the whole batch instead of for one pod at a time. At the same time, it amortizes work across pods, resulting in much higher throughput.
|
||||
## Performance comparison
|
||||
|
||||
{{< note >}}
|
||||
Please refer to the [latest benchmark results](https://github.com/kubernetes-sigs/poseidon/blob/master/docs/benchmark/README.md) for detailed throughput performance comparison test results between Poseidon-Firmament scheduler and the Kubernetes default scheduler.
|
||||
{{< /note >}}
|
||||
|
||||
Pod-by-pod schedulers, such as the Kubernetes default scheduler, process Pods in small batches (typically one at a time). These schedulers have the following crucial drawbacks:
|
||||
|
||||
1. The scheduler commits to a pod placement early and restricts the choices for other pods that wait to be placed.
|
||||
2. There are limited opportunities for amortizing work across pods because they are considered for placement individually.
|
||||
|
||||
These downsides of pod-by-pod schedulers are addressed by batching or bulk scheduling in the Poseidon-Firmament scheduler. Processing several pods in a batch allows the scheduler to jointly consider their placement, and thus to find the best trade-off for the whole batch instead of for one pod at a time. At the same time, it amortizes work across pods, resulting in much higher throughput.
|
||||
|
||||
{{% /capture %}}
|
||||
{{% capture whatsnext %}}
|
||||
* See [Poseidon-Firmament](https://github.com/kubernetes-sigs/poseidon#readme) on GitHub for more information.
|
||||
* See the [design document](https://github.com/kubernetes-sigs/poseidon/blob/master/docs/design/README.md) for Poseidon.
|
||||
* Read [Firmament: Fast, Centralized Cluster Scheduling at Scale](https://www.usenix.org/system/files/conference/osdi16/osdi16-gog.pdf), the academic paper on the Firmament scheduling design.
|
||||
* If you'd like to contribute to Poseidon-Firmament, refer to the [developer setup instructions](https://github.com/kubernetes-sigs/poseidon/blob/master/docs/devel/README.md).
|
||||
{{% /capture %}}
|
||||
|
|
|
@ -121,21 +121,22 @@ There are two supported paths to extending the API with [custom resources](/docs
|
|||
to make it seamless for clients.
|
||||
|
||||
|
||||
## Enabling API groups
|
||||
## Enabling or disabling API groups
|
||||
|
||||
Certain resources and API groups are enabled by default. They can be enabled or disabled by setting `--runtime-config`
|
||||
on apiserver. `--runtime-config` accepts comma separated values. For ex: to disable batch/v1, set
|
||||
on the apiserver. `--runtime-config` accepts comma-separated values. For example, to disable batch/v1, set
|
||||
`--runtime-config=batch/v1=false`, to enable batch/v2alpha1, set `--runtime-config=batch/v2alpha1`.
|
||||
The flag accepts a comma-separated set of key=value pairs describing the runtime configuration of the apiserver.
|
||||
|
||||
IMPORTANT: Enabling or disabling groups or resources requires restarting apiserver and controller-manager
|
||||
to pick up the `--runtime-config` changes.
|
||||
{{< note >}}Enabling or disabling groups or resources requires restarting apiserver and controller-manager
|
||||
to pick up the `--runtime-config` changes.{{< /note >}}
|
||||
|
||||
## Enabling resources in the groups
|
||||
## Enabling specific resources in the extensions/v1beta1 group
|
||||
|
||||
DaemonSets, Deployments, HorizontalPodAutoscalers, Ingresses, Jobs and ReplicaSets are enabled by default.
|
||||
Other extensions resources can be enabled by setting `--runtime-config` on
|
||||
apiserver. `--runtime-config` accepts comma separated values. For example: to disable deployments and ingress, set
|
||||
`--runtime-config=extensions/v1beta1/deployments=false,extensions/v1beta1/ingresses=false`
|
||||
DaemonSets, Deployments, StatefulSets, NetworkPolicies, PodSecurityPolicies, and ReplicaSets in the `extensions/v1beta1` API group are disabled by default.
|
||||
For example: to enable deployments and daemonsets, set
|
||||
`--runtime-config=extensions/v1beta1/deployments=true,extensions/v1beta1/daemonsets=true`.
|
||||
|
||||
{{< note >}}Individual resource enablement/disablement is only supported in the `extensions/v1beta1` API group for legacy reasons.{{< /note >}}
|
||||
|
||||
{{% /capture %}}
|
||||
|
|
|
@ -2,7 +2,9 @@
|
|||
reviewers:
|
||||
- bgrant0607
|
||||
- mikedanese
|
||||
title: What is Kubernetes
|
||||
title: What is Kubernetes?
|
||||
description: >
|
||||
Kubernetes is a portable, extensible, open-source platform for managing containerized workloads and services that facilitates both declarative configuration and automation. It has a large, rapidly growing ecosystem. Kubernetes services, support, and tools are widely available.
|
||||
content_template: templates/concept
|
||||
weight: 10
|
||||
card:
|
||||
|
@ -17,9 +19,10 @@ This page is an overview of Kubernetes.
|
|||
{{% capture body %}}
|
||||
Kubernetes is a portable, extensible, open-source platform for managing containerized workloads and services that facilitates both declarative configuration and automation. It has a large, rapidly growing ecosystem. Kubernetes services, support, and tools are widely available.
|
||||
|
||||
The name Kubernetes originates from Greek, meaning helmsman or pilot. Google open-sourced the Kubernetes project in 2014. Kubernetes builds upon a [decade and a half of experience that Google has with running production workloads at scale](https://ai.google/research/pubs/pub43438), combined with best-of-breed ideas and practices from the community.
|
||||
The name Kubernetes originates from Greek, meaning helmsman or pilot. Google open-sourced the Kubernetes project in 2014. Kubernetes combines [over 15 years of Google's experience](/blog/2015/04/borg-predecessor-to-kubernetes/) running production workloads at scale with best-of-breed ideas and practices from the community.
|
||||
|
||||
## Going back in time
|
||||
|
||||
Let's take a look at why Kubernetes is so useful by going back in time.
|
||||
|
||||

|
||||
|
@ -42,13 +45,13 @@ Containers have become popular because they provide extra benefits, such as:
|
|||
* Dev and Ops separation of concerns: create application container images at build/release time rather than deployment time, thereby decoupling applications from infrastructure.
|
||||
* Observability not only surfaces OS-level information and metrics, but also application health and other signals.
|
||||
* Environmental consistency across development, testing, and production: Runs the same on a laptop as it does in the cloud.
|
||||
* Cloud and OS distribution portability: Runs on Ubuntu, RHEL, CoreOS, on-prem, Google Kubernetes Engine, and anywhere else.
|
||||
* Cloud and OS distribution portability: Runs on Ubuntu, RHEL, CoreOS, on-premises, on major public clouds, and anywhere else.
|
||||
* Application-centric management: Raises the level of abstraction from running an OS on virtual hardware to running an application on an OS using logical resources.
|
||||
* Loosely coupled, distributed, elastic, liberated micro-services: applications are broken into smaller, independent pieces and can be deployed and managed dynamically – not a monolithic stack running on one big single-purpose machine.
|
||||
* Resource isolation: predictable application performance.
|
||||
* Resource utilization: high efficiency and density.
|
||||
|
||||
## Why you need Kubernetes and what can it do
|
||||
## Why you need Kubernetes and what it can do {#why-you-need-kubernetes-and-what-can-it-do}
|
||||
|
||||
Containers are a good way to bundle and run your applications. In a production environment, you need to manage the containers that run the applications and ensure that there is no downtime. For example, if a container goes down, another container needs to start. Wouldn't it be easier if this behavior was handled by a system?
|
||||
|
||||
|
|
|
@ -82,7 +82,7 @@ metadata:
|
|||
spec:
|
||||
containers:
|
||||
- name: nginx
|
||||
image: nginx:1.7.9
|
||||
image: nginx:1.14.2
|
||||
ports:
|
||||
- containerPort: 80
|
||||
|
||||
|
|
|
@ -12,9 +12,9 @@ This page explains how Kubernetes objects are represented in the Kubernetes API,
|
|||
{{% /capture %}}
|
||||
|
||||
{{% capture body %}}
|
||||
## Understanding Kubernetes Objects
|
||||
## Understanding Kubernetes objects {#kubernetes-objects}
|
||||
|
||||
*Kubernetes Objects* are persistent entities in the Kubernetes system. Kubernetes uses these entities to represent the state of your cluster. Specifically, they can describe:
|
||||
*Kubernetes objects* are persistent entities in the Kubernetes system. Kubernetes uses these entities to represent the state of your cluster. Specifically, they can describe:
|
||||
|
||||
* What containerized applications are running (and on which nodes)
|
||||
* The resources available to those applications
|
||||
|
@ -26,14 +26,31 @@ To work with Kubernetes objects--whether to create, modify, or delete them--you'
|
|||
|
||||
### Object Spec and Status
|
||||
|
||||
Every Kubernetes object includes two nested object fields that govern the object's configuration: the object *spec* and the object *status*. The *spec*, which you must provide, describes your desired state for the object--the characteristics that you want the object to have. The *status* describes the *actual state* of the object, and is supplied and updated by the Kubernetes system. At any given time, the Kubernetes Control Plane actively manages an object's actual state to match the desired state you supplied.
|
||||
Almost every Kubernetes object includes two nested object fields that govern
|
||||
the object's configuration: the object *`spec`* and the object *`status`*.
|
||||
For objects that have a `spec`, you have to set this when you create the object,
|
||||
providing a description of the characteristics you want the resource to have:
|
||||
its _desired state_.
|
||||
|
||||
The `status` describes the _current state_ of the object, supplied and updated
|
||||
by Kubernetes and its components. The Kubernetes
|
||||
{{< glossary_tooltip text="control plane" term_id="control-plane" >}} continually
|
||||
and actively manages every object's actual state to match the desired state you
|
||||
supplied.
|
||||
|
||||
For example, a Kubernetes Deployment is an object that can represent an application running on your cluster. When you create the Deployment, you might set the Deployment spec to specify that you want three replicas of the application to be running. The Kubernetes system reads the Deployment spec and starts three instances of your desired application--updating the status to match your spec. If any of those instances should fail (a status change), the Kubernetes system responds to the difference between spec and status by making a correction--in this case, starting a replacement instance.
|
||||
For example: in Kubernetes, a Deployment is an object that can represent an
|
||||
application running on your cluster. When you create the Deployment, you
|
||||
might set the Deployment `spec` to specify that you want three replicas of
|
||||
the application to be running. The Kubernetes system reads the Deployment
|
||||
spec and starts three instances of your desired application--updating
|
||||
the status to match your spec. If any of those instances should fail
|
||||
(a status change), the Kubernetes system responds to the difference
|
||||
between spec and status by making a correction--in this case, starting
|
||||
a replacement instance.
|
||||
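A minimal sketch of such a Deployment manifest, assuming an nginx image purely for illustration; the essential part for this discussion is `spec.replicas: 3`, the desired state that the control plane continually reconciles against the observed `status`.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 3            # desired state: three running replicas
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.14.2
```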
|
||||
For more information on the object spec, status, and metadata, see the [Kubernetes API Conventions](https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md).
|
||||
|
||||
### Describing a Kubernetes Object
|
||||
### Describing a Kubernetes object
|
||||
|
||||
When you create an object in Kubernetes, you must provide the object spec that describes its desired state, as well as some basic information about the object (such as a name). When you use the Kubernetes API to create the object (either directly or via `kubectl`), that API request must include that information as JSON in the request body. **Most often, you provide the information to `kubectl` in a .yaml file.** `kubectl` converts the information to JSON when making the API request.
|
||||
|
||||
|
@ -51,7 +68,7 @@ kubectl apply -f https://k8s.io/examples/application/deployment.yaml --record
|
|||
|
||||
The output is similar to this:
|
||||
|
||||
```shell
|
||||
```
|
||||
deployment.apps/nginx-deployment created
|
||||
```
|
||||
|
||||
|
@ -65,14 +82,15 @@ In the `.yaml` file for the Kubernetes object you want to create, you'll need to
|
|||
* `spec` - What state you desire for the object
|
||||
|
||||
The precise format of the object `spec` is different for every Kubernetes object, and contains nested fields specific to that object. The [Kubernetes API Reference](/docs/reference/generated/kubernetes-api/{{< param "version" >}}/) can help you find the spec format for all of the objects you can create using Kubernetes.
|
||||
For example, the `spec` format for a `Pod` can be found
|
||||
[here](/docs/reference/generated/kubernetes-api/{{< param "version" >}}/#podspec-v1-core),
|
||||
and the `spec` format for a `Deployment` can be found
|
||||
[here](/docs/reference/generated/kubernetes-api/{{< param "version" >}}/#deploymentspec-v1-apps).
|
||||
For example, the `spec` format for a Pod can be found in
|
||||
[PodSpec v1 core](/docs/reference/generated/kubernetes-api/{{< param "version" >}}/#podspec-v1-core),
|
||||
and the `spec` format for a Deployment can be found in
|
||||
[DeploymentSpec v1 apps](/docs/reference/generated/kubernetes-api/{{< param "version" >}}/#deploymentspec-v1-apps).
|
||||
|
||||
{{% /capture %}}
|
||||
|
||||
{{% capture whatsnext %}}
|
||||
* [Kubernetes API overview](/docs/reference/using-api/api-overview/) explains some more API concepts
|
||||
* Learn about the most important basic Kubernetes objects, such as [Pod](/docs/concepts/workloads/pods/pod-overview/).
|
||||
* Learn about [controllers](/docs/concepts/architecture/controller/) in Kubernetes
|
||||
{{% /capture %}}
|
||||
|
|
|
@ -69,10 +69,10 @@ metadata:
|
|||
spec:
|
||||
containers:
|
||||
- name: nginx
|
||||
image: nginx:1.7.9
|
||||
image: nginx:1.14.2
|
||||
ports:
|
||||
- containerPort: 80
|
||||
|
||||
|
||||
```
|
||||
|
||||
## Label selectors
|
||||
|
@ -92,7 +92,7 @@ them.
|
|||
For some API types, such as ReplicaSets, the label selectors of two instances must not overlap within a namespace, or the controller can see that as conflicting instructions and fail to determine how many replicas should be present.
|
||||
{{< /note >}}
|
||||
|
||||
{{< caution >}}
|
||||
{{< caution >}}
|
||||
For both equality-based and set-based conditions there is no logical _OR_ (`||`) operator. Ensure your filter statements are structured accordingly.
|
||||
{{< /caution >}}
|
||||
|
||||
|
@ -210,7 +210,7 @@ this selector (respectively in `json` or `yaml` format) is equivalent to `compon
|
|||
|
||||
#### Resources that support set-based requirements
|
||||
|
||||
Newer resources, such as [`Job`](/docs/concepts/jobs/run-to-completion-finite-workloads/), [`Deployment`](/docs/concepts/workloads/controllers/deployment/), [`Replica Set`](/docs/concepts/workloads/controllers/replicaset/), and [`Daemon Set`](/docs/concepts/workloads/controllers/daemonset/), support _set-based_ requirements as well.
|
||||
Newer resources, such as [`Job`](/docs/concepts/workloads/controllers/jobs-run-to-completion/), [`Deployment`](/docs/concepts/workloads/controllers/deployment/), [`ReplicaSet`](/docs/concepts/workloads/controllers/replicaset/), and [`DaemonSet`](/docs/concepts/workloads/controllers/daemonset/), support _set-based_ requirements as well.
|
||||
|
||||
```yaml
|
||||
selector:
|
||||
|
|
|
@ -2,7 +2,7 @@
|
|||
reviewers:
|
||||
- mikedanese
|
||||
- thockin
|
||||
title: Names
|
||||
title: Object Names and IDs
|
||||
content_template: templates/concept
|
||||
weight: 20
|
||||
---
|
||||
|
@ -18,14 +18,41 @@ For non-unique user-provided attributes, Kubernetes provides [labels](/docs/conc
|
|||
|
||||
{{% /capture %}}
|
||||
|
||||
|
||||
{{% capture body %}}
|
||||
|
||||
## Names
|
||||
|
||||
{{< glossary_definition term_id="name" length="all" >}}
|
||||
|
||||
Kubernetes resources can have names up to 253 characters long. The characters allowed in names are: digits (0-9), lower case letters (a-z), `-`, and `.`.
|
||||
Below are three types of commonly used name constraints for resources.
|
||||
|
||||
### DNS Subdomain Names
|
||||
|
||||
Most resource types require a name that can be used as a DNS subdomain name
|
||||
as defined in [RFC 1123](https://tools.ietf.org/html/rfc1123).
|
||||
This means the name must:
|
||||
|
||||
- contain no more than 253 characters
|
||||
- contain only lowercase alphanumeric characters, '-' or '.'
|
||||
- start with an alphanumeric character
|
||||
- end with an alphanumeric character
|
||||
|
||||
### DNS Label Names
|
||||
|
||||
Some resource types require their names to follow the DNS
|
||||
label standard as defined in [RFC 1123](https://tools.ietf.org/html/rfc1123).
|
||||
This means the name must:
|
||||
|
||||
- contain at most 63 characters
|
||||
- contain only lowercase alphanumeric characters or '-'
|
||||
- start with an alphanumeric character
|
||||
- end with an alphanumeric character
|
||||
|
||||
### Path Segment Names
|
||||
|
||||
Some resource types require their names to be able to be safely encoded as a
|
||||
path segment. In other words, the name may not be "." or ".." and the name may
|
||||
not contain "/" or "%".
|
||||
|
||||
Here’s an example manifest for a Pod named `nginx-demo`.
|
||||
|
||||
|
@ -37,11 +64,12 @@ metadata:
|
|||
spec:
|
||||
containers:
|
||||
- name: nginx
|
||||
image: nginx:1.7.9
|
||||
image: nginx:1.14.2
|
||||
ports:
|
||||
- containerPort: 80
|
||||
```
|
||||
|
||||
|
||||
{{< note >}}
|
||||
Some resource types have additional restrictions on their names.
|
||||
{{< /note >}}
|
||||
|
|
|
@ -9,56 +9,58 @@ weight: 10
|
|||
{{% capture overview %}}
|
||||
|
||||
By default, containers run with unbounded [compute resources](/docs/user-guide/compute-resources) on a Kubernetes cluster.
|
||||
With Resource quotas, cluster administrators can restrict the resource consumption and creation on a namespace basis.
|
||||
Within a namespace, a Pod or Container can consume as much CPU and memory as defined by the namespace's resource quota. There is a concern that one Pod or Container could monopolize all of the resources. Limit Range is a policy to constrain resource by Pod or Container in a namespace.
|
||||
With resource quotas, cluster administrators can restrict resource consumption and creation on a namespace basis.
|
||||
Within a namespace, a Pod or Container can consume as much CPU and memory as defined by the namespace's resource quota. There is a concern that one Pod or Container could monopolize all available resources. A LimitRange is a policy to constrain resource allocations (to Pods or Containers) in a namespace.
|
||||
|
||||
{{% /capture %}}
|
||||
|
||||
|
||||
{{% capture body %}}
|
||||
|
||||
A limit range, defined by a `LimitRange` object, provides constraints that can:
|
||||
A _LimitRange_ provides constraints that can:
|
||||
|
||||
- Enforce minimum and maximum compute resources usage per Pod or Container in a namespace.
|
||||
- Enforce minimum and maximum storage request per PersistentVolumeClaim in a namespace.
|
||||
- Enforce a ratio between request and limit for a resource in a namespace.
|
||||
- Set default request/limit for compute resources in a namespace and automatically inject them to Containers at runtime.
|
||||
|
||||
## Enabling Limit Range
|
||||
## Enabling LimitRange
|
||||
|
||||
Limit Range support is enabled by default for many Kubernetes distributions. It is
|
||||
LimitRange support is enabled by default for many Kubernetes distributions. It is
|
||||
enabled when the apiserver `--enable-admission-plugins=` flag has `LimitRanger` admission controller as
|
||||
one of its arguments.
|
||||
|
||||
A limit range is enforced in a particular namespace when there is a
|
||||
`LimitRange` object in that namespace.
|
||||
A LimitRange is enforced in a particular namespace when there is a
|
||||
LimitRange object in that namespace.
|
||||
|
||||
### Overview of Limit Range:
|
||||
The name of a LimitRange object must be a valid
|
||||
[DNS subdomain name](/docs/concepts/overview/working-with-objects/names#dns-subdomain-names).
|
||||
|
||||
### Overview of Limit Range
|
||||
|
||||
- The administrator creates one `LimitRange` in one namespace.
|
||||
- Users create resources like Pods, Containers, and PersistentVolumeClaims in the namespace.
|
||||
- The `LimitRanger` admission controller enforces defaults limits for all Pods and Container that do not set compute resource requirements and tracks usage to ensure it does not exceed resource minimum , maximum and ratio defined in any `LimitRange` present in the namespace.
|
||||
- If creating or updating a resource (Pod, Container, PersistentVolumeClaim) violates a limit range constraint, the request to the API server will fail with HTTP status code `403 FORBIDDEN` and a message explaining the constraint that would have been violated.
|
||||
- If limit range is activated in a namespace for compute resources like `cpu` and `memory`, users must specify
|
||||
requests or limits for those values; otherwise, the system may reject pod creation.
|
||||
- LimitRange validations occurs only at Pod Admission stage, not on Running pods.
|
||||
|
||||
- The `LimitRanger` admission controller enforces defaults and limits for all Pods and Containers that do not set compute resource requirements and tracks usage to ensure it does not exceed resource minimum, maximum and ratio defined in any LimitRange present in the namespace.
|
||||
- If you create or update a resource (Pod, Container, PersistentVolumeClaim) that violates a LimitRange constraint, the request to the API server will fail with an HTTP status code `403 FORBIDDEN` and a message explaining the constraint that has been violated.
|
||||
- If a LimitRange is activated in a namespace for compute resources like `cpu` and `memory`, users must specify
|
||||
requests or limits for those values. Otherwise, the system may reject Pod creation.
|
||||
- LimitRange validation occurs only at the Pod admission stage, not on running Pods.
|
||||
|
||||
Examples of policies that could be created using limit range are:
|
||||
|
||||
- In a 2 node cluster with a capacity of 8 GiB RAM, and 16 cores, constrain Pods in a namespace to request 100m and not exceeds 500m for CPU , request 200Mi and not exceed 600Mi
|
||||
- Define default CPU limits and request to 150m and Memory default request to 300Mi for containers started with no cpu and memory requests in their spec.
|
||||
- In a 2 node cluster with a capacity of 8 GiB RAM and 16 cores, constrain Pods in a namespace to request 100m of CPU with a max limit of 500m for CPU and request 200Mi for Memory with a max limit of 600Mi for Memory.
|
||||
- Define default CPU limit and request to 150m and memory default request to 300Mi for Containers started with no cpu and memory requests in their specs.
|
||||
|
||||
In the case where the total limits of the namespace are less than the sum of the limits of the Pods/Containers,
|
||||
there may be contention for resources; The Containers or Pods will not be created.
|
||||
there may be contention for resources. In this case, the Containers or Pods will not be created.
|
||||
|
||||
Neither contention nor changes to limitrange will affect already created resources.
|
||||
Neither contention nor changes to a LimitRange will affect already created resources.
|
||||
|
||||
## Limiting Container compute resources
|
||||
|
||||
The following section discusses the creation of a LimitRange acting at the Container level.
|
||||
A Pod with 04 containers is first created; each container within the Pod has a specific `spec.resource` configuration
|
||||
each container within the pod is handled differently by the LimitRanger admission controller.
|
||||
A Pod with four Containers is first created. Each Container within the Pod has a specific `resources` configuration.
|
||||
Each Container within the Pod is handled differently by the `LimitRanger` admission controller.
|
||||
|
||||
Create a namespace `limitrange-demo` using the following kubectl command:
|
||||
|
||||
|
@ -75,16 +77,16 @@ kubectl config set-context --current --namespace=limitrange-demo
|
|||
Here is the configuration file for a LimitRange object:
|
||||
{{< codenew file="admin/resource/limit-mem-cpu-container.yaml" >}}
|
||||
|
||||
This object defines minimum and maximum Memory/CPU limits, default cpu/Memory requests and default limits for CPU/Memory resources to be apply to containers.
|
||||
This object defines minimum and maximum CPU/Memory limits, default CPU/Memory requests, and default limits for CPU/Memory resources to be applied to Containers.
|
||||
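The referenced file is not reproduced on this page; based on the `kubectl describe` output shown below, it is roughly equivalent to the following sketch (treat the exact structure and field ordering as an assumption).

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: limit-mem-cpu-per-container
spec:
  limits:
  - type: Container
    min:
      cpu: "100m"
      memory: "99Mi"
    max:
      cpu: "800m"
      memory: "1Gi"
    defaultRequest:          # injected as requests when a Container sets none
      cpu: "110m"
      memory: "111Mi"
    default:                 # injected as limits when a Container sets none
      cpu: "700m"
      memory: "900Mi"
```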
|
||||
Create the `limit-mem-cpu-per-container` LimitRange in the `limitrange-demo` namespace with the following kubectl command:
|
||||
Create the `limit-mem-cpu-per-container` LimitRange with the following kubectl command:
|
||||
|
||||
```shell
|
||||
kubectl create -f https://k8s.io/examples/admin/resource/limit-mem-cpu-container.yaml -n limitrange-demo
|
||||
kubectl create -f https://k8s.io/examples/admin/resource/limit-mem-cpu-container.yaml
|
||||
```
|
||||
|
||||
```shell
|
||||
kubectl describe limitrange/limit-mem-cpu-per-container -n limitrange-demo
|
||||
kubectl describe limitrange/limit-mem-cpu-per-container
|
||||
```
|
||||
|
||||
```shell
|
||||
|
@ -94,13 +96,13 @@ Container cpu 100m 800m 110m 700m -
|
|||
Container memory 99Mi 1Gi 111Mi 900Mi -
|
||||
```
|
||||
|
||||
Here is the configuration file for a Pod with 04 containers to demonstrate LimitRange features :
|
||||
Here is the configuration file for a Pod with four Containers to demonstrate LimitRange features:
|
||||
{{< codenew file="admin/resource/limit-range-pod-1.yaml" >}}
|
||||
|
||||
Create the `busybox1` Pod:
|
||||
|
||||
```shell
|
||||
kubectl apply -f https://k8s.io/examples/admin/resource/limit-range-pod-1.yaml -n limitrange-demo
|
||||
kubectl apply -f https://k8s.io/examples/admin/resource/limit-range-pod-1.yaml
|
||||
```
|
||||
|
||||
### Container spec with valid CPU/Memory requests and limits
|
||||
|
@ -108,7 +110,7 @@ kubectl apply -f https://k8s.io/examples/admin/resource/limit-range-pod-1.yaml -
|
|||
View the `busybox-cnt01` resource configuration:
|
||||
|
||||
```shell
|
||||
kubectl get po/busybox1 -n limitrange-demo -o json | jq ".spec.containers[0].resources"
|
||||
kubectl get po/busybox1 -o json | jq ".spec.containers[0].resources"
|
||||
```
|
||||
|
||||
```json
|
||||
|
@ -125,9 +127,9 @@ kubectl get po/busybox1 -n limitrange-demo -o json | jq ".spec.containers[0].res
|
|||
```
|
||||
|
||||
- The `busybox-cnt01` Container inside the `busybox1` Pod defined `requests.cpu=100m` and `requests.memory=100Mi`.
|
||||
- `100m <= 500m <= 800m` , The container cpu limit (500m) falls inside the authorized CPU limit range.
|
||||
- `99Mi <= 200Mi <= 1Gi` , The container memory limit (200Mi) falls inside the authorized Memory limit range.
|
||||
- No request/limits ratio validation for CPU/Memory , thus the container is valid and created.
|
||||
- `100m <= 500m <= 800m`: the Container CPU limit (500m) falls inside the authorized CPU limit range.
|
||||
- `99Mi <= 200Mi <= 1Gi`: the Container memory limit (200Mi) falls inside the authorized memory limit range.
|
||||
- No request/limits ratio validation for CPU/Memory, so the Container is valid and created.
|
||||
|
||||
|
||||
### Container spec with valid CPU/Memory requests but no limits
|
||||
|
@ -135,7 +137,7 @@ kubectl get po/busybox1 -n limitrange-demo -o json | jq ".spec.containers[0].res
|
|||
View the `busybox-cnt02` resource configuration:
|
||||
|
||||
```shell
|
||||
kubectl get po/busybox1 -n limitrange-demo -o json | jq ".spec.containers[1].resources"
|
||||
kubectl get po/busybox1 -o json | jq ".spec.containers[1].resources"
|
||||
```
|
||||
|
||||
```json
|
||||
|
@ -151,17 +153,18 @@ kubectl get po/busybox1 -n limitrange-demo -o json | jq ".spec.containers[1].res
|
|||
}
|
||||
```
|
||||
- The `busybox-cnt02` Container inside the `busybox1` Pod defined `requests.cpu=100m` and `requests.memory=100Mi` but no limits for cpu and memory.
|
||||
- The container do not have a limits section, the default limits defined in the limit-mem-cpu-per-container LimitRange object are injected to this container `limits.cpu=700mi` and `limits.memory=900Mi`.
|
||||
- `100m <= 700m <= 800m` , The container cpu limit (700m) falls inside the authorized CPU limit range.
|
||||
- `99Mi <= 900Mi <= 1Gi` , The container memory limit (900Mi) falls inside the authorized Memory limit range.
|
||||
- No request/limits ratio set , thus the container is valid and created.
|
||||
- The Container does not have a limits section. The default limits defined in the `limit-mem-cpu-per-container` LimitRange object are injected into this Container: `limits.cpu=700m` and `limits.memory=900Mi`.
|
||||
- `100m <= 700m <= 800m`: the Container CPU limit (700m) falls inside the authorized CPU limit range.
|
||||
- `99Mi <= 900Mi <= 1Gi`: the Container memory limit (900Mi) falls inside the authorized memory limit range.
|
||||
- No request/limits ratio set, so the Container is valid and created.
|
||||
|
||||
|
||||
### Container spec with a valid CPU/Memory limits but no requests
|
||||
View the `busybox-cnt03` resource configuration
|
||||
### Container spec with valid CPU/Memory limits but no requests
|
||||
|
||||
View the `busybox-cnt03` resource configuration:
|
||||
|
||||
```shell
|
||||
kubectl get po/busybox1 -n limitrange-demo -o json | jq ".spec.containers[2].resources"
|
||||
kubectl get po/busybox1 -o json | jq ".spec.containers[2].resources"
|
||||
```
|
||||
```json
|
||||
{
|
||||
|
@ -177,17 +180,17 @@ kubectl get po/busybox1 -n limitrange-demo -o json | jq ".spec.containers[2].res
|
|||
```
|
||||
|
||||
- The `busybox-cnt03` Container inside the `busybox1` Pod defined `limits.cpu=500m` and `limits.memory=200Mi` but no `requests` for cpu and memory.
|
||||
- The container do not define a request section, the defaultRequest defined in the limit-mem-cpu-per-container LimitRange is not used to fill its limits section but the limits defined by the container are set as requests `limits.cpu=500m` and `limits.memory=200Mi`.
|
||||
- `100m <= 500m <= 800m` , The container cpu limit (500m) falls inside the authorized CPU limit range.
|
||||
- `99Mi <= 200Mi <= 1Gi` , The container memory limit (200Mi) falls inside the authorized Memory limit range.
|
||||
- No request/limits ratio set , thus the container is valid and created.
|
||||
- The Container does not define a requests section. The default request defined in the `limit-mem-cpu-per-container` LimitRange is not used to fill its requests section; instead, the limits defined by the Container are copied to its requests: `requests.cpu=500m` and `requests.memory=200Mi`.
|
||||
- `100m <= 500m <= 800m`: the Container CPU limit (500m) falls inside the authorized CPU limit range.
|
||||
- `99Mi <= 200Mi <= 1Gi`: the Container memory limit (200Mi) falls inside the authorized memory limit range.
|
||||
- No request/limits ratio set, so the Container is valid and created.
|
||||
|
||||
### Container spec with no CPU/Memory requests/limits
|
||||
|
||||
View the `busybox-cnt04` resource configuration:
|
||||
|
||||
```shell
|
||||
kubectl get po/busybox1 -n limitrange-demo -o json | jq ".spec.containers[3].resources"
|
||||
kubectl get po/busybox1 -o json | jq ".spec.containers[3].resources"
|
||||
```
|
||||
|
||||
```json
|
||||
|
@ -204,27 +207,27 @@ kubectl get po/busybox1 -n limitrange-demo -o json | jq ".spec.containers[3].res
|
|||
```
|
||||
|
||||
- The `busybox-cnt04` Container inside the `busybox1` Pod defines neither `limits` nor `requests`.
|
||||
- The container do not define a limit section, the default limit defined in the limit-mem-cpu-per-container LimitRange is used to fill its request
|
||||
- The Container does not define a limits section; the default limit defined in the `limit-mem-cpu-per-container` LimitRange is used to fill its limits section:
|
||||
`limits.cpu=700m` and `limits.memory=900Mi`.
|
||||
- The container do not define a request section, the defaultRequest defined in the limit-mem-cpu-per-container LimitRange is used to fill its request section requests.cpu=110m and requests.memory=111Mi
|
||||
- `100m <= 700m <= 800m` , The container cpu limit (700m) falls inside the authorized CPU limit range.
|
||||
- `99Mi <= 900Mi <= 1Gi` , The container memory limit (900Mi) falls inside the authorized Memory limitrange .
|
||||
- No request/limits ratio set , thus the container is valid and created.
|
||||
- The Container does not define a requests section; the `defaultRequest` defined in the `limit-mem-cpu-per-container` LimitRange is used to fill its requests section: `requests.cpu=110m` and `requests.memory=111Mi`.
|
||||
- `100m <= 700m <= 800m`: the Container CPU limit (700m) falls inside the authorized CPU limit range.
|
||||
- `99Mi <= 900Mi <= 1Gi`: the Container memory limit (900Mi) falls inside the authorized memory limit range.
|
||||
- No request/limits ratio set, so the Container is valid and created.
|
||||
|
||||
All containers defined in the `busybox` Pod passed LimitRange validations, this the Pod is valid and create in the namespace.
|
||||
All Containers defined in the `busybox1` Pod passed the LimitRange validations, so the Pod is valid and created in the namespace.
|
||||
|
||||
## Limiting Pod compute resources
|
||||
|
||||
The following section discusses how to constrain resources at Pod level.
|
||||
The following section discusses how to constrain resources at the Pod level.
|
||||
|
||||
{{< codenew file="admin/resource/limit-mem-cpu-pod.yaml" >}}
|
||||
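The referenced file is not shown on this page either; given the description below of a 2-core / 2Gi per-Pod cap, a hedged sketch of such a Pod-level LimitRange is:

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: limit-mem-cpu-per-pod
spec:
  limits:
  - type: Pod                # constraints apply to the sum across Containers
    max:
      cpu: "2"
      memory: "2Gi"
```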
|
||||
Without having to delete `busybox1` Pod, create the `limit-mem-cpu-pod` LimitRange in the `limitrange-demo` namespace:
|
||||
Without having to delete the `busybox1` Pod, create the `limit-mem-cpu-pod` LimitRange in the `limitrange-demo` namespace:
|
||||
|
||||
```shell
|
||||
kubectl apply -f https://k8s.io/examples/admin/resource/limit-mem-cpu-pod.yaml -n limitrange-demo
|
||||
kubectl apply -f https://k8s.io/examples/admin/resource/limit-mem-cpu-pod.yaml
|
||||
```
|
||||
The limitrange is created and limits CPU to 2 Core and Memory to 2Gi per Pod:
|
||||
The LimitRange is created and limits CPU to 2 Core and Memory to 2Gi per Pod:
|
||||
|
||||
```shell
|
||||
limitrange/limit-mem-cpu-per-pod created
|
||||
|
@ -250,36 +253,36 @@ Now create the `busybox2` Pod:
|
|||
{{< codenew file="admin/resource/limit-range-pod-2.yaml" >}}
|
||||
|
||||
```shell
|
||||
kubectl apply -f https://k8s.io/examples/admin/resource/limit-range-pod-2.yaml -n limitrange-demo
|
||||
kubectl apply -f https://k8s.io/examples/admin/resource/limit-range-pod-2.yaml
|
||||
```
|
||||
|
||||
The `busybox2` Pod definition is identical to `busybox1` but an error is reported since Pod's resources are now limited:
|
||||
The `busybox2` Pod definition is identical to `busybox1`, but an error is reported since the Pod's resources are now limited:
|
||||
|
||||
```shell
|
||||
Error from server (Forbidden): error when creating "limit-range-pod-2.yaml": pods "busybox2" is forbidden: [maximum cpu usage per Pod is 2, but limit is 2400m., maximum memory usage per Pod is 2Gi, but limit is 2306867200.]
|
||||
```
|
||||
|
||||
```shell
|
||||
kubectl get po/busybox1 -n limitrange-demo -o json | jq ".spec.containers[].resources.limits.memory"
|
||||
kubectl get po/busybox1 -o json | jq ".spec.containers[].resources.limits.memory"
|
||||
"200Mi"
|
||||
"900Mi"
|
||||
"200Mi"
|
||||
"900Mi"
|
||||
```
|
||||
|
||||
`busybox2` Pod will not be admitted on the cluster since the total memory limit of its container is greater than the limit defined in the LimitRange.
|
||||
The `busybox2` Pod will not be admitted to the cluster, since the total memory limit of its Containers is greater than the limit defined in the LimitRange.
|
||||
`busybox1` will not be evicted, since it was created and admitted to the cluster before the LimitRange was created.
|
||||
|
||||
## Limiting Storage resources
|
||||
|
||||
You can enforce minimum and maximum size of [storage resources](/docs/concepts/storage/persistent-volumes/) that can be requested by each PersistentVolumeClaim in a namespace using a LimitRange:
|
||||
You can enforce minimum and maximum size of [storage resources](/docs/concepts/storage/persistent-volumes/) that can be requested by each PersistentVolumeClaim in a namespace using a LimitRange:
|
||||
|
||||
{{< codenew file="admin/resource/storagelimits.yaml" >}}
|
||||
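Based on the `kubectl describe` output further down (min 1Gi and max 2Gi for PersistentVolumeClaim storage), the referenced file is roughly equivalent to this sketch; treat it as an approximation rather than the exact file contents.

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: storagelimits
spec:
  limits:
  - type: PersistentVolumeClaim
    min:
      storage: 1Gi
    max:
      storage: 2Gi
```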
|
||||
Apply the YAML using `kubectl create`:
|
||||
|
||||
```shell
|
||||
kubectl create -f https://k8s.io/examples/admin/resource/storagelimits.yaml -n limitrange-demo
|
||||
kubectl create -f https://k8s.io/examples/admin/resource/storagelimits.yaml
|
||||
```
|
||||
|
||||
```shell
|
||||
|
@ -305,7 +308,7 @@ PersistentVolumeClaim storage 1Gi 2Gi - - -
|
|||
{{< codenew file="admin/resource/pvc-limit-lower.yaml" >}}
|
||||
|
||||
```shell
|
||||
kubectl create -f https://k8s.io/examples/admin/resource/pvc-limit-lower.yaml -n limitrange-demo
|
||||
kubectl create -f https://k8s.io/examples/admin/resource/pvc-limit-lower.yaml
|
||||
```
|
||||
|
||||
When creating a PVC with `requests.storage` lower than the Min value in the LimitRange, an error is thrown by the server:
|
||||
|
@ -319,7 +322,7 @@ Same behaviour is noted if the `requests.storage` is greater than the Max value
|
|||
{{< codenew file="admin/resource/pvc-limit-greater.yaml" >}}
|
||||
|
||||
```shell
|
||||
kubectl create -f https://k8s.io/examples/admin/resource/pvc-limit-greater.yaml -n limitrange-demo
|
||||
kubectl create -f https://k8s.io/examples/admin/resource/pvc-limit-greater.yaml
|
||||
```
|
||||
|
||||
```shell
|
||||
|
@ -328,9 +331,9 @@ Error from server (Forbidden): error when creating "pvc-limit-greater.yaml": per
|
|||
|
||||
## Limits/Requests Ratio
|
||||
|
||||
If `LimitRangeItem.maxLimitRequestRatio` is specified in the `LimitRangeSpec`, the named resource must have a request and limit that are both non-zero where limit divided by request is less than or equal to the enumerated value
|
||||
If `LimitRangeItem.maxLimitRequestRatio` is specified in the `LimitRangeSpec`, the named resource must have a request and limit that are both non-zero where limit divided by request is less than or equal to the enumerated value.
|
||||
|
||||
The following `LimitRange` enforces memory limit to be at most twice the amount of the memory request for any pod in the namespace.
|
||||
The following LimitRange enforces memory limit to be at most twice the amount of the memory request for any Pod in the namespace:
|
||||
|
||||
{{< codenew file="admin/resource/limit-memory-ratio-pod.yaml" >}}
|
||||
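Given the describe output below (a Max Limit/Request Ratio of 2 for Pod memory), the referenced file can be sketched as follows; the object name matches the one used in the surrounding text, and the rest is an assumption.

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: limit-memory-ratio-pod
spec:
  limits:
  - type: Pod
    maxLimitRequestRatio:
      memory: 2      # a memory limit may be at most twice the memory request
```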
|
||||
|
@ -352,7 +355,7 @@ Type Resource Min Max Default Request Default Limit Max Limit/Reques
|
|||
Pod memory - - - - 2
|
||||
```
|
||||
|
||||
Let's create a pod with `requests.memory=100Mi` and `limits.memory=300Mi`:
|
||||
Create a pod with `requests.memory=100Mi` and `limits.memory=300Mi`:
|
||||
|
||||
{{< codenew file="admin/resource/limit-range-pod-3.yaml" >}}
|
||||
|
||||
|
@ -360,19 +363,24 @@ Let's create a pod with `requests.memory=100Mi` and `limits.memory=300Mi`:
|
|||
kubectl apply -f https://k8s.io/examples/admin/resource/limit-range-pod-3.yaml
|
||||
```
|
||||
|
||||
The pod creation failed as the ratio here (`3`) is greater than the enforced limit (`2`) in `limit-memory-ratio-pod` LimitRange
|
||||
The pod creation failed as the ratio here (`3`) is greater than the enforced limit (`2`) in `limit-memory-ratio-pod` LimitRange:
|
||||
|
||||
```shell
|
||||
```
|
||||
Error from server (Forbidden): error when creating "limit-range-pod-3.yaml": pods "busybox3" is forbidden: memory max limit to request ratio per Pod is 2, but provided ratio is 3.000000.
|
||||
```
|
||||
|
||||
### Clean up
|
||||
## Clean up
|
||||
|
||||
Delete the `limitrange-demo` namespace to free all resources:
|
||||
|
||||
```shell
|
||||
kubectl delete ns limitrange-demo
|
||||
```
|
||||
Change your context to the `default` namespace with the following command:
|
||||
|
||||
```shell
|
||||
kubectl config set-context --current --namespace=default
|
||||
```
|
||||
|
||||
## Examples
|
||||
|
||||
|
|
|
@ -197,6 +197,8 @@ alias kubectl-user='kubectl --as=system:serviceaccount:psp-example:fake-user -n
|
|||
|
||||
Define the example PodSecurityPolicy object in a file. This is a policy that
|
||||
simply prevents the creation of privileged pods.
|
||||
The name of a PodSecurityPolicy object must be a valid
|
||||
[DNS subdomain name](/docs/concepts/overview/working-with-objects/names#dns-subdomain-names).
|
||||
|
||||
{{< codenew file="policy/example-psp.yaml" >}}
|
||||
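The referenced `example-psp.yaml` is not inlined in this diff; a hedged sketch of a PodSecurityPolicy that only blocks privileged Pods, with the remaining required fields left permissive, looks like this.

```yaml
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: example
spec:
  privileged: false          # the one restriction: no privileged Pods
  # the required control fields below are left as permissive as possible
  seLinux:
    rule: RunAsAny
  supplementalGroups:
    rule: RunAsAny
  runAsUser:
    rule: RunAsAny
  fsGroup:
    rule: RunAsAny
  volumes:
  - '*'
```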
|
||||
|
@ -419,8 +421,10 @@ The **recommended minimum set** of allowed volumes for new PSPs are:
|
|||
- projected
|
||||
|
||||
{{< warning >}}
|
||||
PodSecurityPolicy does not limit the types of `PersistentVolume` objects that may be referenced by a `PersistentVolumeClaim`.
|
||||
Only trusted users should be granted permission to create `PersistentVolume` objects.
|
||||
PodSecurityPolicy does not limit the types of `PersistentVolume` objects that
|
||||
may be referenced by a `PersistentVolumeClaim`, and hostPath type
|
||||
`PersistentVolumes` do not support read-only access mode. Only trusted users
|
||||
should be granted permission to create `PersistentVolume` objects.
|
||||
{{< /warning >}}
|
||||
|
||||
**FSGroup** - Controls the supplemental group applied to some volumes.
|
||||
|
|
|
@ -37,6 +37,9 @@ Resource quotas work like this:
|
|||
the `LimitRanger` admission controller to force defaults for pods that make no compute resource requirements.
|
||||
See the [walkthrough](/docs/tasks/administer-cluster/quota-memory-cpu-namespace/) for an example of how to avoid this problem.
|
||||
|
||||
The name of a `ResourceQuota` object must be a valid
|
||||
[DNS subdomain name](/docs/concepts/overview/working-with-objects/names#dns-subdomain-names).
|
||||
|
||||
Examples of policies that could be created using namespaces and quotas are:
|
||||
|
||||
- In a cluster with a capacity of 32 GiB RAM, and 16 cores, let team A use 20 GiB and 10 cores,
|
||||
|
@ -376,7 +379,7 @@ pods 0 10
|
|||
* `Exists`
|
||||
* `DoesNotExist`
|
||||
|
||||
## Requests vs Limits
|
||||
## Requests compared to Limits {#requests-vs-limits}
|
||||
|
||||
When allocating compute resources, each container may specify a request and a limit value for either CPU or memory.
|
||||
The quota can be configured to constrain either value.
|
||||
|
|
|
@ -1,7 +1,7 @@
|
|||
---
|
||||
title: Kubernetes Scheduler
|
||||
content_template: templates/concept
|
||||
weight: 60
|
||||
weight: 50
|
||||
---
|
||||
|
||||
{{% capture overview %}}
|
||||
|
@ -54,14 +54,12 @@ individual and collective resource requirements, hardware / software /
|
|||
policy constraints, affinity and anti-affinity specifications, data
|
||||
locality, inter-workload interference, and so on.
|
||||
|
||||
## Scheduling with kube-scheduler {#kube-scheduler-implementation}
|
||||
### Node selection in kube-scheduler {#kube-scheduler-implementation}
|
||||
|
||||
kube-scheduler selects a node for the pod in a 2-step operation:
|
||||
|
||||
1. Filtering
|
||||
|
||||
2. Scoring
|
||||
|
||||
1. Scoring
|
||||
|
||||
The _filtering_ step finds the set of Nodes where it's feasible to
|
||||
schedule the Pod. For example, the PodFitsResources filter checks whether a
|
||||
|
@ -78,105 +76,15 @@ Finally, kube-scheduler assigns the Pod to the Node with the highest ranking.
|
|||
If there is more than one node with equal scores, kube-scheduler selects
|
||||
one of these at random.
|
||||
|
||||
There are two supported ways to configure the filtering and scoring behavior
|
||||
of the scheduler:
|
||||
|
||||
### Default policies
|
||||
|
||||
kube-scheduler has a default set of scheduling policies.
|
||||
|
||||
### Filtering
|
||||
|
||||
- `PodFitsHostPorts`: Checks if a Node has free ports (the network protocol kind)
|
||||
for the Pod ports the Pod is requesting.
|
||||
|
||||
- `PodFitsHost`: Checks if a Pod specifies a specific Node by its hostname.
|
||||
|
||||
- `PodFitsResources`: Checks if the Node has free resources (for example, CPU and memory)
|
||||
to meet the requirement of the Pod.
|
||||
|
||||
- `PodMatchNodeSelector`: Checks if a Pod's Node {{< glossary_tooltip term_id="selector" >}}
|
||||
matches the Node's {{< glossary_tooltip text="label(s)" term_id="label" >}}.
|
||||
|
||||
- `NoVolumeZoneConflict`: Evaluate if the {{< glossary_tooltip text="Volumes" term_id="volume" >}}
|
||||
that a Pod requests are available on the Node, given the failure zone restrictions for
|
||||
that storage.
|
||||
|
||||
- `NoDiskConflict`: Evaluates if a Pod can fit on a Node due to the volumes it requests,
|
||||
and those that are already mounted.
|
||||
|
||||
- `MaxCSIVolumeCount`: Decides how many {{< glossary_tooltip text="CSI" term_id="csi" >}}
|
||||
volumes should be attached, and whether that's over a configured limit.
|
||||
|
||||
- `CheckNodeMemoryPressure`: If a Node is reporting memory pressure, and there's no
|
||||
configured exception, the Pod won't be scheduled there.
|
||||
|
||||
- `CheckNodePIDPressure`: If a Node is reporting that process IDs are scarce, and
|
||||
there's no configured exception, the Pod won't be scheduled there.
|
||||
|
||||
- `CheckNodeDiskPressure`: If a Node is reporting storage pressure (a filesystem that
|
||||
is full or nearly full), and there's no configured exception, the Pod won't be
|
||||
scheduled there.
|
||||
|
||||
- `CheckNodeCondition`: Nodes can report that they have a completely full filesystem,
|
||||
that networking isn't available or that kubelet is otherwise not ready to run Pods.
|
||||
If such a condition is set for a Node, and there's no configured exception, the Pod
|
||||
won't be scheduled there.
|
||||
|
||||
- `PodToleratesNodeTaints`: checks if a Pod's {{< glossary_tooltip text="tolerations" term_id="toleration" >}}
|
||||
can tolerate the Node's {{< glossary_tooltip text="taints" term_id="taint" >}}.
|
||||
|
||||
- `CheckVolumeBinding`: Evaluates if a Pod can fit due to the volumes it requests.
|
||||
This applies for both bound and unbound
|
||||
{{< glossary_tooltip text="PVCs" term_id="persistent-volume-claim" >}}.
|
||||
|
||||
### Scoring
|
||||
|
||||
- `SelectorSpreadPriority`: Spreads Pods across hosts, considering Pods that
|
||||
belong to the same {{< glossary_tooltip text="Service" term_id="service" >}},
|
||||
{{< glossary_tooltip term_id="statefulset" >}} or
|
||||
{{< glossary_tooltip term_id="replica-set" >}}.
|
||||
|
||||
- `InterPodAffinityPriority`: Computes a sum by iterating through the elements
|
||||
of weightedPodAffinityTerm and adding “weight” to the sum if the corresponding
|
||||
PodAffinityTerm is satisfied for that node; the node(s) with the highest sum
|
||||
are the most preferred.
|
||||
|
||||
- `LeastRequestedPriority`: Favors nodes with fewer requested resources. In other
|
||||
words, the more Pods that are placed on a Node, and the more resources those
|
||||
Pods use, the lower the ranking this policy will give.
|
||||
|
||||
- `MostRequestedPriority`: Favors nodes with most requested resources. This policy
|
||||
will fit the scheduled Pods onto the smallest number of Nodes needed to run your
|
||||
overall set of workloads.
|
||||
|
||||
- `RequestedToCapacityRatioPriority`: Creates a requestedToCapacity-based ResourceAllocationPriority using the default resource scoring function shape.
|
||||
|
||||
- `BalancedResourceAllocation`: Favors nodes with balanced resource usage.
|
||||
|
||||
- `NodePreferAvoidPodsPriority`: Prioritizes nodes according to the node annotation
|
||||
`scheduler.alpha.kubernetes.io/preferAvoidPods`. You can use this to hint that
|
||||
two different Pods shouldn't run on the same Node.
|
||||
|
||||
- `NodeAffinityPriority`: Prioritizes nodes according to node affinity scheduling
|
||||
preferences indicated in PreferredDuringSchedulingIgnoredDuringExecution.
|
||||
You can read more about this in [Assigning Pods to Nodes](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/).
|
||||
|
||||
- `TaintTolerationPriority`: Prepares the priority list for all the nodes, based on
|
||||
the number of intolerable taints on the node. This policy adjusts a node's rank
|
||||
taking that list into account.
|
||||
|
||||
- `ImageLocalityPriority`: Favors nodes that already have the
|
||||
{{< glossary_tooltip text="container images" term_id="image" >}} for that
|
||||
Pod cached locally.
|
||||
|
||||
- `ServiceSpreadingPriority`: For a given Service, this policy aims to make sure that
|
||||
the Pods for the Service run on different nodes. It favors scheduling onto nodes
|
||||
that don't have Pods for the service already assigned there. The overall outcome is
|
||||
that the Service becomes more resilient to a single Node failure.
|
||||
|
||||
- `CalculateAntiAffinityPriorityMap`: This policy helps implement
|
||||
[pod anti-affinity](/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity).
|
||||
|
||||
- `EqualPriorityMap`: Gives an equal weight of one to all nodes.
|
||||
1. [Scheduling Policies](/docs/reference/scheduling/policies) allow you to
|
||||
configure _Predicates_ for filtering and _Priorities_ for scoring.
|
||||
1. [Scheduling Profiles](/docs/reference/scheduling/profiles) allow you to
|
||||
configure Plugins that implement different scheduling stages, including:
|
||||
`QueueSort`, `Filter`, `Score`, `Bind`, `Reserve`, `Permit`, and others. You
|
||||
can also configure the kube-scheduler to run different profiles.
|
||||
|
||||
{{% /capture %}}
|
||||
{{% capture whatsnext %}}
|
||||
|
|
|
@ -3,14 +3,14 @@ reviewers:
|
|||
- ahg-g
|
||||
title: Scheduling Framework
|
||||
content_template: templates/concept
|
||||
weight: 70
|
||||
weight: 60
|
||||
---
|
||||
|
||||
{{% capture overview %}}
|
||||
|
||||
{{< feature-state for_k8s_version="1.15" state="alpha" >}}
|
||||
|
||||
The scheduling framework is a new pluggable architecture for Kubernetes Scheduler
|
||||
The scheduling framework is a pluggable architecture for Kubernetes Scheduler
|
||||
that makes scheduler customizations easy. It adds a new set of "plugin" APIs to
|
||||
the existing scheduler. Plugins are compiled into the scheduler. The APIs
|
||||
allow most scheduling features to be implemented as plugins, while keeping the
|
||||
|
@ -56,16 +56,16 @@ stateful tasks.
|
|||
|
||||
{{< figure src="/images/docs/scheduling-framework-extensions.png" title="scheduling framework extension points" >}}
|
||||
|
||||
### Queue sort
|
||||
### QueueSort {#queue-sort}
|
||||
|
||||
These plugins are used to sort Pods in the scheduling queue. A queue sort plugin
|
||||
essentially will provide a "less(Pod1, Pod2)" function. Only one queue sort
|
||||
essentially provides a `Less(Pod1, Pod2)` function. Only one queue sort
|
||||
plugin may be enabled at a time.
|
||||
|
||||
### Pre-filter
|
||||
### PreFilter {#pre-filter}
|
||||
|
||||
These plugins are used to pre-process info about the Pod, or to check certain
|
||||
conditions that the cluster or the Pod must meet. If a pre-filter plugin returns
|
||||
conditions that the cluster or the Pod must meet. If a PreFilter plugin returns
|
||||
an error, the scheduling cycle is aborted.
|
||||
|
||||
### Filter
|
||||
|
@ -75,28 +75,25 @@ node, the scheduler will call filter plugins in their configured order. If any
|
|||
filter plugin marks the node as infeasible, the remaining plugins will not be
|
||||
called for that node. Nodes may be evaluated concurrently.
|
||||
|
||||
### Post-filter
|
||||
### PreScore {#pre-score}
|
||||
|
||||
This is an informational extension point. Plugins will be called with a list of
|
||||
nodes that passed the filtering phase. A plugin may use this data to update
|
||||
internal state or to generate logs/metrics.
|
||||
These plugins are used to perform "pre-scoring" work, which generates a sharable
|
||||
state for Score plugins to use. If a PreScore plugin returns an error, the
|
||||
scheduling cycle is aborted.
|
||||
|
||||
**Note:** Plugins wishing to perform "pre-scoring" work should use the
|
||||
post-filter extension point.
|
||||
|
||||
### Scoring
|
||||
### Score {#scoring}
|
||||
|
||||
These plugins are used to rank nodes that have passed the filtering phase. The
|
||||
scheduler will call each scoring plugin for each node. There will be a well
|
||||
defined range of integers representing the minimum and maximum scores. After the
|
||||
[normalize scoring](#normalize-scoring) phase, the scheduler will combine node
|
||||
[NormalizeScore](#normalize-scoring) phase, the scheduler will combine node
|
||||
scores from all plugins according to the configured plugin weights.
|
||||
|
||||
### Normalize scoring
|
||||
### NormalizeScore {#normalize-scoring}
|
||||
|
||||
These plugins are used to modify scores before the scheduler computes a final
|
||||
ranking of Nodes. A plugin that registers for this extension point will be
|
||||
called with the [scoring](#scoring) results from the same plugin. This is called
|
||||
called with the [Score](#scoring) results from the same plugin. This is called
|
||||
once per plugin per scheduling cycle.
|
||||
|
||||
For example, suppose a plugin `BlinkingLightScorer` ranks Nodes based on how
|
||||
|
@ -104,7 +101,7 @@ many blinking lights they have.
|
|||
|
||||
```go
|
||||
func ScoreNode(_ *v1.Pod, n *v1.Node) (int, error) {
|
||||
return getBlinkingLightCount(n)
|
||||
return getBlinkingLightCount(n)
|
||||
}
|
||||
```
|
||||
|
||||
|
@ -114,21 +111,23 @@ extension point.
|
|||
|
||||
```go
|
||||
func NormalizeScores(scores map[string]int) {
|
||||
highest := 0
|
||||
for _, score := range scores {
|
||||
highest = max(highest, score)
|
||||
}
|
||||
for node, score := range scores {
|
||||
scores[node] = score*NodeScoreMax/highest
|
||||
}
|
||||
highest := 0
|
||||
for _, score := range scores {
|
||||
highest = max(highest, score)
|
||||
}
|
||||
for node, score := range scores {
|
||||
scores[node] = score*NodeScoreMax/highest
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
If any normalize-scoring plugin returns an error, the scheduling cycle is
|
||||
If any NormalizeScore plugin returns an error, the scheduling cycle is
|
||||
aborted.
|
||||
|
||||
**Note:** Plugins wishing to perform "pre-reserve" work should use the
|
||||
normalize-scoring extension point.
|
||||
{{< note >}}
|
||||
Plugins wishing to perform "pre-reserve" work should use the
|
||||
NormalizeScore extension point.
|
||||
{{< /note >}}
|
||||
|
||||
### Reserve
|
||||
|
||||
|
@ -140,53 +139,53 @@ to prevent race conditions while the scheduler waits for the bind to succeed.
|
|||
|
||||
This is the last step in a scheduling cycle. Once a Pod is in the reserved
|
||||
state, it will either trigger [Unreserve](#unreserve) plugins (on failure) or
|
||||
[Post-bind](#post-bind) plugins (on success) at the end of the binding cycle.
|
||||
|
||||
*Note: This concept used to be referred to as "assume".*
|
||||
[PostBind](#post-bind) plugins (on success) at the end of the binding cycle.
|
||||
|
||||
### Permit
|
||||
|
||||
These plugins are used to prevent or delay the binding of a Pod. A permit plugin
|
||||
can do one of three things.
|
||||
_Permit_ plugins are invoked at the end of the scheduling cycle for each Pod, to
|
||||
prevent or delay the binding to the candidate node. A permit plugin can do one of
|
||||
the three things:
|
||||
|
||||
1. **approve** \
|
||||
Once all permit plugins approve a Pod, it is sent for binding.
|
||||
Once all Permit plugins approve a Pod, it is sent for binding.
|
||||
|
||||
1. **deny** \
|
||||
If any permit plugin denies a Pod, it is returned to the scheduling queue.
|
||||
If any Permit plugin denies a Pod, it is returned to the scheduling queue.
|
||||
This will trigger [Unreserve](#unreserve) plugins.
|
||||
|
||||
1. **wait** (with a timeout) \
|
||||
If a permit plugin returns "wait", then the Pod is kept in the permit phase
|
||||
until a [plugin approves it](#frameworkhandle). If a timeout occurs, **wait**
|
||||
becomes **deny** and the Pod is returned to the scheduling queue, triggering
|
||||
[Unreserve](#unreserve) plugins.
|
||||
If a Permit plugin returns "wait", then the Pod is kept in an internal "waiting"
|
||||
Pods list, and the binding cycle of this Pod starts but directly blocks until it
|
||||
gets [approved](#frameworkhandle). If a timeout occurs, **wait** becomes **deny**
|
||||
and the Pod is returned to the scheduling queue, triggering [Unreserve](#unreserve)
|
||||
plugins.
|
||||
|
||||
**Approving a Pod binding**
|
||||
{{< note >}}
|
||||
While any plugin can access the list of "waiting" Pods and approve them
|
||||
(see [`FrameworkHandle`](#frameworkhandle)), we expect only the permit
|
||||
plugins to approve binding of reserved Pods that are in "waiting" state. Once a Pod
|
||||
is approved, it is sent to the [PreBind](#pre-bind) phase.
|
||||
{{< /note >}}
|
||||
|
||||
While any plugin can access the list of "waiting" Pods from the cache and
|
||||
approve them (see [`FrameworkHandle`](#frameworkhandle)) we expect only the permit
|
||||
plugins to approve binding of reserved Pods that are in "waiting" state. Once a
|
||||
Pod is approved, it is sent to the pre-bind phase.
|
||||
|
||||
### Pre-bind
|
||||
### PreBind {#pre-bind}
|
||||
|
||||
These plugins are used to perform any work required before a Pod is bound. For
|
||||
example, a pre-bind plugin may provision a network volume and mount it on the
|
||||
target node before allowing the Pod to run there.
|
||||
|
||||
If any pre-bind plugin returns an error, the Pod is [rejected](#unreserve) and
|
||||
If any PreBind plugin returns an error, the Pod is [rejected](#unreserve) and
|
||||
returned to the scheduling queue.
|
||||
|
||||
### Bind
|
||||
|
||||
These plugins are used to bind a Pod to a Node. Bind plugins will not be called
|
||||
until all pre-bind plugins have completed. Each bind plugin is called in the
|
||||
until all PreBind plugins have completed. Each bind plugin is called in the
|
||||
configured order. A bind plugin may choose whether or not to handle the given
|
||||
Pod. If a bind plugin chooses to handle a Pod, **the remaining bind plugins are
|
||||
skipped**.
|
||||
|
||||
### Post-bind
|
||||
### PostBind {#post-bind}
|
||||
|
||||
This is an informational extension point. Post-bind plugins are called after a
|
||||
Pod is successfully bound. This is the end of a binding cycle, and can be used
|
||||
|
@ -209,88 +208,35 @@ interfaces have the following form.
|
|||
|
||||
```go
|
||||
type Plugin interface {
|
||||
Name() string
|
||||
Name() string
|
||||
}
|
||||
|
||||
type QueueSortPlugin interface {
|
||||
Plugin
|
||||
Less(*v1.pod, *v1.pod) bool
|
||||
Plugin
|
||||
Less(*v1.pod, *v1.pod) bool
|
||||
}
|
||||
|
||||
type PreFilterPlugin interface {
|
||||
Plugin
|
||||
PreFilter(PluginContext, *v1.pod) error
|
||||
Plugin
|
||||
PreFilter(context.Context, *framework.CycleState, *v1.pod) error
|
||||
}
|
||||
|
||||
// ...
|
||||
```
|
||||
|
||||
# Plugin Configuration
|
||||
## Plugin configuration
|
||||
|
||||
Plugins can be enabled in the scheduler configuration. Also, default plugins can
|
||||
be disabled in the configuration. In 1.15, there are no default plugins for the
|
||||
scheduling framework.
|
||||
You can enable or disable plugins in the scheduler configuration. If you are using
|
||||
Kubernetes v1.18 or later, most scheduling
|
||||
[plugins](/docs/reference/scheduling/profiles/#scheduling-plugins) are in use and
|
||||
enabled by default.
|
||||
|
||||
The scheduler configuration can include configuration for plugins as well. Such
|
||||
configurations are passed to the plugins at the time the scheduler initializes
|
||||
them. The configuration is an arbitrary value. The receiving plugin should
|
||||
decode and process the configuration.
|
||||
In addition to default plugins, you can also implement your own scheduling
|
||||
plugins and get them configured along with default plugins. You can visit
|
||||
[scheduler-plugins](https://github.com/kubernetes-sigs/scheduler-plugins) for more details.
|
||||
|
||||
The following example shows a scheduler configuration that enables some
|
||||
plugins at `reserve` and `preBind` extension points and disables a plugin. It
|
||||
also provides a configuration for the plugin `foo`.
|
||||
|
||||
```yaml
|
||||
apiVersion: kubescheduler.config.k8s.io/v1alpha1
|
||||
kind: KubeSchedulerConfiguration
|
||||
|
||||
...
|
||||
|
||||
plugins:
|
||||
reserve:
|
||||
enabled:
|
||||
- name: foo
|
||||
- name: bar
|
||||
disabled:
|
||||
- name: baz
|
||||
preBind:
|
||||
enabled:
|
||||
- name: foo
|
||||
disabled:
|
||||
- name: baz
|
||||
|
||||
pluginConfig:
|
||||
- name: foo
|
||||
args: >
|
||||
Arbitrary set of args to plugin foo
|
||||
```
|
||||
|
||||
When an extension point is omitted from the configuration, default plugins for
|
||||
that extension point are used. When an extension point exists and `enabled` is
|
||||
provided, the `enabled` plugins are called in addition to default plugins.
|
||||
Default plugins are called first and then the additional enabled plugins are
|
||||
called in the same order specified in the configuration. If a different order of
|
||||
calling default plugins is desired, default plugins must be `disabled` and
|
||||
`enabled` in the desired order.
|
||||
|
||||
Assuming there is a default plugin called `foo` at `reserve` and we are adding
|
||||
plugin `bar` that we want to be invoked before `foo`, we should disable `foo`
|
||||
and enable `bar` and `foo` in order. The following example shows the
|
||||
configuration that achieves this:
|
||||
|
||||
```yaml
|
||||
apiVersion: kubescheduler.config.k8s.io/v1alpha1
|
||||
kind: KubeSchedulerConfiguration
|
||||
|
||||
...
|
||||
|
||||
plugins:
|
||||
reserve:
|
||||
enabled:
|
||||
- name: bar
|
||||
- name: foo
|
||||
disabled:
|
||||
- name: foo
|
||||
```
|
||||
If you are using Kubernetes v1.18 or later, you can configure a set of plugins as
|
||||
a scheduler profile and then define multiple profiles to fit various kinds of workload.
|
||||
Learn more at [multiple profiles](/docs/reference/scheduling/profiles/#multiple-profiles).
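As a rough illustration (a sketch assuming the `kubescheduler.config.k8s.io/v1alpha2` configuration API available around v1.18; the profile names are examples only), two profiles could be defined like this:

```yaml
apiVersion: kubescheduler.config.k8s.io/v1alpha2
kind: KubeSchedulerConfiguration
profiles:
# The default profile keeps the default plugin set
- schedulerName: default-scheduler
# A second profile that disables all scoring-related plugins
- schedulerName: no-scoring-scheduler
  plugins:
    preScore:
      disabled:
      - name: '*'
    score:
      disabled:
      - name: '*'
```

Pods then select a profile by setting `spec.schedulerName` to one of the names above.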
{{% /capture %}}
|
|
@ -142,7 +142,7 @@ Area of Concern for Code | Recommendation |
|
|||
--------------------------------------------- | ------------ |
|
||||
Access over TLS only | If your code needs to communicate via TCP, ideally it would be performing a TLS handshake with the client ahead of time. With the exception of a few cases, the default behavior should be to encrypt everything in transit. Going one step further, even "behind the firewall" in your VPC it's still a good idea to encrypt network traffic between services. This can be done through a process known as mutual TLS, or [mTLS](https://en.wikipedia.org/wiki/Mutual_authentication), which performs a two-sided verification of communication between two certificate-holding services. There are numerous tools that can be used to accomplish this in Kubernetes, such as [Linkerd](https://linkerd.io/) and [Istio](https://istio.io/). |
|
||||
Limiting port ranges of communication | This recommendation may be a bit self-explanatory, but wherever possible you should only expose the ports on your service that are absolutely essential for communication or metric gathering. |
|
||||
3rd Party Dependency Security | Since our applications tend to have dependencies outside of our own codebases, it is a good practice to ensure that a regular scan of the code's dependencies are still secure with no CVE's currently filed against them. Each language has a tool for performing this check automatically. |
|
||||
3rd Party Dependency Security | Since our applications tend to have dependencies outside of our own codebases, it is a good practice to regularly scan the code's dependencies to ensure that they are still secure with no vulnerabilities currently filed against them. Each language has a tool for performing this check automatically. |
|
||||
Static Code Analysis | Most languages provide a way for a snippet of code to be analyzed for any potentially unsafe coding practices. Whenever possible you should perform checks using automated tooling that can scan codebases for common security errors. Some of the tools can be found at [OWASP Source Code Analysis Tools](https://www.owasp.org/index.php/Source_Code_Analysis_Tools). |
|
||||
Dynamic probing attacks | There are a few automated tools that are able to be run against your service to try some of the well known attacks that commonly befall services. These include SQL injection, CSRF, and XSS. One of the most popular dynamic analysis tools is the [OWASP Zed Attack Proxy](https://www.owasp.org/index.php/OWASP_Zed_Attack_Proxy_Project). |
|
||||
|
||||
|
|
|
@ -422,10 +422,8 @@ LoadBalancer Ingress: a320587ffd19711e5a37606cf4a74574-1142138393.us-east-1.el
|
|||
|
||||
{{% capture whatsnext %}}
|
||||
|
||||
Kubernetes also supports Federated Services, which can span multiple
|
||||
clusters and cloud providers, to provide increased availability,
|
||||
better fault tolerance and greater scalability for your services. See
|
||||
the [Federated Services User Guide](/docs/concepts/cluster-administration/federation-service-discovery/)
|
||||
for further information.
|
||||
* Learn more about [Using a Service to Access an Application in a Cluster](/docs/tasks/access-application-cluster/service-access-application-cluster/)
|
||||
* Learn more about [Connecting a Front End to a Back End Using a Service](/docs/tasks/access-application-cluster/connecting-frontend-backend/)
|
||||
* Learn more about [Creating an External Load Balancer](/docs/tasks/access-application-cluster/create-external-load-balancer/)
|
||||
|
||||
{{% /capture %}}
|
||||
|
|
|
@ -38,14 +38,16 @@ For more up-to-date specification, see
|
|||
|
||||
## Services
|
||||
|
||||
### A records
|
||||
### A/AAAA records
|
||||
|
||||
"Normal" (not headless) Services are assigned a DNS A record for a name of the
|
||||
form `my-svc.my-namespace.svc.cluster-domain.example`. This resolves to the cluster IP
|
||||
"Normal" (not headless) Services are assigned a DNS A or AAAA record,
|
||||
depending on the IP family of the service, for a name of the form
|
||||
`my-svc.my-namespace.svc.cluster-domain.example`. This resolves to the cluster IP
|
||||
of the Service.
|
||||
|
||||
"Headless" (without a cluster IP) Services are also assigned a DNS A record for
|
||||
a name of the form `my-svc.my-namespace.svc.cluster-domain.example`. Unlike normal
|
||||
"Headless" (without a cluster IP) Services are also assigned a DNS A or AAAA record,
|
||||
depending on the IP family of the service, for a name of the form
|
||||
`my-svc.my-namespace.svc.cluster-domain.example`. Unlike normal
|
||||
Services, this resolves to the set of IPs of the pods selected by the Service.
|
||||
Clients are expected to consume the set or else use standard round-robin
|
||||
selection from the set.
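For illustration, a headless Service is a regular Service with `clusterIP` set to `None`. A sketch (the name, namespace, selector, and ports are assumptions chosen to match the DNS names used in this section) might look like:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-svc
  namespace: my-namespace
spec:
  clusterIP: None  # headless: DNS resolves to the selected Pod IPs rather than a single cluster IP
  selector:
    app: my-app
  ports:
  - protocol: TCP
    port: 80
    targetPort: 9376
```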
@ -128,22 +130,22 @@ spec:
|
|||
```
|
||||
|
||||
If there exists a headless service in the same namespace as the pod and with
|
||||
the same name as the subdomain, the cluster's KubeDNS Server also returns an A
|
||||
the same name as the subdomain, the cluster's DNS Server also returns an A or AAAA
|
||||
record for the Pod's fully qualified hostname.
|
||||
For example, given a Pod with the hostname set to "`busybox-1`" and the subdomain set to
|
||||
"`default-subdomain`", and a headless Service named "`default-subdomain`" in
|
||||
the same namespace, the pod will see its own FQDN as
|
||||
"`busybox-1.default-subdomain.my-namespace.svc.cluster-domain.example`". DNS serves an
|
||||
A record at that name, pointing to the Pod's IP. Both pods "`busybox1`" and
|
||||
"`busybox2`" can have their distinct A records.
|
||||
A or AAAA record at that name, pointing to the Pod's IP. Both pods "`busybox1`" and
|
||||
"`busybox2`" can have their distinct A or AAAA records.
|
||||
|
||||
The Endpoints object can specify the `hostname` for any endpoint addresses,
|
||||
along with its IP.
|
||||
|
||||
{{< note >}}
|
||||
Because A records are not created for Pod names, `hostname` is required for the Pod's A
|
||||
Because A or AAAA records are not created for Pod names, `hostname` is required for the Pod's A or AAAA
|
||||
record to be created. A Pod with no `hostname` but with `subdomain` will only create the
|
||||
A record for the headless service (`default-subdomain.my-namespace.svc.cluster-domain.example`),
|
||||
A or AAAA record for the headless service (`default-subdomain.my-namespace.svc.cluster-domain.example`),
|
||||
pointing to the Pod's IP address. Also, the Pod needs to become ready in order to have a
|
||||
record unless `publishNotReadyAddresses=True` is set on the Service.
|
||||
{{< /note >}}
|
||||
|
|
|
@ -55,7 +55,7 @@ To enable IPv4/IPv6 dual-stack, enable the `IPv6DualStack` [feature gate](/docs/
|
|||
* `--feature-gates="IPv6DualStack=true"`
|
||||
* kube-proxy:
|
||||
* `--proxy-mode=ipvs`
|
||||
* `--cluster-cidrs=<IPv4 CIDR>,<IPv6 CIDR>`
|
||||
* `--cluster-cidr=<IPv4 CIDR>,<IPv6 CIDR>`
|
||||
* `--feature-gates="IPv6DualStack=true"`
|
||||
|
||||
{{< caution >}}
|
||||
|
|
|
@ -24,6 +24,21 @@ Endpoints.
|
|||
|
||||
{{% capture body %}}
|
||||
|
||||
## Motivation
|
||||
|
||||
The Endpoints API has provided a simple and straightforward way of
|
||||
tracking network endpoints in Kubernetes. Unfortunately as Kubernetes clusters
|
||||
and Services have gotten larger, limitations of that API became more visible.
|
||||
Most notably, those included challenges with scaling to larger numbers of
|
||||
network endpoints.
|
||||
|
||||
Since all network endpoints for a Service were stored in a single Endpoints
|
||||
resource, those resources could get quite large. That affected the performance
|
||||
of Kubernetes components (notably the master control plane) and resulted in
|
||||
significant amounts of network traffic and processing when Endpoints changed.
|
||||
EndpointSlices help you mitigate those issues as well as provide an extensible
|
||||
platform for additional features such as topological routing.
|
||||
|
||||
## EndpointSlice resources {#endpointslice-resource}
|
||||
|
||||
In Kubernetes, an EndpointSlice contains references to a set of network
|
||||
|
@ -32,6 +47,8 @@ for a Kubernetes Service when a {{< glossary_tooltip text="selector"
|
|||
term_id="selector" >}} is specified. These EndpointSlices will include
|
||||
references to any Pods that match the Service selector. EndpointSlices group
|
||||
network endpoints together by unique Service and Port combinations.
|
||||
The name of an EndpointSlice object must be a valid
|
||||
[DNS subdomain name](/docs/concepts/overview/working-with-objects/names#dns-subdomain-names).
|
||||
|
||||
As an example, here's a sample EndpointSlice resource for the `example`
|
||||
Kubernetes Service.
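Such a resource might look something like this (a sketch assuming the `discovery.k8s.io/v1beta1` API; the addresses, hostname, and topology values are illustrative):

```yaml
apiVersion: discovery.k8s.io/v1beta1
kind: EndpointSlice
metadata:
  name: example-abc
  labels:
    # Links this slice back to its owning Service
    kubernetes.io/service-name: example
addressType: IPv4
ports:
- name: http
  protocol: TCP
  port: 80
endpoints:
- addresses:
  - "10.1.2.3"
  conditions:
    ready: true
  hostname: pod-1
  topology:
    kubernetes.io/hostname: node-1
    topology.kubernetes.io/zone: us-west2-a
```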
@ -163,21 +180,6 @@ necessary soon anyway. Rolling updates of Deployments also provide a natural
|
|||
repacking of EndpointSlices with all pods and their corresponding endpoints
|
||||
getting replaced.
|
||||
|
||||
## Motivation
|
||||
|
||||
The Endpoints API has provided a simple and straightforward way of
|
||||
tracking network endpoints in Kubernetes. Unfortunately as Kubernetes clusters
|
||||
and Services have gotten larger, limitations of that API became more visible.
|
||||
Most notably, those included challenges with scaling to larger numbers of
|
||||
network endpoints.
|
||||
|
||||
Since all network endpoints for a Service were stored in a single Endpoints
|
||||
resource, those resources could get quite large. That affected the performance
|
||||
of Kubernetes components (notably the master control plane) and resulted in
|
||||
significant amounts of network traffic and processing when Endpoints changed.
|
||||
EndpointSlices help you mitigate those issues as well as provide an extensible
|
||||
platform for additional features such as topological routing.
|
||||
|
||||
{{% /capture %}}
|
||||
|
||||
{{% capture whatsnext %}}
|
||||
|
|
|
@ -17,24 +17,15 @@ weight: 40
|
|||
|
||||
For clarity, this guide defines the following terms:
|
||||
|
||||
Node
|
||||
: A worker machine in Kubernetes, part of a cluster.
|
||||
|
||||
Cluster
|
||||
: A set of Nodes that run containerized applications managed by Kubernetes. For this example, and in most common Kubernetes deployments, nodes in the cluster are not part of the public internet.
|
||||
|
||||
Edge router
|
||||
: A router that enforces the firewall policy for your cluster. This could be a gateway managed by a cloud provider or a physical piece of hardware.
|
||||
|
||||
Cluster network
|
||||
: A set of links, logical or physical, that facilitate communication within a cluster according to the Kubernetes [networking model](/docs/concepts/cluster-administration/networking/).
|
||||
|
||||
Service
|
||||
: A Kubernetes {{< glossary_tooltip term_id="service" >}} that identifies a set of Pods using {{< glossary_tooltip text="label" term_id="label" >}} selectors. Unless mentioned otherwise, Services are assumed to have virtual IPs only routable within the cluster network.
|
||||
* Node: A worker machine in Kubernetes, part of a cluster.
|
||||
* Cluster: A set of Nodes that run containerized applications managed by Kubernetes. For this example, and in most common Kubernetes deployments, nodes in the cluster are not part of the public internet.
|
||||
* Edge router: A router that enforces the firewall policy for your cluster. This could be a gateway managed by a cloud provider or a physical piece of hardware.
|
||||
* Cluster network: A set of links, logical or physical, that facilitate communication within a cluster according to the Kubernetes [networking model](/docs/concepts/cluster-administration/networking/).
|
||||
* Service: A Kubernetes {{< glossary_tooltip term_id="service" >}} that identifies a set of Pods using {{< glossary_tooltip text="label" term_id="label" >}} selectors. Unless mentioned otherwise, Services are assumed to have virtual IPs only routable within the cluster network.
|
||||
|
||||
## What is Ingress?
|
||||
|
||||
Ingress exposes HTTP and HTTPS routes from outside the cluster to
|
||||
[Ingress](/docs/reference/generated/kubernetes-api/{{< param "version" >}}/#ingress-v1beta1-networking-k8s-io) exposes HTTP and HTTPS routes from outside the cluster to
|
||||
{{< link text="services" url="/docs/concepts/services-networking/service/" >}} within the cluster.
|
||||
Traffic routing is controlled by rules defined on the Ingress resource.
|
||||
|
||||
|
@ -46,7 +37,7 @@ Traffic routing is controlled by rules defined on the Ingress resource.
|
|||
[ Services ]
|
||||
```
|
||||
|
||||
An Ingress can be configured to give Services externally-reachable URLs, load balance traffic, terminate SSL / TLS, and offer name based virtual hosting. An [Ingress controller](/docs/concepts/services-networking/ingress-controllers) is responsible for fulfilling the Ingress, usually with a load balancer, though it may also configure your edge router or additional frontends to help handle the traffic.
|
||||
An Ingress may be configured to give Services externally-reachable URLs, load balance traffic, terminate SSL / TLS, and offer name based virtual hosting. An [Ingress controller](/docs/concepts/services-networking/ingress-controllers) is responsible for fulfilling the Ingress, usually with a load balancer, though it may also configure your edge router or additional frontends to help handle the traffic.
|
||||
|
||||
An Ingress does not expose arbitrary ports or protocols. Exposing services other than HTTP and HTTPS to the internet typically
|
||||
uses a service of type [Service.Type=NodePort](/docs/concepts/services-networking/service/#nodeport) or
|
||||
|
@ -82,16 +73,19 @@ spec:
|
|||
- http:
|
||||
paths:
|
||||
- path: /testpath
|
||||
pathType: Prefix
|
||||
backend:
|
||||
serviceName: test
|
||||
servicePort: 80
|
||||
```
|
||||
|
||||
As with all other Kubernetes resources, an Ingress needs `apiVersion`, `kind`, and `metadata` fields.
|
||||
For general information about working with config files, see [deploying applications](/docs/tasks/run-application/run-stateless-application-deployment/), [configuring containers](/docs/tasks/configure-pod-container/configure-pod-configmap/), [managing resources](/docs/concepts/cluster-administration/manage-deployment/).
|
||||
As with all other Kubernetes resources, an Ingress needs `apiVersion`, `kind`, and `metadata` fields.
|
||||
The name of an Ingress object must be a valid
|
||||
[DNS subdomain name](/docs/concepts/overview/working-with-objects/names#dns-subdomain-names).
|
||||
For general information about working with config files, see [deploying applications](/docs/tasks/run-application/run-stateless-application-deployment/), [configuring containers](/docs/tasks/configure-pod-container/configure-pod-configmap/), [managing resources](/docs/concepts/cluster-administration/manage-deployment/).
|
||||
Ingress frequently uses annotations to configure some options depending on the Ingress controller, an example of which
|
||||
is the [rewrite-target annotation](https://github.com/kubernetes/ingress-nginx/blob/master/docs/examples/rewrite/README.md).
|
||||
Different [Ingress controllers](/docs/concepts/services-networking/ingress-controllers) support different annotations. Review the documentation for
|
||||
Different [Ingress controllers](/docs/concepts/services-networking/ingress-controllers) support different annotations. Review the documentation for
|
||||
your choice of Ingress controller to learn which annotations are supported.
|
||||
|
||||
The Ingress [spec](https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#spec-and-status)
|
||||
|
@ -124,6 +118,84 @@ backend is typically a configuration option of the [Ingress controller](/docs/co
|
|||
If none of the hosts or paths match the HTTP request in the Ingress objects, the traffic is
|
||||
routed to your default backend.
|
||||
|
||||
### Path Types
|
||||
|
||||
Each path in an Ingress has a corresponding path type. There are three supported
|
||||
path types:
|
||||
|
||||
* _`ImplementationSpecific`_ (default): With this path type, matching is up to
|
||||
the IngressClass. Implementations can treat this as a separate `pathType` or
|
||||
treat it identically to `Prefix` or `Exact` path types.
|
||||
|
||||
* _`Exact`_: Matches the URL path exactly and with case sensitivity.
|
||||
|
||||
* _`Prefix`_: Matches based on a URL path prefix split by `/`. Matching is case
|
||||
sensitive and done on a path element by element basis. A path element refers
|
||||
to the list of labels in the path split by the `/` separator. A request is a
|
||||
match for path _p_ if every _p_ is an element-wise prefix of _p_ of the
|
||||
request path.
|
||||
{{< note >}}
|
||||
If the last element of the path is a substring of the
|
||||
last element in request path, it is not a match (for example:
|
||||
`/foo/bar` matches`/foo/bar/baz`, but does not match `/foo/barbaz`).
|
||||
{{< /note >}}
|
||||
|
||||
#### Multiple Matches
|
||||
In some cases, multiple paths within an Ingress will match a request. In those
|
||||
cases precedence will be given first to the longest matching path. If two paths
|
||||
are still equally matched, precedence will be given to paths with an exact path
|
||||
type over prefix path type.
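As a sketch of how these precedence rules play out (the object name and the backends `service1` and `service2` are placeholders), an Ingress might mix path types like this:

```yaml
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: path-type-example
spec:
  rules:
  - http:
      paths:
      - path: /foo
        pathType: Prefix   # matches /foo, /foo/, and /foo/bar
        backend:
          serviceName: service1
          servicePort: 80
      - path: /foo/bar
        pathType: Exact    # a request for exactly /foo/bar prefers this longer, exact match
        backend:
          serviceName: service2
          servicePort: 80
```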
## Ingress Class
|
||||
|
||||
Ingresses can be implemented by different controllers, often with different
|
||||
configuration. Each Ingress should specify a class, a reference to an
|
||||
IngressClass resource that contains additional configuration including the name
|
||||
of the controller that should implement the class.
|
||||
|
||||
```yaml
|
||||
apiVersion: networking.k8s.io/v1beta1
|
||||
kind: IngressClass
|
||||
metadata:
|
||||
name: external-lb
|
||||
spec:
|
||||
controller: example.com/ingress-controller
|
||||
parameters:
|
||||
apiGroup: k8s.example.com/v1alpha
|
||||
kind: IngressParameters
|
||||
name: external-lb
|
||||
```
|
||||
|
||||
IngressClass resources contain an optional parameters field. This can be used to
|
||||
reference additional configuration for this class.
|
||||
|
||||
### Deprecated Annotation
|
||||
|
||||
Before the IngressClass resource and `ingressClassName` field were added in
|
||||
Kubernetes 1.18, Ingress classes were specified with a
|
||||
`kubernetes.io/ingress.class` annotation on the Ingress. This annotation was
|
||||
never formally defined, but was widely supported by Ingress controllers.
|
||||
|
||||
The newer `ingressClassName` field on Ingresses is a replacement for that
|
||||
annotation, but is not a direct equivalent. While the annotation was generally
|
||||
used to reference the name of the Ingress controller that should implement the
|
||||
Ingress, the field is a reference to an IngressClass resource that contains
|
||||
additional Ingress configuration, including the name of the Ingress controller.
|
||||
|
||||
### Default Ingress Class
|
||||
|
||||
You can mark a particular IngressClass as default for your cluster. Setting the
|
||||
`ingressclass.kubernetes.io/is-default-class` annotation to `true` on an
|
||||
IngressClass resource will ensure that new Ingresses without an
|
||||
`ingressClassName` field specified will be assigned this default IngressClass.
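For example, marking the `external-lb` class above as the cluster default could look like this (a sketch; the annotation is the only addition of interest):

```yaml
apiVersion: networking.k8s.io/v1beta1
kind: IngressClass
metadata:
  name: external-lb
  annotations:
    # New Ingresses without an ingressClassName are assigned this class
    ingressclass.kubernetes.io/is-default-class: "true"
spec:
  controller: example.com/ingress-controller
```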
{{< caution >}}
|
||||
If you have more than one IngressClass marked as the default for your cluster,
|
||||
the admission controller prevents creating new Ingress objects that don't have
|
||||
an `ingressClassName` specified. You can resolve this by ensuring that at most 1
|
||||
IngressClass is marked as default in your cluster.
|
||||
{{< /caution >}}
|
||||
|
||||
## Types of Ingress
|
||||
|
||||
### Single Service Ingress
|
||||
|
@ -143,10 +215,10 @@ kubectl get ingress test-ingress
|
|||
|
||||
```
|
||||
NAME HOSTS ADDRESS PORTS AGE
|
||||
test-ingress * 107.178.254.228 80 59s
|
||||
test-ingress * 203.0.113.123 80 59s
|
||||
```
|
||||
|
||||
Where `107.178.254.228` is the IP allocated by the Ingress controller to satisfy
|
||||
Where `203.0.113.123` is the IP allocated by the Ingress controller to satisfy
|
||||
this Ingress.
|
||||
|
||||
{{< note >}}
|
||||
|
@ -345,7 +417,7 @@ spec:
|
|||
{{< note >}}
|
||||
There is a gap between TLS features supported by various Ingress
|
||||
controllers. Please refer to documentation on
|
||||
[nginx](https://git.k8s.io/ingress-nginx/README.md#https),
|
||||
[nginx](https://kubernetes.github.io/ingress-nginx/user-guide/tls/),
|
||||
[GCE](https://git.k8s.io/ingress-gce/README.md#frontend-https), or any other
|
||||
platform specific Ingress controller to understand how TLS works in your environment.
|
||||
{{< /note >}}
|
||||
|
@ -474,6 +546,7 @@ You can expose a Service in multiple ways that don't directly involve the Ingres
|
|||
{{% /capture %}}
|
||||
|
||||
{{% capture whatsnext %}}
|
||||
* Learn about [ingress controllers](/docs/concepts/services-networking/ingress-controllers/)
|
||||
* Learn about the [Ingress API](/docs/reference/generated/kubernetes-api/{{< param "version" >}}/#ingress-v1beta1-networking-k8s-io)
|
||||
* Learn about [Ingress Controllers](/docs/concepts/services-networking/ingress-controllers/)
|
||||
* [Set up Ingress on Minikube with the NGINX Controller](/docs/tasks/access-application-cluster/ingress-minikube)
|
||||
{{% /capture %}}
|
||||
|
|
|
@ -11,16 +11,16 @@ weight: 50
|
|||
{{< toc >}}
|
||||
|
||||
{{% capture overview %}}
|
||||
A network policy is a specification of how groups of pods are allowed to communicate with each other and other network endpoints.
|
||||
A network policy is a specification of how groups of {{< glossary_tooltip text="pods" term_id="pod">}} are allowed to communicate with each other and other network endpoints.
|
||||
|
||||
`NetworkPolicy` resources use labels to select pods and define rules which specify what traffic is allowed to the selected pods.
|
||||
NetworkPolicy resources use {{< glossary_tooltip text="labels" term_id="label">}} to select pods and define rules which specify what traffic is allowed to the selected pods.
|
||||
|
||||
{{% /capture %}}
|
||||
|
||||
{{% capture body %}}
|
||||
## Prerequisites
|
||||
|
||||
Network policies are implemented by the network plugin, so you must be using a networking solution which supports `NetworkPolicy` - simply creating the resource without a controller to implement it will have no effect.
|
||||
Network policies are implemented by the [network plugin](/docs/concepts/extend-kubernetes/compute-storage-net/network-plugins/). To use network policies, you must be using a networking solution which supports NetworkPolicy. Creating a NetworkPolicy resource without a controller that implements it will have no effect.
|
||||
|
||||
## Isolated and Non-isolated Pods
|
||||
|
||||
|
@ -30,11 +30,11 @@ Pods become isolated by having a NetworkPolicy that selects them. Once there is
|
|||
|
||||
Network policies do not conflict; they are additive. If any policy or policies select a pod, the pod is restricted to what is allowed by the union of those policies' ingress/egress rules. Thus, the order of evaluation does not affect the policy result.
|
||||
|
||||
## The `NetworkPolicy` Resource
|
||||
## The NetworkPolicy resource {#networkpolicy-resource}
|
||||
|
||||
See the [NetworkPolicy](/docs/reference/generated/kubernetes-api/{{< param "version" >}}/#networkpolicy-v1-networking-k8s-io) for a full definition of the resource.
|
||||
See the [NetworkPolicy](/docs/reference/generated/kubernetes-api/{{< param "version" >}}/#networkpolicy-v1-networking-k8s-io) reference for a full definition of the resource.
|
||||
|
||||
An example `NetworkPolicy` might look like this:
|
||||
An example NetworkPolicy might look like this:
|
||||
|
||||
```yaml
|
||||
apiVersion: networking.k8s.io/v1
|
||||
|
@ -73,23 +73,25 @@ spec:
|
|||
port: 5978
|
||||
```
|
||||
|
||||
*POSTing this to the API server will have no effect unless your chosen networking solution supports network policy.*
|
||||
{{< note >}}
|
||||
POSTing this to the API server for your cluster will have no effect unless your chosen networking solution supports network policy.
|
||||
{{< /note >}}
|
||||
|
||||
__Mandatory Fields__: As with all other Kubernetes config, a `NetworkPolicy`
|
||||
__Mandatory Fields__: As with all other Kubernetes config, a NetworkPolicy
|
||||
needs `apiVersion`, `kind`, and `metadata` fields. For general information
|
||||
about working with config files, see
|
||||
[Configure Containers Using a ConfigMap](/docs/tasks/configure-pod-container/configure-pod-configmap/),
|
||||
and [Object Management](/docs/concepts/overview/working-with-objects/object-management).
|
||||
|
||||
__spec__: `NetworkPolicy` [spec](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md#spec-and-status) has all the information needed to define a particular network policy in the given namespace.
|
||||
__spec__: NetworkPolicy [spec](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md#spec-and-status) has all the information needed to define a particular network policy in the given namespace.
|
||||
|
||||
__podSelector__: Each `NetworkPolicy` includes a `podSelector` which selects the grouping of pods to which the policy applies. The example policy selects pods with the label "role=db". An empty `podSelector` selects all pods in the namespace.
|
||||
__podSelector__: Each NetworkPolicy includes a `podSelector` which selects the grouping of pods to which the policy applies. The example policy selects pods with the label "role=db". An empty `podSelector` selects all pods in the namespace.
|
||||
|
||||
__policyTypes__: Each `NetworkPolicy` includes a `policyTypes` list which may include either `Ingress`, `Egress`, or both. The `policyTypes` field indicates whether or not the given policy applies to ingress traffic to selected pod, egress traffic from selected pods, or both. If no `policyTypes` are specified on a NetworkPolicy then by default `Ingress` will always be set and `Egress` will be set if the NetworkPolicy has any egress rules.
|
||||
__policyTypes__: Each NetworkPolicy includes a `policyTypes` list which may include either `Ingress`, `Egress`, or both. The `policyTypes` field indicates whether or not the given policy applies to ingress traffic to selected pod, egress traffic from selected pods, or both. If no `policyTypes` are specified on a NetworkPolicy then by default `Ingress` will always be set and `Egress` will be set if the NetworkPolicy has any egress rules.
|
||||
|
||||
__ingress__: Each `NetworkPolicy` may include a list of whitelist `ingress` rules. Each rule allows traffic which matches both the `from` and `ports` sections. The example policy contains a single rule, which matches traffic on a single port, from one of three sources, the first specified via an `ipBlock`, the second via a `namespaceSelector` and the third via a `podSelector`.
|
||||
__ingress__: Each NetworkPolicy may include a list of whitelist `ingress` rules. Each rule allows traffic which matches both the `from` and `ports` sections. The example policy contains a single rule, which matches traffic on a single port, from one of three sources, the first specified via an `ipBlock`, the second via a `namespaceSelector` and the third via a `podSelector`.
|
||||
|
||||
__egress__: Each `NetworkPolicy` may include a list of whitelist `egress` rules. Each rule allows traffic which matches both the `to` and `ports` sections. The example policy contains a single rule, which matches traffic on a single port to any destination in `10.0.0.0/24`.
|
||||
__egress__: Each NetworkPolicy may include a list of whitelist `egress` rules. Each rule allows traffic which matches both the `to` and `ports` sections. The example policy contains a single rule, which matches traffic on a single port to any destination in `10.0.0.0/24`.
|
||||
|
||||
So, the example NetworkPolicy:
|
||||
|
||||
|
@ -107,7 +109,7 @@ See the [Declare Network Policy](/docs/tasks/administer-cluster/declare-network-
|
|||
|
||||
There are four kinds of selectors that can be specified in an `ingress` `from` section or `egress` `to` section:
|
||||
|
||||
__podSelector__: This selects particular Pods in the same namespace as the `NetworkPolicy` which should be allowed as ingress sources or egress destinations.
|
||||
__podSelector__: This selects particular Pods in the same namespace as the NetworkPolicy which should be allowed as ingress sources or egress destinations.
|
||||
|
||||
__namespaceSelector__: This selects particular namespaces for which all Pods should be allowed as ingress sources or egress destinations.
|
||||
|
||||
|
@ -168,16 +170,7 @@ in that namespace.
|
|||
|
||||
You can create a "default" isolation policy for a namespace by creating a NetworkPolicy that selects all pods but does not allow any ingress traffic to those pods.
|
||||
|
||||
```yaml
|
||||
apiVersion: networking.k8s.io/v1
|
||||
kind: NetworkPolicy
|
||||
metadata:
|
||||
name: default-deny
|
||||
spec:
|
||||
podSelector: {}
|
||||
policyTypes:
|
||||
- Ingress
|
||||
```
|
||||
{{< codenew file="service/networking/network-policy-default-deny-ingress.yaml" >}}
|
||||
|
||||
This ensures that even pods that aren't selected by any other NetworkPolicy will still be isolated. This policy does not change the default egress isolation behavior.
|
||||
|
||||
|
@ -185,33 +178,13 @@ This ensures that even pods that aren't selected by any other NetworkPolicy will
|
|||
|
||||
If you want to allow all traffic to all pods in a namespace (even if policies are added that cause some pods to be treated as "isolated"), you can create a policy that explicitly allows all traffic in that namespace.
|
||||
|
||||
```yaml
|
||||
apiVersion: networking.k8s.io/v1
|
||||
kind: NetworkPolicy
|
||||
metadata:
|
||||
name: allow-all
|
||||
spec:
|
||||
podSelector: {}
|
||||
ingress:
|
||||
- {}
|
||||
policyTypes:
|
||||
- Ingress
|
||||
```
|
||||
{{< codenew file="service/networking/network-policy-allow-all-ingress.yaml" >}}
|
||||
|
||||
### Default deny all egress traffic
|
||||
|
||||
You can create a "default" egress isolation policy for a namespace by creating a NetworkPolicy that selects all pods but does not allow any egress traffic from those pods.
|
||||
|
||||
```yaml
|
||||
apiVersion: networking.k8s.io/v1
|
||||
kind: NetworkPolicy
|
||||
metadata:
|
||||
name: default-deny
|
||||
spec:
|
||||
podSelector: {}
|
||||
policyTypes:
|
||||
- Egress
|
||||
```
|
||||
{{< codenew file="service/networking/network-policy-default-deny-egress.yaml" >}}
|
||||
|
||||
This ensures that even pods that aren't selected by any other NetworkPolicy will not be allowed egress traffic. This policy does not
|
||||
change the default ingress isolation behavior.
|
||||
|
@ -220,34 +193,13 @@ change the default ingress isolation behavior.
|
|||
|
||||
If you want to allow all traffic from all pods in a namespace (even if policies are added that cause some pods to be treated as "isolated"), you can create a policy that explicitly allows all egress traffic in that namespace.
|
||||
|
||||
```yaml
|
||||
apiVersion: networking.k8s.io/v1
|
||||
kind: NetworkPolicy
|
||||
metadata:
|
||||
name: allow-all
|
||||
spec:
|
||||
podSelector: {}
|
||||
egress:
|
||||
- {}
|
||||
policyTypes:
|
||||
- Egress
|
||||
```
|
||||
{{< codenew file="service/networking/network-policy-allow-all-egress.yaml" >}}
|
||||
|
||||
### Default deny all ingress and all egress traffic
|
||||
|
||||
You can create a "default" policy for a namespace which prevents all ingress AND egress traffic by creating the following NetworkPolicy in that namespace.
|
||||
|
||||
```yaml
|
||||
apiVersion: networking.k8s.io/v1
|
||||
kind: NetworkPolicy
|
||||
metadata:
|
||||
name: default-deny
|
||||
spec:
|
||||
podSelector: {}
|
||||
policyTypes:
|
||||
- Ingress
|
||||
- Egress
|
||||
```
|
||||
{{< codenew file="service/networking/network-policy-default-deny-all.yaml" >}}
|
||||
|
||||
This ensures that even pods that aren't selected by any other NetworkPolicy will not be allowed ingress or egress traffic.
|
||||
|
||||
|
@ -255,9 +207,12 @@ This ensures that even pods that aren't selected by any other NetworkPolicy will
|
|||
|
||||
{{< feature-state for_k8s_version="v1.12" state="alpha" >}}
|
||||
|
||||
Kubernetes supports SCTP as a `protocol` value in `NetworkPolicy` definitions as an alpha feature. To enable this feature, the cluster administrator needs to enable the `SCTPSupport` feature gate on the apiserver, for example, `“--feature-gates=SCTPSupport=true,...”`. When the feature gate is enabled, users can set the `protocol` field of a `NetworkPolicy` to `SCTP`. Kubernetes sets up the network accordingly for the SCTP associations, just like it does for TCP connections.
|
||||
To use this feature, you (or your cluster administrator) will need to enable the `SCTPSupport` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/) for the API server with `--feature-gates=SCTPSupport=true,…`.
|
||||
When the feature gate is enabled, you can set the `protocol` field of a NetworkPolicy to `SCTP`.
|
||||
|
||||
The CNI plugin has to support SCTP as `protocol` value in `NetworkPolicy`.
|
||||
{{< note >}}
|
||||
You must be using a {{< glossary_tooltip text="CNI" term_id="cni" >}} plugin that supports SCTP protocol NetworkPolicies.
|
||||
{{< /note >}}
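For example, an ingress rule using SCTP could look like this (a sketch that mirrors the labels and port from the example policy above; the policy name is illustrative):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-sctp
  namespace: default
spec:
  podSelector:
    matchLabels:
      role: db
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          role: frontend
    ports:
    - protocol: SCTP  # requires the SCTPSupport feature gate and a CNI plugin with SCTP support
      port: 5978
```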
{{% /capture %}}
|
||||
|
@ -266,6 +221,6 @@ The CNI plugin has to support SCTP as `protocol` value in `NetworkPolicy`.
|
|||
|
||||
- See the [Declare Network Policy](/docs/tasks/administer-cluster/declare-network-policy/)
|
||||
walkthrough for further examples.
|
||||
- See more [Recipes](https://github.com/ahmetb/kubernetes-network-policy-recipes) for common scenarios enabled by the NetworkPolicy resource.
|
||||
- See more [recipes](https://github.com/ahmetb/kubernetes-network-policy-recipes) for common scenarios enabled by the NetworkPolicy resource.
|
||||
|
||||
{{% /capture %}}
|
||||
|
|
|
@ -46,23 +46,6 @@ with it, while intrazonal traffic does not. Other common needs include being abl
|
|||
to route traffic to a local Pod managed by a DaemonSet, or keeping traffic to
|
||||
Nodes connected to the same top-of-rack switch for the lowest latency.
|
||||
|
||||
## Prerequisites
|
||||
|
||||
The following prerequisites are needed in order to enable topology aware service
|
||||
routing:
|
||||
|
||||
* Kubernetes 1.17 or later
|
||||
* Kube-proxy running in iptables mode or IPVS mode
|
||||
* Enable [Endpoint Slices](/docs/concepts/services-networking/endpoint-slices/)
|
||||
|
||||
## Enable Service Topology
|
||||
|
||||
To enable service topology, enable the `ServiceTopology` feature gate for
|
||||
kube-apiserver and kube-proxy:
|
||||
|
||||
```
|
||||
--feature-gates="ServiceTopology=true"
|
||||
```
|
||||
|
||||
## Using Service Topology
|
||||
|
||||
|
@ -117,6 +100,98 @@ traffic as follows.
|
|||
it is used.
|
||||
|
||||
|
||||
## Examples
|
||||
|
||||
The following are common examples of using the Service Topology feature.
|
||||
|
||||
### Only Node Local Endpoints
|
||||
|
||||
A Service that only routes to node local endpoints. If no endpoints exist on the node, traffic is dropped:
|
||||
|
||||
```yaml
|
||||
apiVersion: v1
|
||||
kind: Service
|
||||
metadata:
|
||||
name: my-service
|
||||
spec:
|
||||
selector:
|
||||
app: my-app
|
||||
ports:
|
||||
- protocol: TCP
|
||||
port: 80
|
||||
targetPort: 9376
|
||||
topologyKeys:
|
||||
- "kubernetes.io/hostname"
|
||||
```
|
||||
|
||||
### Prefer Node Local Endpoints
|
||||
|
||||
A Service that prefers node local Endpoints but falls back to cluster wide endpoints if node local endpoints do not exist:
|
||||
|
||||
```yaml
|
||||
apiVersion: v1
|
||||
kind: Service
|
||||
metadata:
|
||||
name: my-service
|
||||
spec:
|
||||
selector:
|
||||
app: my-app
|
||||
ports:
|
||||
- protocol: TCP
|
||||
port: 80
|
||||
targetPort: 9376
|
||||
topologyKeys:
|
||||
- "kubernetes.io/hostname"
|
||||
- "*"
|
||||
```
|
||||
|
||||
|
||||
### Only Zonal or Regional Endpoints
|
||||
|
||||
A Service that prefers zonal then regional endpoints. If no endpoints exist in either, traffic is dropped.
|
||||
|
||||
|
||||
```yaml
|
||||
apiVersion: v1
|
||||
kind: Service
|
||||
metadata:
|
||||
name: my-service
|
||||
spec:
|
||||
selector:
|
||||
app: my-app
|
||||
ports:
|
||||
- protocol: TCP
|
||||
port: 80
|
||||
targetPort: 9376
|
||||
topologyKeys:
|
||||
- "topology.kubernetes.io/zone"
|
||||
- "topology.kubernetes.io/region"
|
||||
```
|
||||
|
||||
### Prefer Node Local, Zonal, then Regional Endpoints
|
||||
|
||||
A Service that prefers node local, zonal, then regional endpoints but falls back to cluster wide endpoints.
|
||||
|
||||
```yaml
|
||||
apiVersion: v1
|
||||
kind: Service
|
||||
metadata:
|
||||
name: my-service
|
||||
spec:
|
||||
selector:
|
||||
app: my-app
|
||||
ports:
|
||||
- protocol: TCP
|
||||
port: 80
|
||||
targetPort: 9376
|
||||
topologyKeys:
|
||||
- "kubernetes.io/hostname"
|
||||
- "topology.kubernetes.io/zone"
|
||||
- "topology.kubernetes.io/region"
|
||||
- "*"
|
||||
```
|
||||
|
||||
|
||||
{{% /capture %}}
|
||||
|
||||
{{% capture whatsnext %}}
|
||||
|
|
|
@ -73,6 +73,8 @@ balancer in between your application and the backend Pods.
|
|||
A Service in Kubernetes is a REST object, similar to a Pod. Like all of the
|
||||
REST objects, you can `POST` a Service definition to the API server to create
|
||||
a new instance.
|
||||
The name of a Service object must be a valid
|
||||
[DNS label name](/docs/concepts/overview/working-with-objects/names#dns-label-names).
|
||||
|
||||
For example, suppose you have a set of Pods that each listen on TCP port 9376
|
||||
and carry a label `app=MyApp`:
|
||||
|
@ -167,6 +169,9 @@ subsets:
|
|||
- port: 9376
|
||||
```
|
||||
|
||||
The name of the Endpoints object must be a valid
|
||||
[DNS subdomain name](/docs/concepts/overview/working-with-objects/names#dns-subdomain-names).
|
||||
|
||||
{{< note >}}
|
||||
The endpoint IPs _must not_ be: loopback (127.0.0.0/8 for IPv4, ::1/128 for IPv6), or
|
||||
link-local (169.254.0.0/16 and 224.0.0.0/24 for IPv4, fe80::/64 for IPv6).
|
||||
|
@ -197,6 +202,17 @@ endpoints.
|
|||
EndpointSlices provide additional attributes and functionality which is
|
||||
described in detail in [EndpointSlices](/docs/concepts/services-networking/endpoint-slices/).
|
||||
|
||||
### Application protocol
|
||||
|
||||
{{< feature-state for_k8s_version="v1.18" state="alpha" >}}
|
||||
|
||||
The AppProtocol field provides a way to specify an application protocol to be
|
||||
used for each Service port.
|
||||
|
||||
As an alpha feature, this field is not enabled by default. To use this field,
|
||||
enable the `ServiceAppProtocol` [feature
|
||||
gate](/docs/reference/command-line-tools-reference/feature-gates/).
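A minimal sketch of how this field might be used, assuming the feature gate above is enabled (the Service name, selector, and port numbers are illustrative):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-service          # illustrative name
spec:
  selector:
    app: my-app             # illustrative selector
  ports:
  - name: http
    protocol: TCP
    appProtocol: http       # application protocol declared for this port
    port: 80
    targetPort: 9376
```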
|
||||
|
||||
## Virtual IPs and service proxies
|
||||
|
||||
Every node in a Kubernetes cluster runs a `kube-proxy`. `kube-proxy` is
|
||||
|
@ -1173,19 +1189,6 @@ SCTP is not supported on Windows based nodes.
|
|||
The kube-proxy does not support the management of SCTP associations when it is in userspace mode.
|
||||
{{< /warning >}}
|
||||
|
||||
## Future work
|
||||
|
||||
In the future, the proxy policy for Services can become more nuanced than
|
||||
simple round-robin balancing, for example master-elected or sharded. We also
|
||||
envision that some Services will have "real" load balancers, in which case the
|
||||
virtual IP address will simply transport the packets there.
|
||||
|
||||
The Kubernetes project intends to improve support for L7 (HTTP) Services.
|
||||
|
||||
The Kubernetes project intends to have more flexible ingress modes for Services
|
||||
that encompass the current ClusterIP, NodePort, and LoadBalancer modes and more.
|
||||
|
||||
|
||||
{{% /capture %}}
|
||||
|
||||
{{% capture whatsnext %}}
|
||||
|
|
|
@ -46,6 +46,9 @@ To enable dynamic provisioning, a cluster administrator needs to pre-create
|
|||
one or more StorageClass objects for users.
|
||||
StorageClass objects define which provisioner should be used and what parameters
|
||||
should be passed to that provisioner when dynamic provisioning is invoked.
|
||||
The name of a StorageClass object must be a valid
|
||||
[DNS subdomain name](/docs/concepts/overview/working-with-objects/names#dns-subdomain-names).
|
||||
|
||||
The following manifest creates a storage class "slow" which provisions standard
|
||||
disk-like persistent disks.
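For example, on GCE such a class might look roughly like the following sketch; the `kubernetes.io/gce-pd` provisioner and `pd-standard` type are one possible choice, and other environments would use their own provisioner and parameters:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: slow
provisioner: kubernetes.io/gce-pd   # assumes a GCE environment
parameters:
  type: pd-standard                 # standard (non-SSD) persistent disk
```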
|
||||
|
||||
|
|
|
@ -4,6 +4,7 @@ reviewers:
|
|||
- saad-ali
|
||||
- thockin
|
||||
- msau42
|
||||
- xing-yang
|
||||
title: Persistent Volumes
|
||||
feature:
|
||||
title: Storage orchestration
|
||||
|
@ -16,7 +17,7 @@ weight: 20
|
|||
|
||||
{{% capture overview %}}
|
||||
|
||||
This document describes the current state of `PersistentVolumes` in Kubernetes. Familiarity with [volumes](/docs/concepts/storage/volumes/) is suggested.
|
||||
This document describes the current state of _persistent volumes_ in Kubernetes. Familiarity with [volumes](/docs/concepts/storage/volumes/) is suggested.
|
||||
|
||||
{{% /capture %}}
|
||||
|
||||
|
@ -25,23 +26,16 @@ This document describes the current state of `PersistentVolumes` in Kubernetes.
|
|||
|
||||
## Introduction
|
||||
|
||||
Managing storage is a distinct problem from managing compute instances. The `PersistentVolume` subsystem provides an API for users and administrators that abstracts details of how storage is provided from how it is consumed. To do this, we introduce two new API resources: `PersistentVolume` and `PersistentVolumeClaim`.
|
||||
Managing storage is a distinct problem from managing compute instances. The PersistentVolume subsystem provides an API for users and administrators that abstracts details of how storage is provided from how it is consumed. To do this, we introduce two new API resources: PersistentVolume and PersistentVolumeClaim.
|
||||
|
||||
A `PersistentVolume` (PV) is a piece of storage in the cluster that has been provisioned by an administrator or dynamically provisioned using [Storage Classes](/docs/concepts/storage/storage-classes/). It is a resource in the cluster just like a node is a cluster resource. PVs are volume plugins like Volumes, but have a lifecycle independent of any individual Pod that uses the PV. This API object captures the details of the implementation of the storage, be that NFS, iSCSI, or a cloud-provider-specific storage system.
|
||||
A PersistentVolume (PV) is a piece of storage in the cluster that has been provisioned by an administrator or dynamically provisioned using [Storage Classes](/docs/concepts/storage/storage-classes/). It is a resource in the cluster just like a node is a cluster resource. PVs are volume plugins like Volumes, but have a lifecycle independent of any individual Pod that uses the PV. This API object captures the details of the implementation of the storage, be that NFS, iSCSI, or a cloud-provider-specific storage system.
|
||||
|
||||
A `PersistentVolumeClaim` (PVC) is a request for storage by a user. It is similar to a Pod. Pods consume node resources and PVCs consume PV resources. Pods can request specific levels of resources (CPU and Memory). Claims can request specific size and access modes (e.g., they can be mounted once read/write or many times read-only).
|
||||
A PersistentVolumeClaim (PVC) is a request for storage by a user. It is similar to a Pod. Pods consume node resources and PVCs consume PV resources. Pods can request specific levels of resources (CPU and Memory). Claims can request specific size and access modes (e.g., they can be mounted once read/write or many times read-only).
|
||||
|
||||
While `PersistentVolumeClaims` allow a user to consume abstract storage
|
||||
resources, it is common that users need `PersistentVolumes` with varying
|
||||
properties, such as performance, for different problems. Cluster administrators
|
||||
need to be able to offer a variety of `PersistentVolumes` that differ in more
|
||||
ways than just size and access modes, without exposing users to the details of
|
||||
how those volumes are implemented. For these needs, there is the `StorageClass`
|
||||
resource.
|
||||
While PersistentVolumeClaims allow a user to consume abstract storage resources, it is common that users need PersistentVolumes with varying properties, such as performance, for different problems. Cluster administrators need to be able to offer a variety of PersistentVolumes that differ in more ways than just size and access modes, without exposing users to the details of how those volumes are implemented. For these needs, there is the _StorageClass_ resource.
|
||||
|
||||
See the [detailed walkthrough with working examples](/docs/tasks/configure-pod-container/configure-persistent-volume-storage/).
|
||||
|
||||
|
||||
## Lifecycle of a volume and claim
|
||||
|
||||
PVs are resources in the cluster. PVCs are requests for those resources and also act as claim checks to the resource. The interaction between PVs and PVCs follows this lifecycle:
|
||||
|
@ -51,12 +45,14 @@ PVs are resources in the cluster. PVCs are requests for those resources and also
|
|||
There are two ways PVs may be provisioned: statically or dynamically.
|
||||
|
||||
#### Static
|
||||
|
||||
A cluster administrator creates a number of PVs. They carry the details of the real storage, which is available for use by cluster users. They exist in the Kubernetes API and are available for consumption.
|
||||
|
||||
#### Dynamic
|
||||
When none of the static PVs the administrator created match a user's `PersistentVolumeClaim`,
|
||||
|
||||
When none of the static PVs the administrator created match a user's PersistentVolumeClaim,
|
||||
the cluster may try to dynamically provision a volume specially for the PVC.
|
||||
This provisioning is based on `StorageClasses`: the PVC must request a
|
||||
This provisioning is based on StorageClasses: the PVC must request a
|
||||
[storage class](/docs/concepts/storage/storage-classes/) and
|
||||
the administrator must have created and configured that class for dynamic
|
||||
provisioning to occur. Claims that request the class `""` effectively disable
|
||||
|
@ -71,7 +67,7 @@ check [kube-apiserver](/docs/admin/kube-apiserver/) documentation.
|
|||
|
||||
### Binding
|
||||
|
||||
A user creates, or in the case of dynamic provisioning, has already created, a `PersistentVolumeClaim` with a specific amount of storage requested and with certain access modes. A control loop in the master watches for new PVCs, finds a matching PV (if possible), and binds them together. If a PV was dynamically provisioned for a new PVC, the loop will always bind that PV to the PVC. Otherwise, the user will always get at least what they asked for, but the volume may be in excess of what was requested. Once bound, `PersistentVolumeClaim` binds are exclusive, regardless of how they were bound. A PVC to PV binding is a one-to-one mapping.
|
||||
A user creates, or in the case of dynamic provisioning, has already created, a PersistentVolumeClaim with a specific amount of storage requested and with certain access modes. A control loop in the master watches for new PVCs, finds a matching PV (if possible), and binds them together. If a PV was dynamically provisioned for a new PVC, the loop will always bind that PV to the PVC. Otherwise, the user will always get at least what they asked for, but the volume may be in excess of what was requested. Once bound, PersistentVolumeClaim binds are exclusive, regardless of how they were bound. A PVC to PV binding is a one-to-one mapping, using a ClaimRef which is a bi-directional binding between the PersistentVolume and the PersistentVolumeClaim.
|
||||
|
||||
Claims will remain unbound indefinitely if a matching volume does not exist. Claims will be bound as matching volumes become available. For example, a cluster provisioned with many 50Gi PVs would not match a PVC requesting 100Gi. The PVC can be bound when a 100Gi PV is added to the cluster.
|
||||
|
||||
|
@ -79,10 +75,10 @@ Claims will remain unbound indefinitely if a matching volume does not exist. Cla
|
|||
|
||||
Pods use claims as volumes. The cluster inspects the claim to find the bound volume and mounts that volume for a Pod. For volumes that support multiple access modes, the user specifies which mode is desired when using their claim as a volume in a Pod.
|
||||
|
||||
Once a user has a claim and that claim is bound, the bound PV belongs to the user for as long as they need it. Users schedule Pods and access their claimed PVs by including a `persistentVolumeClaim` in their Pod's volumes block. [See below for syntax details](#claims-as-volumes).
|
||||
Once a user has a claim and that claim is bound, the bound PV belongs to the user for as long as they need it. Users schedule Pods and access their claimed PVs by including a `persistentVolumeClaim` section in a Pod's `volumes` block. See [Claims As Volumes](#claims-as-volumes) for more details on this.
|
||||
|
||||
### Storage Object in Use Protection
|
||||
The purpose of the Storage Object in Use Protection feature is to ensure that Persistent Volume Claims (PVCs) in active use by a Pod and Persistent Volume (PVs) that are bound to PVCs are not removed from the system, as this may result in data loss.
|
||||
The purpose of the Storage Object in Use Protection feature is to ensure that PersistentVolumeClaims (PVCs) in active use by a Pod and PersistentVolume (PVs) that are bound to PVCs are not removed from the system, as this may result in data loss.
|
||||
|
||||
{{< note >}}
|
||||
PVC is in active use by a Pod when a Pod object exists that is using the PVC.
|
||||
|
@ -130,19 +126,19 @@ Events: <none>
|
|||
|
||||
### Reclaiming
|
||||
|
||||
When a user is done with their volume, they can delete the PVC objects from the API that allows reclamation of the resource. The reclaim policy for a `PersistentVolume` tells the cluster what to do with the volume after it has been released of its claim. Currently, volumes can either be Retained, Recycled, or Deleted.
|
||||
When a user is done with their volume, they can delete the PVC objects from the API that allows reclamation of the resource. The reclaim policy for a PersistentVolume tells the cluster what to do with the volume after it has been released of its claim. Currently, volumes can either be Retained, Recycled, or Deleted.
|
||||
|
||||
#### Retain
|
||||
|
||||
The `Retain` reclaim policy allows for manual reclamation of the resource. When the `PersistentVolumeClaim` is deleted, the `PersistentVolume` still exists and the volume is considered "released". But it is not yet available for another claim because the previous claimant's data remains on the volume. An administrator can manually reclaim the volume with the following steps.
|
||||
The `Retain` reclaim policy allows for manual reclamation of the resource. When the PersistentVolumeClaim is deleted, the PersistentVolume still exists and the volume is considered "released". But it is not yet available for another claim because the previous claimant's data remains on the volume. An administrator can manually reclaim the volume with the following steps.
|
||||
|
||||
1. Delete the `PersistentVolume`. The associated storage asset in external infrastructure (such as an AWS EBS, GCE PD, Azure Disk, or Cinder volume) still exists after the PV is deleted.
|
||||
1. Delete the PersistentVolume. The associated storage asset in external infrastructure (such as an AWS EBS, GCE PD, Azure Disk, or Cinder volume) still exists after the PV is deleted.
|
||||
1. Manually clean up the data on the associated storage asset accordingly.
|
||||
1. Manually delete the associated storage asset, or if you want to reuse the same storage asset, create a new `PersistentVolume` with the storage asset definition.
|
||||
1. Manually delete the associated storage asset, or if you want to reuse the same storage asset, create a new PersistentVolume with the storage asset definition.
|
||||
|
||||
#### Delete
|
||||
|
||||
For volume plugins that support the `Delete` reclaim policy, deletion removes both the `PersistentVolume` object from Kubernetes, as well as the associated storage asset in the external infrastructure, such as an AWS EBS, GCE PD, Azure Disk, or Cinder volume. Volumes that were dynamically provisioned inherit the [reclaim policy of their `StorageClass`](#reclaim-policy), which defaults to `Delete`. The administrator should configure the `StorageClass` according to users' expectations; otherwise, the PV must be edited or patched after it is created. See [Change the Reclaim Policy of a PersistentVolume](/docs/tasks/administer-cluster/change-pv-reclaim-policy/).
|
||||
For volume plugins that support the `Delete` reclaim policy, deletion removes both the PersistentVolume object from Kubernetes, as well as the associated storage asset in the external infrastructure, such as an AWS EBS, GCE PD, Azure Disk, or Cinder volume. Volumes that were dynamically provisioned inherit the [reclaim policy of their StorageClass](#reclaim-policy), which defaults to `Delete`. The administrator should configure the StorageClass according to users' expectations; otherwise, the PV must be edited or patched after it is created. See [Change the Reclaim Policy of a PersistentVolume](/docs/tasks/administer-cluster/change-pv-reclaim-policy/).
|
||||
|
||||
#### Recycle
|
||||
|
||||
|
@ -212,8 +208,8 @@ allowVolumeExpansion: true
|
|||
```
|
||||
|
||||
To request a larger volume for a PVC, edit the PVC object and specify a larger
|
||||
size. This triggers expansion of the volume that backs the underlying `PersistentVolume`. A
|
||||
new `PersistentVolume` is never created to satisfy the claim. Instead, an existing volume is resized.
|
||||
size. This triggers expansion of the volume that backs the underlying PersistentVolume. A
|
||||
new PersistentVolume is never created to satisfy the claim. Instead, an existing volume is resized.
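As a sketch (claim name, storage class, and sizes are illustrative), resizing is only an edit to `spec.resources.requests.storage` on the existing claim:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-pvc                  # illustrative name
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: resizable   # assumes a class with allowVolumeExpansion: true
  resources:
    requests:
      storage: 20Gi             # increased from an earlier, smaller request
```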
|
||||
|
||||
#### CSI Volume expansion
|
||||
|
||||
|
@ -227,7 +223,7 @@ Support for expanding CSI volumes is enabled by default but it also requires a s
|
|||
You can only resize volumes containing a file system if the file system is XFS, Ext3, or Ext4.
|
||||
|
||||
When a volume contains a file system, the file system is only resized when a new Pod is using
|
||||
the `PersistentVolumeClaim` in ReadWrite mode. File system expansion is either done when a Pod is starting up
|
||||
the PersistentVolumeClaim in `ReadWrite` mode. File system expansion is either done when a Pod is starting up
|
||||
or when a Pod is running and the underlying file system supports online expansion.
|
||||
|
||||
FlexVolumes allow resize if the driver is set with the `RequiresFSResize` capability to `true`.
|
||||
|
@ -260,7 +256,7 @@ Expanding EBS volumes is a time-consuming operation. Also, there is a per-volume
|
|||
|
||||
## Types of Persistent Volumes
|
||||
|
||||
`PersistentVolume` types are implemented as plugins. Kubernetes currently supports the following plugins:
|
||||
PersistentVolume types are implemented as plugins. Kubernetes currently supports the following plugins:
|
||||
|
||||
* GCEPersistentDisk
|
||||
* AWSElasticBlockStore
|
||||
|
@ -286,6 +282,8 @@ Expanding EBS volumes is a time-consuming operation. Also, there is a per-volume
|
|||
## Persistent Volumes
|
||||
|
||||
Each PV contains a spec and status, which is the specification and status of the volume.
|
||||
The name of a PersistentVolume object must be a valid
|
||||
[DNS subdomain name](/docs/concepts/overview/working-with-objects/names#dns-subdomain-names).
|
||||
|
||||
```yaml
|
||||
apiVersion: v1
|
||||
|
@ -308,6 +306,10 @@ spec:
|
|||
server: 172.17.0.2
|
||||
```
|
||||
|
||||
{{< note >}}
|
||||
Helper programs relating to the volume type may be required for consumption of a PersistentVolume within a cluster. In this example, the PersistentVolume is of type NFS and the helper program /sbin/mount.nfs is required to support the mounting of NFS filesystems.
|
||||
{{< /note >}}
|
||||
|
||||
### Capacity
|
||||
|
||||
Generally, a PV will have a specific storage capacity. This is set using the PV's `capacity` attribute. See the Kubernetes [Resource Model](https://git.k8s.io/community/contributors/design-proposals/scheduling/resources.md) to understand the units expected by `capacity`.
|
||||
|
@ -316,16 +318,28 @@ Currently, storage size is the only resource that can be set or requested. Futu
|
|||
|
||||
### Volume Mode
|
||||
|
||||
{{< feature-state for_k8s_version="v1.13" state="beta" >}}
|
||||
{{< feature-state for_k8s_version="v1.18" state="stable" >}}
|
||||
|
||||
Prior to Kubernetes 1.9, all volume plugins created a filesystem on the persistent volume.
|
||||
Now, you can set the value of `volumeMode` to `block` to use a raw block device, or `filesystem`
|
||||
to use a filesystem. `filesystem` is the default if the value is omitted. This is an optional API
|
||||
parameter.
|
||||
Kubernetes supports two `volumeModes` of PersistentVolumes: `Filesystem` and `Block`.
|
||||
|
||||
`volumeMode` is an optional API parameter.
|
||||
`Filesystem` is the default mode used when `volumeMode` parameter is omitted.
|
||||
|
||||
A volume with `volumeMode: Filesystem` is *mounted* into Pods as a directory. If the volume
|
||||
is backed by a block device and the device is empty, Kubernetes creates a filesystem
|
||||
on the device before mounting it for the first time.
|
||||
|
||||
You can set the value of `volumeMode` to `Block` to use a volume as a raw block device.
|
||||
Such a volume is presented to a Pod as a block device, without any filesystem on it.
|
||||
This mode is useful to provide a Pod the fastest possible way to access a volume, without
|
||||
any filesystem layer between the Pod and the volume. On the other hand, the application
|
||||
running in the Pod must know how to handle a raw block device.
|
||||
See [Raw Block Volume Support](/docs/concepts/storage/persistent-volumes/#raw-block-volume-support)
|
||||
for an example on how to use a volume with `volumeMode: Block` in a Pod.
|
||||
|
||||
### Access Modes
|
||||
|
||||
A `PersistentVolume` can be mounted on a host in any way supported by the resource provider. As shown in the table below, providers will have different capabilities and each PV's access modes are set to the specific modes supported by that particular volume. For example, NFS can support multiple read/write clients, but a specific NFS PV might be exported on the server as read-only. Each PV gets its own set of access modes describing that specific PV's capabilities.
|
||||
A PersistentVolume can be mounted on a host in any way supported by the resource provider. As shown in the table below, providers will have different capabilities and each PV's access modes are set to the specific modes supported by that particular volume. For example, NFS can support multiple read/write clients, but a specific NFS PV might be exported on the server as read-only. Each PV gets its own set of access modes describing that specific PV's capabilities.
|
||||
|
||||
The access modes are:
|
||||
|
||||
|
@ -440,6 +454,8 @@ The CLI will show the name of the PVC bound to the PV.
|
|||
## PersistentVolumeClaims
|
||||
|
||||
Each PVC contains a spec and status, which is the specification and status of the claim.
|
||||
The name of a PersistentVolumeClaim object must be a valid
|
||||
[DNS subdomain name](/docs/concepts/overview/working-with-objects/names#dns-subdomain-names).
|
||||
|
||||
```yaml
|
||||
apiVersion: v1
|
||||
|
@ -499,22 +515,22 @@ by the cluster, depending on whether the
|
|||
is turned on.
|
||||
|
||||
* If the admission plugin is turned on, the administrator may specify a
|
||||
default `StorageClass`. All PVCs that have no `storageClassName` can be bound only to
|
||||
PVs of that default. Specifying a default `StorageClass` is done by setting the
|
||||
default StorageClass. All PVCs that have no `storageClassName` can be bound only to
|
||||
PVs of that default. Specifying a default StorageClass is done by setting the
|
||||
annotation `storageclass.kubernetes.io/is-default-class` equal to `true` in
|
||||
a `StorageClass` object. If the administrator does not specify a default, the
|
||||
a StorageClass object (a minimal example is sketched after this list). If the administrator does not specify a default, the
|
||||
cluster responds to PVC creation as if the admission plugin were turned off. If
|
||||
more than one default is specified, the admission plugin forbids the creation of
|
||||
all PVCs.
|
||||
* If the admission plugin is turned off, there is no notion of a default
|
||||
`StorageClass`. All PVCs that have no `storageClassName` can be bound only to PVs that
|
||||
StorageClass. All PVCs that have no `storageClassName` can be bound only to PVs that
|
||||
have no class. In this case, the PVCs that have no `storageClassName` are treated the
|
||||
same way as PVCs that have their `storageClassName` set to `""`.
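A minimal sketch of a default StorageClass; the class name, provisioner, and parameters are illustrative, and only the annotation is the relevant part:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: standard                        # illustrative name
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"   # marks this class as the default
provisioner: kubernetes.io/gce-pd       # illustrative provisioner
parameters:
  type: pd-standard
```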
|
||||
|
||||
Depending on installation method, a default StorageClass may be deployed
|
||||
to a Kubernetes cluster by addon manager during installation.
|
||||
|
||||
When a PVC specifies a `selector` in addition to requesting a `StorageClass`,
|
||||
When a PVC specifies a `selector` in addition to requesting a StorageClass,
|
||||
the requirements are ANDed together: only a PV of the requested class and with
|
||||
the requested labels may be bound to the PVC.
|
||||
|
||||
|
@ -528,7 +544,7 @@ it won't be supported in a future Kubernetes release.
|
|||
|
||||
## Claims As Volumes
|
||||
|
||||
Pods access storage by using the claim as a volume. Claims must exist in the same namespace as the Pod using the claim. The cluster finds the claim in the Pod's namespace and uses it to get the `PersistentVolume` backing the claim. The volume is then mounted to the host and into the Pod.
|
||||
Pods access storage by using the claim as a volume. Claims must exist in the same namespace as the Pod using the claim. The cluster finds the claim in the Pod's namespace and uses it to get the PersistentVolume backing the claim. The volume is then mounted to the host and into the Pod.
|
||||
|
||||
```yaml
|
||||
apiVersion: v1
|
||||
|
@ -550,30 +566,28 @@ spec:
|
|||
|
||||
### A Note on Namespaces
|
||||
|
||||
`PersistentVolumes` binds are exclusive, and since `PersistentVolumeClaims` are namespaced objects, mounting claims with "Many" modes (`ROX`, `RWX`) is only possible within one namespace.
|
||||
PersistentVolumes binds are exclusive, and since PersistentVolumeClaims are namespaced objects, mounting claims with "Many" modes (`ROX`, `RWX`) is only possible within one namespace.
|
||||
|
||||
## Raw Block Volume Support
|
||||
|
||||
{{< feature-state for_k8s_version="v1.13" state="beta" >}}
|
||||
{{< feature-state for_k8s_version="v1.18" state="stable" >}}
|
||||
|
||||
The following volume plugins support raw block volumes, including dynamic provisioning where
|
||||
applicable:
|
||||
|
||||
* AWSElasticBlockStore
|
||||
* AzureDisk
|
||||
* CSI
|
||||
* FC (Fibre Channel)
|
||||
* GCEPersistentDisk
|
||||
* iSCSI
|
||||
* Local volume
|
||||
* OpenStack Cinder
|
||||
* RBD (Ceph Block Device)
|
||||
* VsphereVolume (alpha)
|
||||
* VsphereVolume
|
||||
|
||||
{{< note >}}
|
||||
Only FC and iSCSI volumes supported raw block volumes in Kubernetes 1.9.
|
||||
Support for the additional plugins was added in 1.10.
|
||||
{{< /note >}}
|
||||
### PersistentVolume using a Raw Block Volume {#persistent-volume-using-a-raw-block-volume}
|
||||
|
||||
### Persistent Volumes using a Raw Block Volume
|
||||
```yaml
|
||||
apiVersion: v1
|
||||
kind: PersistentVolume
|
||||
|
@ -591,7 +605,8 @@ spec:
|
|||
lun: 0
|
||||
readOnly: false
|
||||
```
|
||||
### Persistent Volume Claim requesting a Raw Block Volume
|
||||
### PersistentVolumeClaim requesting a Raw Block Volume {#persistent-volume-claim-requesting-a-raw-block-volume}
|
||||
|
||||
```yaml
|
||||
apiVersion: v1
|
||||
kind: PersistentVolumeClaim
|
||||
|
@ -605,7 +620,9 @@ spec:
|
|||
requests:
|
||||
storage: 10Gi
|
||||
```
|
||||
|
||||
### Pod specification adding Raw Block Device path in container
|
||||
|
||||
```yaml
|
||||
apiVersion: v1
|
||||
kind: Pod
|
||||
|
@ -632,7 +649,7 @@ When adding a raw block device for a Pod, you specify the device path in the con
|
|||
|
||||
### Binding Block Volumes
|
||||
|
||||
If a user requests a raw block volume by indicating this using the `volumeMode` field in the `PersistentVolumeClaim` spec, the binding rules differ slightly from previous releases that didn't consider this mode as part of the spec.
|
||||
If a user requests a raw block volume by indicating this using the `volumeMode` field in the PersistentVolumeClaim spec, the binding rules differ slightly from previous releases that didn't consider this mode as part of the spec.
|
||||
Listed is a table of possible combinations the user and admin might specify for requesting a raw block device. The table indicates if the volume will be bound or not given the combinations:
|
||||
Volume binding matrix for statically provisioned volumes:
|
||||
|
||||
|
@ -654,14 +671,15 @@ Only statically provisioned volumes are supported for alpha release. Administrat
|
|||
|
||||
## Volume Snapshot and Restore Volume from Snapshot Support
|
||||
|
||||
{{< feature-state for_k8s_version="v1.12" state="alpha" >}}
|
||||
{{< feature-state for_k8s_version="v1.17" state="beta" >}}
|
||||
|
||||
Volume snapshot feature was added to support CSI Volume Plugins only. For details, see [volume snapshots](/docs/concepts/storage/volume-snapshots/).
|
||||
|
||||
To enable support for restoring a volume from a volume snapshot data source, enable the
|
||||
`VolumeSnapshotDataSource` feature gate on the apiserver and controller-manager.
|
||||
|
||||
### Create Persistent Volume Claim from Volume Snapshot
|
||||
### Create a PersistentVolumeClaim from a Volume Snapshot {#create-persistent-volume-claim-from-volume-snapshot}
|
||||
|
||||
```yaml
|
||||
apiVersion: v1
|
||||
kind: PersistentVolumeClaim
|
||||
|
@ -682,14 +700,10 @@ spec:
|
|||
|
||||
## Volume Cloning
|
||||
|
||||
{{< feature-state for_k8s_version="v1.16" state="beta" >}}
|
||||
[Volume Cloning](/docs/concepts/storage/volume-pvc-datasource/) only available for CSI volume plugins.
|
||||
|
||||
Volume clone feature was added to support CSI Volume Plugins only. For details, see [volume cloning](/docs/concepts/storage/volume-pvc-datasource/).
|
||||
### Create PersistentVolumeClaim from an existing PVC {#create-persistent-volume-claim-from-an-existing-pvc}
|
||||
|
||||
To enable support for cloning a volume from a PVC data source, enable the
|
||||
`VolumePVCDataSource` feature gate on the apiserver and controller-manager.
|
||||
|
||||
### Create Persistent Volume Claim from an existing pvc
|
||||
```yaml
|
||||
apiVersion: v1
|
||||
kind: PersistentVolumeClaim
|
||||
|
@ -732,5 +746,17 @@ and need persistent storage, it is recommended that you use the following patter
|
|||
dynamic storage support (in which case the user should create a matching PV)
|
||||
or the cluster has no storage system (in which case the user cannot deploy
|
||||
config requiring PVCs).
|
||||
|
||||
{{% /capture %}}
|
||||
{{% capture whatsnext %}}
|
||||
|
||||
* Learn more about [Creating a PersistentVolume](/docs/tasks/configure-pod-container/configure-persistent-volume-storage/#create-a-persistentvolume).
|
||||
* Learn more about [Creating a PersistentVolumeClaim](/docs/tasks/configure-pod-container/configure-persistent-volume-storage/#create-a-persistentvolumeclaim).
|
||||
* Read the [Persistent Storage design document](https://git.k8s.io/community/contributors/design-proposals/storage/persistent-storage.md).
|
||||
|
||||
### Reference
|
||||
|
||||
* [PersistentVolume](/docs/reference/generated/kubernetes-api/{{< param "version" >}}/#persistentvolume-v1-core)
|
||||
* [PersistentVolumeSpec](/docs/reference/generated/kubernetes-api/{{< param "version" >}}/#persistentvolumespec-v1-core)
|
||||
* [PersistentVolumeClaim](/docs/reference/generated/kubernetes-api/{{< param "version" >}}/#persistentvolumeclaim-v1-core)
|
||||
* [PersistentVolumeClaimSpec](/docs/reference/generated/kubernetes-api/{{< param "version" >}}/#persistentvolumeclaimspec-v1-core)
|
||||
{{% /capture %}}
|
||||
|
|
|
@ -21,7 +21,7 @@ with [volumes](/docs/concepts/storage/volumes/) and
|
|||
|
||||
## Introduction
|
||||
|
||||
A `StorageClass` provides a way for administrators to describe the "classes" of
|
||||
A StorageClass provides a way for administrators to describe the "classes" of
|
||||
storage they offer. Different classes might map to quality-of-service levels,
|
||||
or to backup policies, or to arbitrary policies determined by the cluster
|
||||
administrators. Kubernetes itself is unopinionated about what classes
|
||||
|
@ -30,18 +30,18 @@ systems.
|
|||
|
||||
## The StorageClass Resource
|
||||
|
||||
Each `StorageClass` contains the fields `provisioner`, `parameters`, and
|
||||
`reclaimPolicy`, which are used when a `PersistentVolume` belonging to the
|
||||
Each StorageClass contains the fields `provisioner`, `parameters`, and
|
||||
`reclaimPolicy`, which are used when a PersistentVolume belonging to the
|
||||
class needs to be dynamically provisioned.
|
||||
|
||||
The name of a `StorageClass` object is significant, and is how users can
|
||||
The name of a StorageClass object is significant, and is how users can
|
||||
request a particular class. Administrators set the name and other parameters
|
||||
of a class when first creating `StorageClass` objects, and the objects cannot
|
||||
of a class when first creating StorageClass objects, and the objects cannot
|
||||
be updated once they are created.
|
||||
|
||||
Administrators can specify a default `StorageClass` just for PVCs that don't
|
||||
Administrators can specify a default StorageClass just for PVCs that don't
|
||||
request any particular class to bind to: see the
|
||||
[`PersistentVolumeClaim` section](/docs/concepts/storage/persistent-volumes/#class-1)
|
||||
[PersistentVolumeClaim section](/docs/concepts/storage/persistent-volumes/#class-1)
|
||||
for details.
|
||||
|
||||
```yaml
|
||||
|
@ -61,7 +61,7 @@ volumeBindingMode: Immediate
|
|||
|
||||
### Provisioner
|
||||
|
||||
Storage classes have a provisioner that determines what volume plugin is used
|
||||
Each StorageClass has a provisioner that determines what volume plugin is used
|
||||
for provisioning PVs. This field must be specified.
|
||||
|
||||
| Volume Plugin | Internal Provisioner| Config Example |
|
||||
|
@ -104,23 +104,23 @@ vendors provide their own external provisioner.
|
|||
|
||||
### Reclaim Policy
|
||||
|
||||
Persistent Volumes that are dynamically created by a storage class will have the
|
||||
PersistentVolumes that are dynamically created by a StorageClass will have the
|
||||
reclaim policy specified in the `reclaimPolicy` field of the class, which can be
|
||||
either `Delete` or `Retain`. If no `reclaimPolicy` is specified when a
|
||||
`StorageClass` object is created, it will default to `Delete`.
|
||||
StorageClass object is created, it will default to `Delete`.
|
||||
|
||||
Persistent Volumes that are created manually and managed via a storage class will have
|
||||
PersistentVolumes that are created manually and managed via a StorageClass will have
|
||||
whatever reclaim policy they were assigned at creation.
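For instance, a class whose dynamically provisioned volumes should be kept after release might look like this sketch (the name, provisioner, and parameters are illustrative):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: retained-storage           # illustrative name
provisioner: kubernetes.io/aws-ebs # illustrative provisioner
parameters:
  type: gp2
reclaimPolicy: Retain              # PVs provisioned from this class keep their data after the claim is released
```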
|
||||
|
||||
### Allow Volume Expansion
|
||||
|
||||
{{< feature-state for_k8s_version="v1.11" state="beta" >}}
|
||||
|
||||
Persistent Volumes can be configured to be expandable. This feature when set to `true`,
|
||||
PersistentVolumes can be configured to be expandable. This feature, when set to `true`,
|
||||
allows users to resize the volume by editing the corresponding PVC object.
|
||||
|
||||
The following types of volumes support volume expansion, when the underlying
|
||||
Storage Class has the field `allowVolumeExpansion` set to true.
|
||||
StorageClass has the field `allowVolumeExpansion` set to true.
|
||||
|
||||
{{< table caption = "Table of Volume types and the version of Kubernetes they require" >}}
|
||||
|
||||
|
@ -146,7 +146,7 @@ You can only use the volume expansion feature to grow a Volume, not to shrink it
|
|||
|
||||
### Mount Options
|
||||
|
||||
Persistent Volumes that are dynamically created by a storage class will have the
|
||||
PersistentVolumes that are dynamically created by a StorageClass will have the
|
||||
mount options specified in the `mountOptions` field of the class.
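A sketch of such a class (the name, the external provisioner, and the mount options shown are illustrative assumptions):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs-class                       # illustrative name
provisioner: example.com/external-nfs   # illustrative external provisioner
mountOptions:
  - hard                                # passed through when mounting volumes of this class
  - nfsvers=4.1
```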
|
||||
|
||||
If the volume plugin does not support mount options but mount options are
|
||||
|
@ -219,7 +219,7 @@ allowedTopologies:
|
|||
|
||||
## Parameters
|
||||
|
||||
Storage classes have parameters that describe volumes belonging to the storage
|
||||
Storage Classes have parameters that describe volumes belonging to the storage
|
||||
class. Different parameters may be accepted depending on the `provisioner`. For
|
||||
example, the value `io1`, for the parameter `type`, and the parameter
|
||||
`iopsPerGB` are specific to EBS. When a parameter is omitted, some default is
|
||||
|
@ -350,7 +350,7 @@ parameters:
|
|||
contains user password to use when talking to Gluster REST service. These
|
||||
parameters are optional, empty password will be used when both
|
||||
`secretNamespace` and `secretName` are omitted. The provided secret must have
|
||||
type `"kubernetes.io/glusterfs"`, e.g. created in this way:
|
||||
type `"kubernetes.io/glusterfs"`, for example created in this way:
|
||||
|
||||
```
|
||||
kubectl create secret generic heketi-secret \
|
||||
|
@ -367,7 +367,7 @@ parameters:
|
|||
`"8452344e2becec931ece4e33c4674e4e,42982310de6c63381718ccfa6d8cf397"`. This
|
||||
is an optional parameter.
|
||||
* `gidMin`, `gidMax` : The minimum and maximum value of GID range for the
|
||||
storage class. A unique value (GID) in this range ( gidMin-gidMax ) will be
|
||||
StorageClass. A unique value (GID) in this range ( gidMin-gidMax ) will be
|
||||
used for dynamically provisioned volumes. These are optional values. If not
|
||||
specified, the volume will be provisioned with a value between 2000-2147483647
|
||||
which are the defaults for `gidMin` and `gidMax`, respectively.
|
||||
|
@ -441,7 +441,7 @@ This internal provisioner of OpenStack is deprecated. Please use [the external c
|
|||
```
|
||||
|
||||
`datastore`: The user can also specify the datastore in the StorageClass.
|
||||
The volume will be created on the datastore specified in the storage class,
|
||||
The volume will be created on the datastore specified in the StorageClass,
|
||||
which in this case is `VSANDatastore`. This field is optional. If the
|
||||
datastore is not specified, then the volume will be created on the datastore
|
||||
specified in the vSphere config file used to initialize the vSphere Cloud
|
||||
|
@ -514,7 +514,7 @@ parameters:
|
|||
same as `adminId`.
|
||||
* `userSecretName`: The name of Ceph Secret for `userId` to map RBD image. It
|
||||
must exist in the same namespace as PVCs. This parameter is required.
|
||||
The provided secret must have type "kubernetes.io/rbd", e.g. created in this
|
||||
The provided secret must have type "kubernetes.io/rbd", for example created in this
|
||||
way:
|
||||
|
||||
```shell
|
||||
|
@ -561,7 +561,7 @@ parameters:
|
|||
* `adminSecretName`: secret that holds information about the Quobyte user and
|
||||
the password to authenticate against the API server. The provided secret
|
||||
must have type "kubernetes.io/quobyte" and the keys `user` and `password`,
|
||||
e.g. created in this way:
|
||||
for example:
|
||||
|
||||
```shell
|
||||
kubectl create secret generic quobyte-admin-secret \
|
||||
|
@ -580,7 +580,7 @@ parameters:
|
|||
|
||||
### Azure Disk
|
||||
|
||||
#### Azure Unmanaged Disk Storage Class
|
||||
#### Azure Unmanaged Disk storage class {#azure-unmanaged-disk-storage-class}
|
||||
|
||||
```yaml
|
||||
apiVersion: storage.k8s.io/v1
|
||||
|
@ -601,7 +601,7 @@ parameters:
|
|||
ignored. If a storage account is not provided, a new storage account will be
|
||||
created in the same resource group as the cluster.
|
||||
|
||||
#### New Azure Disk Storage Class (starting from v1.7.2)
|
||||
#### Azure Disk storage class (starting from v1.7.2) {#azure-disk-storage-class}
|
||||
|
||||
```yaml
|
||||
apiVersion: storage.k8s.io/v1
|
||||
|
|
|
@ -11,7 +11,6 @@ weight: 30
|
|||
|
||||
{{% capture overview %}}
|
||||
|
||||
{{< feature-state for_k8s_version="v1.16" state="beta" >}}
|
||||
This document describes the concept of cloning existing CSI Volumes in Kubernetes. Familiarity with [Volumes](/docs/concepts/storage/volumes) is suggested.
|
||||
|
||||
{{% /capture %}}
|
||||
|
@ -36,6 +35,7 @@ Users need to be aware of the following when using this feature:
|
|||
* Cloning is only supported within the same Storage Class.
|
||||
- Destination volume must be the same storage class as the source
|
||||
- Default storage class can be used and storageClassName omitted in the spec
|
||||
* Cloning can only be performed between two volumes that use the same VolumeMode setting (if you request a block mode volume, the source MUST also be block mode)
|
||||
|
||||
|
||||
## Provisioning
|
||||
|
@ -60,6 +60,10 @@ spec:
|
|||
name: pvc-1
|
||||
```
|
||||
|
||||
{{< note >}}
|
||||
You must specify a capacity value for `spec.resources.requests.storage`, and the value you specify must be the same or larger than the capacity of the source volume.
|
||||
{{< /note >}}
|
||||
|
||||
The result is a new PVC with the name `clone-of-pvc-1` that has the exact same content as the specified source `pvc-1`.
|
||||
|
||||
## Usage
|
||||
|
|
|
@ -29,7 +29,7 @@ A `VolumeSnapshotContent` is a snapshot taken from a volume in the cluster that
|
|||
|
||||
A `VolumeSnapshot` is a request for a snapshot of a volume by a user. It is similar to a PersistentVolumeClaim.
|
||||
|
||||
`VolumeSnapshotClass` allows you to specify different attributes belonging to a `VolumeSnapshot`. These attibutes may differ among snapshots taken from the same volume on the storage system and therefore cannot be expressed by using the same `StorageClass` of a `PersistentVolumeClaim`.
|
||||
`VolumeSnapshotClass` allows you to specify different attributes belonging to a `VolumeSnapshot`. These attributes may differ among snapshots taken from the same volume on the storage system and therefore cannot be expressed by using the same `StorageClass` of a `PersistentVolumeClaim`.
|
||||
|
||||
Users need to be aware of the following when using this feature:
|
||||
|
||||
|
|
|
@ -605,6 +605,38 @@ spec:
|
|||
type: Directory
|
||||
```
|
||||
|
||||
{{< caution >}}
|
||||
The `FileOrCreate` mode does not create the parent directory of the file. If the parent directory of the mounted file does not exist, the Pod fails to start. To make this mode work, mount the directory and the file separately, as shown below.
|
||||
{{< /caution >}}
|
||||
|
||||
#### Example Pod FileOrCreate
|
||||
|
||||
```yaml
|
||||
apiVersion: v1
|
||||
kind: Pod
|
||||
metadata:
|
||||
name: test-webserver
|
||||
spec:
|
||||
containers:
|
||||
- name: test-webserver
|
||||
image: k8s.gcr.io/test-webserver:latest
|
||||
volumeMounts:
|
||||
- mountPath: /var/local/aaa
|
||||
name: mydir
|
||||
- mountPath: /var/local/aaa/1.txt
|
||||
name: myfile
|
||||
volumes:
|
||||
- name: mydir
|
||||
hostPath:
|
||||
# Ensure the file directory is created.
|
||||
path: /var/local/aaa
|
||||
type: DirectoryOrCreate
|
||||
- name: myfile
|
||||
hostPath:
|
||||
path: /var/local/aaa/1.txt
|
||||
type: FileOrCreate
|
||||
```
|
||||
|
||||
### iscsi {#iscsi}
|
||||
|
||||
An `iscsi` volume allows an existing iSCSI (SCSI over IP) volume to be mounted
|
||||
|
@ -1302,19 +1334,13 @@ persistent volume:
|
|||
|
||||
#### CSI raw block volume support
|
||||
|
||||
{{< feature-state for_k8s_version="v1.14" state="beta" >}}
|
||||
{{< feature-state for_k8s_version="v1.18" state="stable" >}}
|
||||
|
||||
Starting with version 1.11, CSI introduced support for raw block volumes, which
|
||||
relies on the raw block volume feature that was introduced in a previous version of
|
||||
Kubernetes. This feature will make it possible for vendors with external CSI drivers to
|
||||
implement raw block volumes support in Kubernetes workloads.
|
||||
Vendors with external CSI drivers can implement raw block volumes support
|
||||
in Kubernetes workloads.
|
||||
|
||||
CSI block volume support is feature-gated, but enabled by default. The two
|
||||
feature gates which must be enabled for this feature are `BlockVolume` and
|
||||
`CSIBlockVolume`.
|
||||
|
||||
Learn how to
|
||||
[setup your PV/PVC with raw block volume support](/docs/concepts/storage/persistent-volumes/#raw-block-volume-support).
|
||||
You can [setup your PV/PVC with raw block volume support](/docs/concepts/storage/persistent-volumes/#raw-block-volume-support)
|
||||
as usual, without any CSI specific changes.
|
||||
|
||||
#### CSI ephemeral volumes
|
||||
|
||||
|
|
|
@ -18,11 +18,12 @@ One CronJob object is like one line of a _crontab_ (cron table) file. It runs a
|
|||
on a given schedule, written in [Cron](https://en.wikipedia.org/wiki/Cron) format.
|
||||
|
||||
{{< note >}}
|
||||
All **CronJob** `schedule:` times are based on the timezone of the master where the job is initiated.
|
||||
All **CronJob** `schedule:` times are denoted in UTC.
|
||||
{{< /note >}}
|
||||
|
||||
When creating the manifest for a CronJob resource, make sure the name you provide
|
||||
is no longer than 52 characters. This is because the CronJob controller will automatically
|
||||
is a valid [DNS subdomain name](/docs/concepts/overview/working-with-objects/names#dns-subdomain-names).
|
||||
The name must be no longer than 52 characters. This is because the CronJob controller will automatically
|
||||
append 11 characters to the job name provided and there is a constraint that the
|
||||
maximum length of a Job name is no more than 63 characters.
|
||||
|
||||
|
|
|
@ -19,8 +19,8 @@ collected. Deleting a DaemonSet will clean up the Pods it created.
|
|||
Some typical uses of a DaemonSet are:
|
||||
|
||||
- running a cluster storage daemon, such as `glusterd`, `ceph`, on each node.
|
||||
- running a logs collection daemon on every node, such as `fluentd` or `logstash`.
|
||||
- running a node monitoring daemon on every node, such as [Prometheus Node Exporter](https://github.com/prometheus/node_exporter), [Flowmill](https://github.com/Flowmill/flowmill-k8s/), [Sysdig Agent](https://docs.sysdig.com), `collectd`, [Dynatrace OneAgent](https://www.dynatrace.com/technologies/kubernetes-monitoring/), [AppDynamics Agent](https://docs.appdynamics.com/display/CLOUD/Container+Visibility+with+Kubernetes), [Datadog agent](https://docs.datadoghq.com/agent/kubernetes/daemonset_setup/), [New Relic agent](https://docs.newrelic.com/docs/integrations/kubernetes-integration/installation/kubernetes-installation-configuration), Ganglia `gmond` or [Instana Agent](https://www.instana.com/supported-integrations/kubernetes-monitoring/).
|
||||
- running a logs collection daemon on every node, such as `fluentd` or `filebeat`.
|
||||
- running a node monitoring daemon on every node, such as [Prometheus Node Exporter](https://github.com/prometheus/node_exporter), [Flowmill](https://github.com/Flowmill/flowmill-k8s/), [Sysdig Agent](https://docs.sysdig.com), `collectd`, [Dynatrace OneAgent](https://www.dynatrace.com/technologies/kubernetes-monitoring/), [AppDynamics Agent](https://docs.appdynamics.com/display/CLOUD/Container+Visibility+with+Kubernetes), [Datadog agent](https://docs.datadoghq.com/agent/kubernetes/daemonset_setup/), [New Relic agent](https://docs.newrelic.com/docs/integrations/kubernetes-integration/installation/kubernetes-installation-configuration), Ganglia `gmond`, [Instana Agent](https://www.instana.com/supported-integrations/kubernetes-monitoring/) or [Elastic Metricbeat](https://www.elastic.co/guide/en/beats/metricbeat/current/running-on-kubernetes.html).
|
||||
|
||||
In a simple case, one DaemonSet, covering all nodes, would be used for each type of daemon.
|
||||
A more complex setup might use multiple DaemonSets for a single type of daemon, but with
|
||||
|
@ -39,7 +39,8 @@ You can describe a DaemonSet in a YAML file. For example, the `daemonset.yaml` f
|
|||
|
||||
{{< codenew file="controllers/daemonset.yaml" >}}
|
||||
|
||||
* Create a DaemonSet based on the YAML file:
|
||||
Create a DaemonSet based on the YAML file:
|
||||
|
||||
```
|
||||
kubectl apply -f https://k8s.io/examples/controllers/daemonset.yaml
|
||||
```
|
||||
|
@ -50,6 +51,9 @@ As with all other Kubernetes config, a DaemonSet needs `apiVersion`, `kind`, and
|
|||
general information about working with config files, see [deploying applications](/docs/user-guide/deploying-applications/),
|
||||
[configuring containers](/docs/tasks/), and [object management using kubectl](/docs/concepts/overview/working-with-objects/object-management/) documents.
|
||||
|
||||
The name of a DaemonSet object must be a valid
|
||||
[DNS subdomain name](/docs/concepts/overview/working-with-objects/names#dns-subdomain-names).
|
||||
|
||||
A DaemonSet also needs a [`.spec`](https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#spec-and-status) section.
|
||||
|
||||
### Pod Template
|
||||
|
|
|
@ -64,7 +64,7 @@ In this example:
|
|||
* The Pods are labeled `app: nginx` using the `labels` field.
|
||||
* The Pod template's specification, or `.template.spec` field, indicates that
|
||||
the Pods run one container, `nginx`, which runs the `nginx`
|
||||
[Docker Hub](https://hub.docker.com/) image at version 1.7.9.
|
||||
[Docker Hub](https://hub.docker.com/) image at version 1.14.2.
|
||||
* Create one container and name it `nginx` using the `name` field.
|
||||
|
||||
Follow the steps given below to create the above Deployment:
|
||||
|
@ -153,15 +153,15 @@ is changed, for example if the labels or container images of the template are up
|
|||
|
||||
Follow the steps given below to update your Deployment:
|
||||
|
||||
1. Let's update the nginx Pods to use the `nginx:1.9.1` image instead of the `nginx:1.7.9` image.
|
||||
1. Let's update the nginx Pods to use the `nginx:1.16.1` image instead of the `nginx:1.14.2` image.
|
||||
|
||||
```shell
|
||||
kubectl --record deployment.apps/nginx-deployment set image deployment.v1.apps/nginx-deployment nginx=nginx:1.9.1
|
||||
kubectl --record deployment.apps/nginx-deployment set image deployment.v1.apps/nginx-deployment nginx=nginx:1.16.1
|
||||
```
|
||||
or simply use the following command:
|
||||
|
||||
```shell
|
||||
kubectl set image deployment/nginx-deployment nginx=nginx:1.9.1 --record
|
||||
kubectl set image deployment/nginx-deployment nginx=nginx:1.16.1 --record
|
||||
```
|
||||
|
||||
The output is similar to this:
|
||||
|
@ -169,7 +169,7 @@ Follow the steps given below to update your Deployment:
|
|||
deployment.apps/nginx-deployment image updated
|
||||
```
|
||||
|
||||
Alternatively, you can `edit` the Deployment and change `.spec.template.spec.containers[0].image` from `nginx:1.7.9` to `nginx:1.9.1`:
|
||||
Alternatively, you can `edit` the Deployment and change `.spec.template.spec.containers[0].image` from `nginx:1.14.2` to `nginx:1.16.1`:
|
||||
|
||||
```shell
|
||||
kubectl edit deployment.v1.apps/nginx-deployment
|
||||
|
@ -265,7 +265,7 @@ up to 3 replicas, as well as scaling down the old ReplicaSet to 0 replicas.
|
|||
Labels: app=nginx
|
||||
Containers:
|
||||
nginx:
|
||||
Image: nginx:1.9.1
|
||||
Image: nginx:1.16.1
|
||||
Port: 80/TCP
|
||||
Environment: <none>
|
||||
Mounts: <none>
|
||||
|
@ -306,11 +306,11 @@ If you update a Deployment while an existing rollout is in progress, the Deploym
|
|||
as per the update and start scaling that up, and rolls over the ReplicaSet that it was scaling up previously
|
||||
-- it will add it to its list of old ReplicaSets and start scaling it down.
|
||||
|
||||
For example, suppose you create a Deployment to create 5 replicas of `nginx:1.7.9`,
|
||||
but then update the Deployment to create 5 replicas of `nginx:1.9.1`, when only 3
|
||||
replicas of `nginx:1.7.9` had been created. In that case, the Deployment immediately starts
|
||||
killing the 3 `nginx:1.7.9` Pods that it had created, and starts creating
|
||||
`nginx:1.9.1` Pods. It does not wait for the 5 replicas of `nginx:1.7.9` to be created
|
||||
For example, suppose you create a Deployment to create 5 replicas of `nginx:1.14.2`,
|
||||
but then update the Deployment to create 5 replicas of `nginx:1.16.1`, when only 3
|
||||
replicas of `nginx:1.14.2` had been created. In that case, the Deployment immediately starts
|
||||
killing the 3 `nginx:1.14.2` Pods that it had created, and starts creating
|
||||
`nginx:1.16.1` Pods. It does not wait for the 5 replicas of `nginx:1.14.2` to be created
|
||||
before changing course.
|
||||
|
||||
### Label selector updates
|
||||
|
@ -347,10 +347,10 @@ This means that when you roll back to an earlier revision, only the Deployment's
|
|||
rolled back.
|
||||
{{< /note >}}
|
||||
|
||||
* Suppose that you made a typo while updating the Deployment, by putting the image name as `nginx:1.91` instead of `nginx:1.9.1`:
|
||||
* Suppose that you made a typo while updating the Deployment, by putting the image name as `nginx:1.161` instead of `nginx:1.16.1`:
|
||||
|
||||
```shell
|
||||
kubectl set image deployment.v1.apps/nginx-deployment nginx=nginx:1.91 --record=true
|
||||
kubectl set image deployment.v1.apps/nginx-deployment nginx=nginx:1.161 --record=true
|
||||
```
|
||||
|
||||
The output is similar to this:
|
||||
|
@ -427,7 +427,7 @@ rolled back.
|
|||
Labels: app=nginx
|
||||
Containers:
|
||||
nginx:
|
||||
Image: nginx:1.91
|
||||
Image: nginx:1.161
|
||||
Port: 80/TCP
|
||||
Host Port: 0/TCP
|
||||
Environment: <none>
|
||||
|
@ -468,13 +468,13 @@ Follow the steps given below to check the rollout history:
|
|||
deployments "nginx-deployment"
|
||||
REVISION CHANGE-CAUSE
|
||||
1 kubectl apply --filename=https://k8s.io/examples/controllers/nginx-deployment.yaml --record=true
|
||||
2 kubectl set image deployment.v1.apps/nginx-deployment nginx=nginx:1.9.1 --record=true
|
||||
3 kubectl set image deployment.v1.apps/nginx-deployment nginx=nginx:1.91 --record=true
|
||||
2 kubectl set image deployment.v1.apps/nginx-deployment nginx=nginx:1.16.1 --record=true
|
||||
3 kubectl set image deployment.v1.apps/nginx-deployment nginx=nginx:1.161 --record=true
|
||||
```
|
||||
|
||||
`CHANGE-CAUSE` is copied from the Deployment annotation `kubernetes.io/change-cause` to its revisions upon creation. You can specify the `CHANGE-CAUSE` message by:
|
||||
|
||||
* Annotating the Deployment with `kubectl annotate deployment.v1.apps/nginx-deployment kubernetes.io/change-cause="image updated to 1.9.1"`
|
||||
* Annotating the Deployment with `kubectl annotate deployment.v1.apps/nginx-deployment kubernetes.io/change-cause="image updated to 1.16.1"`
|
||||
* Append the `--record` flag to save the `kubectl` command that is making changes to the resource.
|
||||
* Manually editing the manifest of the resource.
|
||||
|
||||
|
@ -488,10 +488,10 @@ Follow the steps given below to check the rollout history:
|
|||
deployments "nginx-deployment" revision 2
|
||||
Labels: app=nginx
|
||||
pod-template-hash=1159050644
|
||||
Annotations: kubernetes.io/change-cause=kubectl set image deployment.v1.apps/nginx-deployment nginx=nginx:1.9.1 --record=true
|
||||
Annotations: kubernetes.io/change-cause=kubectl set image deployment.v1.apps/nginx-deployment nginx=nginx:1.16.1 --record=true
|
||||
Containers:
|
||||
nginx:
|
||||
Image: nginx:1.9.1
|
||||
Image: nginx:1.16.1
|
||||
Port: 80/TCP
|
||||
QoS Tier:
|
||||
cpu: BestEffort
|
||||
|
@ -549,7 +549,7 @@ Follow the steps given below to rollback the Deployment from the current version
|
|||
CreationTimestamp: Sun, 02 Sep 2018 18:17:55 -0500
|
||||
Labels: app=nginx
|
||||
Annotations: deployment.kubernetes.io/revision=4
|
||||
kubernetes.io/change-cause=kubectl set image deployment.v1.apps/nginx-deployment nginx=nginx:1.9.1 --record=true
|
||||
kubernetes.io/change-cause=kubectl set image deployment.v1.apps/nginx-deployment nginx=nginx:1.16.1 --record=true
|
||||
Selector: app=nginx
|
||||
Replicas: 3 desired | 3 updated | 3 total | 3 available | 0 unavailable
|
||||
StrategyType: RollingUpdate
|
||||
|
@ -559,7 +559,7 @@ Follow the steps given below to rollback the Deployment from the current version
|
|||
Labels: app=nginx
|
||||
Containers:
|
||||
nginx:
|
||||
Image: nginx:1.9.1
|
||||
Image: nginx:1.16.1
|
||||
Port: 80/TCP
|
||||
Host Port: 0/TCP
|
||||
Environment: <none>
|
||||
|
@ -722,7 +722,7 @@ apply multiple fixes in between pausing and resuming without triggering unnecess
|
|||
|
||||
* Then update the image of the Deployment:
|
||||
```shell
|
||||
kubectl set image deployment.v1.apps/nginx-deployment nginx=nginx:1.9.1
|
||||
kubectl set image deployment.v1.apps/nginx-deployment nginx=nginx:1.16.1
|
||||
```
|
||||
|
||||
The output is similar to this:
|
||||
|
@ -1020,6 +1020,8 @@ can create multiple Deployments, one for each release, following the canary patt
|
|||
As with all other Kubernetes configs, a Deployment needs `apiVersion`, `kind`, and `metadata` fields.
|
||||
For general information about working with config files, see [deploying applications](/docs/tutorials/stateless-application/run-stateless-application-deployment/),
|
||||
configuring containers, and [using kubectl to manage resources](/docs/concepts/overview/working-with-objects/object-management/) documents.
|
||||
The name of a Deployment object must be a valid
|
||||
[DNS subdomain name](/docs/concepts/overview/working-with-objects/names#dns-subdomain-names).
|
||||
|
||||
A Deployment also needs a [`.spec` section](https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#spec-and-status).
|
||||
|
||||
|
@ -1074,7 +1076,7 @@ All existing Pods are killed before new ones are created when `.spec.strategy.ty
|
|||
|
||||
#### Rolling Update Deployment
|
||||
|
||||
The Deployment updates Pods in a [rolling update](/docs/tasks/run-application/rolling-update-replication-controller/)
|
||||
The Deployment updates Pods in a rolling update
|
||||
fashion when `.spec.strategy.type==RollingUpdate`. You can specify `maxUnavailable` and `maxSurge` to control
|
||||
the rolling update process.
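A sketch of the relevant part of a Deployment spec; the 25% values shown happen to match the defaults, and the remaining fields are the minimum needed for a valid object:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 25%   # at most a quarter of the desired Pods may be unavailable during the update
      maxSurge: 25%         # at most a quarter more Pods than desired may be created during the update
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.16.1
```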
|
||||
|
||||
|
@ -1141,12 +1143,4 @@ a paused Deployment and one that is not paused, is that any changes into the Pod
|
|||
Deployment will not trigger new rollouts as long as it is paused. A Deployment is not paused by default when
|
||||
it is created.
|
||||
|
||||
## Alternative to Deployments
|
||||
|
||||
### kubectl rolling-update
|
||||
|
||||
[`kubectl rolling-update`](/docs/reference/generated/kubectl/kubectl-commands#rolling-update) updates Pods and ReplicationControllers
|
||||
in a similar fashion. But Deployments are recommended, since they are declarative, server side, and have
|
||||
additional features, such as rolling back to any previous revision even after the rolling update is done.
|
||||
|
||||
{{% /capture %}}
|
||||
|
|