Merge branch 'master' into GH-2/minimum-requirement-content

pull/20020/head
MaxymVlasov 2020-03-31 21:21:31 +03:00
commit e91001782c
No known key found for this signature in database
GPG Key ID: EF6C9B9687D0180F
1033 changed files with 93064 additions and 15663 deletions

View File

@ -5,7 +5,7 @@ charset = utf-8
max_line_length = 80
trim_trailing_whitespace = true
[*.{html,js,json,sass,md,mmark,toml,yaml}]
[*.{css,html,js,json,sass,md,mmark,toml,yaml}]
indent_style = space
indent_size = 2

View File

@ -18,7 +18,7 @@ build: ## Build site with production settings and put deliverables in ./public
build-preview: ## Build site with drafts and future posts enabled
hugo --buildDrafts --buildFuture
deploy-preview: check-hugo-versions ## Deploy preview site via netlify
deploy-preview: ## Deploy preview site via netlify
hugo --enableGitInfo --buildFuture -b $(DEPLOY_PRIME_URL)
functions-build:
@ -27,9 +27,9 @@ functions-build:
check-headers-file:
scripts/check-headers-file.sh
production-build: check-hugo-versions build check-headers-file ## Build the production site and ensure that noindex headers aren't added
production-build: build check-headers-file ## Build the production site and ensure that noindex headers aren't added
non-production-build: check-hugo-versions ## Build the non-production site, which adds noindex headers to prevent indexing
non-production-build: ## Build the non-production site, which adds noindex headers to prevent indexing
hugo --enableGitInfo
serve: ## Boot the development server.
@ -47,6 +47,3 @@ docker-serve:
test-examples:
scripts/test_examples.sh install
scripts/test_examples.sh run
check-hugo-versions:
scripts/hugo-version-check.sh $(HUGO_VERSION)

View File

@ -30,7 +30,6 @@ aliases:
- onlydole
- parispittman
- vonguard
- onlydole
sig-docs-de-owners: # Admins for German content
- bene2k1
- mkorbi
@ -40,34 +39,33 @@ aliases:
- mkorbi
- rlenferink
sig-docs-en-owners: # Admins for English content
- bradamant3
- bradtopol
- daminisatya
- gochist
- jaredbhatti
- jimangel
- kbarnard10
- kbhawkey
- makoscafee
- onlydole
- Rajakavitha1
- ryanmcginnis
- sftim
- steveperry-53
- tengqm
- vineethreddy02
- xiangpengzhao
- zacharysarah
- zparnold
sig-docs-en-reviews: # PR reviews for English content
- bradamant3
- bradtopol
- daminisatya
- gochist
- jaredbhatti
- jimangel
- kbarnard10
- kbhawkey
- makoscafee
- onlydole
- rajakavitha1
- rajeshdeshpande02
- sftim
- steveperry-53
- tengqm
@ -130,12 +128,10 @@ aliases:
- fabriziopandini
- mattiaperi
- micheleberardi
- rlenferink
sig-docs-it-reviews: # PR reviews for Italian content
- fabriziopandini
- mattiaperi
- micheleberardi
- rlenferink
sig-docs-ja-owners: # Admins for Japanese content
- cstoku
- inductor
@ -160,7 +156,6 @@ aliases:
- seokho-son
- ysyukr
sig-docs-maintainers: # Website maintainers
- bradamant3
- jimangel
- kbarnard10
- pwittrock
@ -195,10 +190,12 @@ aliases:
- femrtnz
- jcjesus
- devlware
- jhonmike
sig-docs-pt-reviews: # PR reviews for Portugese content
- femrtnz
- jcjesus
- devlware
- jhonmike
sig-docs-vi-owners: # Admins for Vietnamese content
- huynguyennovem
- ngtuna

View File

@ -28,7 +28,7 @@ El método recomendado para levantar una copia local del sitio web kubernetes.io
> Para Windows, algunas otras herramientas como Make son necesarias. Puede instalarlas utilizando el gestor [Chocolatey](https://chocolatey.org). `choco install make` o siguiendo las instrucciones de [Make for Windows](http://gnuwin32.sourceforge.net/packages/make.htm).
> Si prefiere levantar el sitio web sin utilizar **Docker**, puede seguir las instrucciones disponibles en la sección [Levantando kubernetes.io en local con Hugo](#levantando-kubernetes.io-en-local-con-hugo).
> Si prefiere levantar el sitio web sin utilizar **Docker**, puede seguir las instrucciones disponibles en la sección [Levantando kubernetes.io en local con Hugo](#levantando-kubernetesio-en-local-con-hugo).
Una vez tenga Docker [configurado en su máquina](https://www.docker.com/get-started), puede construir la imagen de Docker `kubernetes-hugo` localmente ejecutando el siguiente comando en la raíz del repositorio:

View File

@ -33,7 +33,7 @@ La façon recommandée d'exécuter le site web Kubernetes localement est d'utili
> Si vous êtes sous Windows, vous aurez besoin de quelques outils supplémentaires que vous pouvez installer avec [Chocolatey](https://chocolatey.org). `choco install install make`
> Si vous préférez exécuter le site Web localement sans Docker, voir [Exécuter le site localement avec Hugo](#running-the-site-locally-using-hugo) ci-dessous.
> Si vous préférez exécuter le site Web localement sans Docker, voir [Exécuter le site localement avec Hugo](#exécuter-le-site-localement-en-utilisant-hugo) ci-dessous.
Si vous avez Docker [up and running](https://www.docker.com/get-started), construisez l'image Docker `kubernetes-hugo' localement:

View File

@ -9,7 +9,7 @@ Selamat datang! Repositori ini merupakan wadah bagi semua komponen yang dibutuhk
Pertama, kamu dapat menekan tombol **Fork** yang berada pada bagian atas layar, untuk menyalin repositori pada akun Github-mu. Salinan ini disebut sebagai **fork**. Kamu dapat menambahkan konten pada **fork** yang kamu miliki, setelah kamu merasa cukup untuk menambahkan konten yang kamu miliki dan ingin memberikan konten tersebut pada kami, kamu dapat melihat **fork** yang telah kamu buat dan membuat **pull request** untuk memberi tahu kami bahwa kamu ingin menambahkan konten yang telah kamu buat.
Setelah kamu membuat sebuah **pull request**, seorang **reviewer** akan memberikan masukan terhadap konten yang kamu sediakan serta beberapa hal yang dapat kamu lakukan apabila perbaikan diperlukan terhadap konten yang telah kamu sediakan. Sebagai seorang yang membuat **pull request**, **sudah menjadi kewajiban kamu untuk melakukan modifikasi terhadap konten yang kamu berikan sesuai dengan masukan yang diberikan oleh seorang reviewer Kubernetes**. Perlu kamu ketahui bahwa kamu dapat saja memiliki lebih dari satu orang **reviewer Kubernetes** atau dalam kasus kamu bisa saja mendapatkan **reviewer Kubernetes** yang berbeda dengan **reviewer Kubernetes** awal yang ditugaskan untuk memberikan masukan terhadap konten yang kamu sediakan. Selain itu, seorang **reviewer Kubernetes** bisa saja meminta masukan teknis dari [reviewer teknis Kubernetes](https://github.com/kubernetes/website/wiki/Tech-reviewers) jika diperlukan.
Untuk informasi lebih lanjut mengenai tata cara melakukan kontribusi, kamu dapat melihat tautan di bawah ini:
@ -21,11 +21,11 @@ Untuk informasi lebih lanjut mengenai tata cara melakukan kontribusi, kamu dapat
## Menjalankan Dokumentasi Kubernetes pada Mesin Lokal Kamu
Petunjuk yang disarankan untuk menjalankan Dokumentasi Kubernetes pada mesin lokal kamus adalah dengan menggunakan [Docker](https://docker.com) **image** yang memiliki **package** [Hugo](https://gohugo.io), **Hugo** sendiri merupakan generator website statis.
> Jika kamu menggunakan Windows, kamu mungkin membutuhkan beberapa langkah tambahan untuk melakukan instalasi perangkat lunak yang dibutuhkan. Instalasi ini dapat dilakukan dengan menggunakan [Chocolatey](https://chocolatey.org). `choco install make`
> Jika kamu ingin menjalankan **website** tanpa menggunakan **Docker**, kamu dapat melihat tautan berikut [Petunjuk untuk menjalankan website pada mesin lokal dengan menggunakan Hugo](#petunjuk-untuk-menjalankan-website-pada-mesin-lokal-denga-menggunakan-hugo) di bagian bawah.
> Jika kamu ingin menjalankan **website** tanpa menggunakan **Docker**, kamu dapat melihat tautan berikut [Petunjuk untuk menjalankan website pada mesin lokal dengan menggunakan Hugo](#petunjuk-untuk-menjalankan-website-pada-mesin-lokal-dengan-menggunakan-hugo) di bagian bawah.
Jika kamu sudah memiliki **Docker** [yang sudah dapat digunakan](https://www.docker.com/get-started), kamu dapat melakukan **build** `kubernetes-hugo` **Docker image** secara lokal:
@ -44,7 +44,7 @@ Buka **browser** kamu ke http://localhost:1313 untuk melihat laman dokumentasi.
## Petunjuk untuk menjalankan website pada mesin lokal dengan menggunakan Hugo
Kamu dapat melihat [dokumentasi resmi Hugo](https://gohugo.io/getting-started/installing/) untuk mengetahui langkah yang diperlukan untuk melakukan instalasi **Hugo**. Pastikan kamu melakukan instalasi versi **Hugo** sesuai dengan versi yang tersedia pada **environment variable** `HUGO_VERSION` pada **file**[`netlify.toml`](netlify.toml#L9).
Untuk menjalankan laman pada mesin lokal setelah instalasi **Hugo**, kamu dapat menjalankan perintah berikut:

View File

@ -21,11 +21,11 @@ Per maggiori informazioni su come contribuire alla documentazione Kubernetes, ve
## Eseguire il sito Web localmente usando Docker
Il modo consigliato per eseguire localmente il sito Web Kubernetes prevede l'utilizzo di un'immagine [Docker] (https://docker.com) inclusa nel sito e configurata con tutti i software necessari, a partire dal generatore di siti web statici [Hugo] (https://gohugo.io).
Il modo consigliato per eseguire localmente il sito Web Kubernetes prevede l'utilizzo di un'immagine [Docker](https://docker.com) inclusa nel sito e configurata con tutti i software necessari, a partire dal generatore di siti web statici [Hugo](https://gohugo.io).
> Se stai utilizzando Windows, avrai bisogno di alcuni strumenti aggiuntivi che puoi installare con [Chocolatey] (https://chocolatey.org). `choco install make`
> Se stai utilizzando Windows, avrai bisogno di alcuni strumenti aggiuntivi che puoi installare con [Chocolatey](https://chocolatey.org). `choco install make`
> Se preferisci eseguire il sito Web localmente senza Docker, vedi [Eseguire il sito Web localmente utilizzando Hugo](# running-the-site-local-using-hugo) di seguito.
> Se preferisci eseguire il sito Web localmente senza Docker, vedi [Eseguire il sito Web localmente utilizzando Hugo](#eseguire-il-sito-web-localmente-utilizzando-hugo) di seguito.
Se hai Docker [attivo e funzionante](https://www.docker.com/get-started), crea l'immagine Docker `kubernetes-hugo` localmente:

View File

@ -41,7 +41,7 @@ Zalecaną metodą uruchomienia serwisu internetowego Kubernetesa lokalnie jest u
choco install make
```
> Jeśli wolisz uruchomić serwis lokalnie bez Dockera, przeczytaj [jak uruchomić serwis lokalnie przy pomocy Hugo](#jak-uruchomić-serwis-lokalnie-przy-pomocy-hugo) poniżej.
> Jeśli wolisz uruchomić serwis lokalnie bez Dockera, przeczytaj [jak uruchomić serwis lokalnie przy pomocy Hugo](#jak-uruchomić-lokalną-kopię-strony-przy-pomocy-hugo) poniżej.
Jeśli [zainstalowałeś i uruchomiłeś](https://www.docker.com/get-started) już Dockera, zbuduj obraz `kubernetes-hugo` lokalnie:

View File

@ -34,7 +34,7 @@
> Если вы используете Windows, вам необходимо установить дополнительные инструменты через [Chocolatey](https://chocolatey.org). `choco install make`
> Если вы хотите запустить сайт локально без Docker, обратитесь к разделу [Запуск сайта с помощью Hugo](#running-the-site-locally-using-hugo) ниже на этой странице.
> Если вы хотите запустить сайт локально без Docker, обратитесь к разделу [Запуск сайта с помощью Hugo](#запуск-сайта-с-помощью-hugo) ниже на этой странице.
Когда Docker [установлен и запущен](https://www.docker.com/get-started), соберите локально Docker-образ `kubernetes-hugo`, выполнив команду в консоли:

View File

@ -26,7 +26,7 @@ Cách được đề xuất để chạy trang web Kubernetes cục bộ là dù
> Nếu bạn làm việc trên môi trường Windows, bạn sẽ cần thêm môt vài công cụ mà bạn có thể cài đặt với [Chocolatey](https://chocolatey.org). `choco install make`
> Nếu bạn không muốn dùng Docker để chạy trang web cục bộ, hãy xem [Chạy website cục bộ dùng Hugo](#Chạy website cục bộ dùng Hugo) dưới đây.
> Nếu bạn không muốn dùng Docker để chạy trang web cục bộ, hãy xem [Chạy website cục bộ dùng Hugo](#chạy-website-cục-bộ-dùng-hugo) dưới đây.
Nếu bạn có Docker đang [up và running](https://www.docker.com/get-started), build `kubernetes-hugo` Docker image cục bộ:

View File

@ -122,7 +122,7 @@ Open up your browser to http://localhost:1313 to view the website. As you make c
<!--
## Running the website locally using Hugo
-->
## 使用 Hugo 在本地运行网站
## 使用 Hugo 在本地运行网站 {#running-the-site-locally-using-hugo}
<!--
See the [official Hugo documentation](https://gohugo.io/getting-started/installing/) for Hugo installation instructions.

View File

@ -37,7 +37,7 @@ The recommended way to run the Kubernetes website locally is to run a specialize
> If you are running on Windows, you'll need a few more tools which you can install with [Chocolatey](https://chocolatey.org). `choco install make`
> If you'd prefer to run the website locally without Docker, see [Running the website locally using Hugo](#running-the-site-locally-using-hugo) below.
> If you'd prefer to run the website locally without Docker, see [Running the website locally using Hugo](#running-the-website-locally-using-hugo) below.
If you have Docker [up and running](https://www.docker.com/get-started), build the `kubernetes-hugo` Docker image locally:

View File

@ -10,6 +10,6 @@
# DO NOT REPORT SECURITY VULNERABILITIES DIRECTLY TO THESE NAMES, FOLLOW THE
# INSTRUCTIONS AT https://kubernetes.io/security/
bradamant3
jimangel
kbarnard10
zacharysarah

View File

@ -109,6 +109,7 @@ header
box-shadow: 0 0 0 transparent
transition: 0.3s
text-align: center
overflow: hidden
.logo
@ -244,8 +245,7 @@ header
background-color: white
#mainNav
display: none
h5
color: $blue
font-weight: normal
@ -578,6 +578,9 @@ section
li
display: inline-block
height: 100%
margin-right: 10px
&:last-child
margin-right: 0
a
display: block
@ -598,11 +601,11 @@ section
#vendorStrip
line-height: 44px
max-width: 100%
overflow-x: auto
-webkit-overflow-scrolling: touch
ul
float: none
overflow-x: auto
#searchBox
float: none
@ -1052,6 +1055,9 @@ dd
a.issue
margin-left: 0px
.gridPageHome .flyout-button
display: none
.feedback--no
margin-left: 1em

View File

@ -107,7 +107,7 @@ $video-section-height: 550px
padding-right: 10px
#home
section, header, footer
section, header
.main-section
max-width: 1000px
@ -178,16 +178,18 @@ $video-section-height: 550px
nav
overflow: hidden
margin-bottom: 20px
display: flex
justify-content: space-between
a
width: 16.65%
width: auto
float: left
font-size: 24px
font-weight: 300
white-space: nowrap
.social
padding: 0 30px
padding: 0
max-width: 1200px
div

View File

@ -133,18 +133,21 @@ $feature-box-div-width: 45%
max-width: 25%
max-height: 100%
transform: translateY(-50%)
width: 100%
&:nth-child(odd)
padding-right: 210px
.image-wrapper
right: 0
text-align: right
&:nth-child(even)
padding-left: 210px
.image-wrapper
left: 0
text-align: left
&:nth-child(1)
padding-right: 0
@ -219,9 +222,8 @@ $feature-box-div-width: 45%
footer
nav
text-align: center
a
width: 30%
width: auto
padding: 0 20px
.social

View File

@ -66,10 +66,10 @@ time_format_blog = "Monday, January 02, 2006"
description = "Production-Grade Container Orchestration"
showedit = true
latest = "v1.17"
latest = "v1.18"
fullversion = "v1.17.0"
version = "v1.17"
fullversion = "v1.18.0"
version = "v1.18"
githubbranch = "master"
docsbranch = "master"
deprecated = false
@ -83,12 +83,6 @@ announcement = false
# announcement_message is only displayed when announcement = true; update with your specific message
announcement_message = "The Kubernetes Documentation team would like your feedback! Please take a <a href='https://www.surveymonkey.com/r/8R237FN' target='_blank'>short survey</a> so we can improve the Kubernetes online documentation."
[[params.versions]]
fullversion = "v1.17.0"
version = "v1.17"
githubbranch = "v1.17.0"
docsbranch = "release-1.17"
url = "https://kubernetes.io"
[params.pushAssets]
css = [
@ -102,33 +96,40 @@ js = [
]
[[params.versions]]
fullversion = "v1.16.3"
fullversion = "v1.18.0"
version = "v1.18"
githubbranch = "v1.18.0"
docsbranch = "release-1.18"
url = "https://kubernetes.io"
[[params.versions]]
fullversion = "v1.17.4"
version = "v1.17"
githubbranch = "v1.17.4"
docsbranch = "release-1.17"
url = "https://v1-17.docs.kubernetes.io"
[[params.versions]]
fullversion = "v1.16.8"
version = "v1.16"
githubbranch = "v1.16.3"
githubbranch = "v1.16.8"
docsbranch = "release-1.16"
url = "https://v1-16.docs.kubernetes.io"
[[params.versions]]
fullversion = "v1.15.6"
fullversion = "v1.15.11"
version = "v1.15"
githubbranch = "v1.15.6"
githubbranch = "v1.15.11"
docsbranch = "release-1.15"
url = "https://v1-15.docs.kubernetes.io"
[[params.versions]]
fullversion = "v1.14.9"
fullversion = "v1.14.10"
version = "v1.14"
githubbranch = "v1.14.9"
githubbranch = "v1.14.10"
docsbranch = "release-1.14"
url = "https://v1-14.docs.kubernetes.io"
[[params.versions]]
fullversion = "v1.13.12"
version = "v1.13"
githubbranch = "v1.13.12"
docsbranch = "release-1.13"
url = "https://v1-13.docs.kubernetes.io"
# Language definitions.
[languages]
@ -174,7 +175,7 @@ language_alternatives = ["en"]
[languages.fr]
title = "Kubernetes"
description = "Production-Grade Container Orchestration"
description = "Solution professionnelle dorchestration de conteneurs"
languageName ="Français"
weight = 5
contentDir = "content/fr"
@ -222,7 +223,7 @@ language_alternatives = ["en"]
[languages.es]
title = "Kubernetes"
description = "Production-Grade Container Orchestration"
description = "Orquestación de contenedores para producción"
languageName ="Español"
weight = 9
contentDir = "content/es"

View File

@ -1,6 +1,6 @@
<!-- Do not edit this file directly. Get the latest from
https://github.com/cncf/foundation/blob/master/code-of-conduct.md -->
## CNCF Community Code of Conduct v1.0
## CNCF Gemeinschafts-Verhaltenskodex v1.0
### Verhaltenskodex für Mitwirkende

View File

@ -52,7 +52,7 @@ Wenn Sie beispielsweise mit der Kubernetes-API ein Deployment-Objekt erstellen,
### Kubernetes Master
Der Kubernetes-Master ist für Erhalt des gewünschten Status Ihres Clusters verantwortlich. Wenn Sie mit Kubernetes interagieren, beispielsweise mit dem Kommanduzeilen-Tool `kubectl`, kommunizieren Sie mit dem Kubernetes-Master Ihres Clusters.
Der Kubernetes-Master ist für Erhalt des gewünschten Status Ihres Clusters verantwortlich. Wenn Sie mit Kubernetes interagieren, beispielsweise mit dem Kommandozeilen-Tool `kubectl`, kommunizieren Sie mit dem Kubernetes-Master Ihres Clusters.
> Der Begriff "Master" bezeichnet dabei eine Reihe von Prozessen, die den Clusterstatus verwalten. Normalerweise werden diese Prozesse alle auf einem einzigen Node im Cluster ausgeführt. Dieser Node wird auch als Master bezeichnet. Der Master kann repliziert werden, um die Verfügbarkeit und Redundanz zu erhöhen.

View File

@ -0,0 +1,56 @@
---
title: Addons Installieren
content_template: templates/concept
---
{{% capture overview %}}
Add-Ons erweitern die Funktionalität von Kubernetes.
Diese Seite gibt eine Übersicht über einige verfügbare Add-Ons und verweist auf die entsprechenden Installationsanleitungen.
Die Add-Ons in den einzelnen Kategorien sind alphabetisch sortiert - Die Reihenfolge impliziert keine Bevorzugung einzelner Projekte.
{{% /capture %}}
{{% capture body %}}
## Networking und Network Policy
* [ACI](https://www.github.com/noironetworks/aci-containers) bietet Container-Networking und Network-Security mit Cisco ACI.
* [Calico](https://docs.projectcalico.org/latest/introduction/) ist ein Networking- und Network-Policy-Provider. Calico unterstützt eine Reihe von Networking-Optionen, damit Du die richtige für deinen Use-Case wählen kannst. Dies beinhaltet Non-Overlaying and Overlaying-Networks mit oder ohne BGP. Calico nutzt die gleiche Engine um Network-Policies für Hosts, Pods und (falls Du Istio & Envoy benutzt) Anwendungen auf Service-Mesh-Ebene durchzusetzen.
* [Canal](https://github.com/tigera/canal/tree/master/k8s-install) vereint Flannel und Calico um Networking- und Network-Policies bereitzustellen.
* [Cilium](https://github.com/cilium/cilium) ist ein L3 Network- und Network-Policy-Plugin, welches transparent HTTP/API/L7-Policies durchsetzen kann. Sowohl Routing- als auch Overlay/Encapsulation-Modes werden unterstützt. Außerdem kann Cilium auf andere CNI-Plugins aufsetzen.
* [CNI-Genie](https://github.com/Huawei-PaaS/CNI-Genie) ermöglicht das nahtlose Verbinden von Kubernetes mit einer Reihe an CNI-Plugins wie z.B. Calico, Canal, Flannel, Romana, oder Weave.
* [Contiv](http://contiv.github.io) bietet konfigurierbares Networking (Native L3 auf BGP, Overlay mit vxlan, Klassisches L2, Cisco-SDN/ACI) für verschiedene Anwendungszwecke und auch umfangreiches Policy-Framework. Das Contiv-Projekt ist vollständig [Open Source](http://github.com/contiv). Der [installer](http://github.com/contiv/install) bietet sowohl kubeadm als auch nicht-kubeadm basierte Installationen.
* [Contrail](http://www.juniper.net/us/en/products-services/sdn/contrail/contrail-networking/), basierend auf [Tungsten Fabric](https://tungsten.io), ist eine Open Source, multi-Cloud Netzwerkvirtualisierungs- und Policy-Management Plattform. Contrail und Tungsten Fabric sind mit Orchestratoren wie z.B. Kubernetes, OpenShift, OpenStack und Mesos integriert und bieten Isolationsmodi für Virtuelle Maschinen, Container (bzw. Pods) und Bare Metal workloads.
* [Flannel](https://github.com/coreos/flannel/blob/master/Documentation/kubernetes.md) ist ein Overlay-Network-Provider der mit Kubernetes genutzt werden kann.
* [Knitter](https://github.com/ZTE/Knitter/) ist eine Network-Lösung die Mehrfach-Network in Kubernetes ermöglicht.
* [Multus](https://github.com/Intel-Corp/multus-cni) ist ein Multi-Plugin für Mehrfachnetzwerk-Unterstützung um alle CNI-Plugins (z.B. Calico, Cilium, Contiv, Flannel), zusätzlich zu SRIOV-, DPDK-, OVS-DPDK- und VPP-Basierten Workloads in Kubernetes zu unterstützen.
* [NSX-T](https://docs.vmware.com/en/VMware-NSX-T/2.0/nsxt_20_ncp_kubernetes.pdf) Container Plug-in (NCP) bietet eine Integration zwischen VMware NSX-T und einem Orchestrator wie z.B. Kubernetes. Außerdem bietet es eine Integration zwischen NSX-T und containerbasierten CaaS/PaaS-Plattformen wie z.B. Pivotal Container Service (PKS) und OpenShift.
* [Nuage](https://github.com/nuagenetworks/nuage-kubernetes/blob/v5.1.1-1/docs/kubernetes-1-installation.rst) ist eine SDN-Plattform die Policy-Basiertes Networking zwischen Kubernetes Pods und nicht-Kubernetes Umgebungen inklusive Sichtbarkeit und Security-Monitoring bereitstellt.
* [Romana](http://romana.io) ist eine Layer 3 Network-Lösung für Pod-Netzwerke welche auch die [NetworkPolicy API](/docs/concepts/services-networking/network-policies/) unterstützt. Details zur Installation als kubeadm Add-On sind [hier](https://github.com/romana/romana/tree/master/containerize) verfügbar.
* [Weave Net](https://www.weave.works/docs/net/latest/kube-addon/) bietet Networking und Network-Policies und arbeitet auf beiden Seiten der Network-Partition, ohne auf eine externe Datenbank angewiesen zu sein.
## Service-Discovery
* [CoreDNS](https://coredns.io) ist ein flexibler, erweiterbarer DNS-Server, der in einem Cluster [installiert](https://github.com/coredns/deployment/tree/master/kubernetes) werden kann, um das Cluster-interne DNS für Pods bereitzustellen.
## Visualisierung &amp; Überwachung
* [Dashboard](https://github.com/kubernetes/dashboard#kubernetes-dashboard) ist ein Dashboard Web Interface für Kubernetes.
* [Weave Scope](https://www.weave.works/documentation/scope-latest-installing/#k8s) ist ein Tool, um Container, Pods, Services usw. grafisch zu visualisieren. Kann in Verbindung mit einem [Weave Cloud Account](https://cloud.weave.works/) genutzt oder selbst gehostet werden.
## Infrastruktur
* [KubeVirt](https://kubevirt.io/user-guide/docs/latest/administration/intro.html#cluster-side-add-on-deployment) ist ein Add-On, um virtuelle Maschinen in Kubernetes auszuführen. Wird typischerweise auf Bare-Metal-Clustern eingesetzt.
## Legacy Add-Ons
Es gibt einige weitere Add-Ons die in dem abgekündigten [cluster/addons](https://git.k8s.io/kubernetes/cluster/addons)-Verzeichnis dokumentiert sind.
Add-Ons, die ordentlich gewartet werden, dürfen gerne hier aufgezählt werden. Wir freuen uns auf PRs!
{{% /capture %}}

View File

@ -96,7 +96,7 @@ Das Google service Konto der Instanz hat einen `https://www.googleapis.com/auth/
Kubernetes eine native Unterstützung für die [Amazon Elastic Container Registry](https://aws.amazon.com/ecr/) wenn Knoten AWS EC2 Instanzen sind.
Es muss einfah nur der komplette Image Name (z.B. `ACCOUNT.dkr.ecr.REGION.amazonaws.com/imagename:tag`) in der Pod - Definition genutzt werden.
Es muss einfach nur der komplette Image Name (z.B. `ACCOUNT.dkr.ecr.REGION.amazonaws.com/imagename:tag`) in der Pod - Definition genutzt werden.
Alle Benutzer eines Clusters die Pods erstellen dürfen können dann jedes der Images in der ECR Registry zum Ausführen von Pods nutzen.

View File

@ -3,7 +3,7 @@ title: Kubernetes Dokumentation
noedit: true
cid: docsHome
layout: docsportal_home
class: gridPage
class: gridPage gridPageHome
linkTitle: "Home"
main_menu: true
weight: 10

View File

@ -15,5 +15,5 @@ tags:
<!--more-->
Halten Sie immer einen Sicherungsplan für etcds Daten für Ihren Kubernetes-Cluster bereit. Ausführliche Informationen zu etcd finden Sie in der [etcd Dokumentation](https://github.com/coreos/etcd/blob/master/Documentation/docs.md).
Halten Sie immer einen Sicherungsplan für etcds Daten für Ihren Kubernetes-Cluster bereit. Ausführliche Informationen zu etcd finden Sie in der [etcd Dokumentation](https://etcd.io/docs).

View File

@ -27,7 +27,7 @@ source <(kubectl completion bash) # Wenn Sie autocomplete in bash in der aktuell
echo "source <(kubectl completion bash)" >> ~/.bashrc # Fügen Sie der Bash-Shell dauerhaft Autocomplete hinzu.
```
Sie können auch ein Abkürzungsalias für `kubectl` verwenden, weleches auch mit Vervollständigung funktioniert:
Sie können auch ein Abkürzungsalias für `kubectl` verwenden, welches auch mit Vervollständigung funktioniert:
```bash
alias k=kubectl
@ -180,7 +180,7 @@ kubectl get events --sort-by=.metadata.creationTimestamp
## Ressourcen aktualisieren
Ab Version 1.11 ist das `rolling-update` veraltet (Lesen Sie [CHANGELOG-1.11.md](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.11.md) für weitere Informationen), verwenden Sie stattdessen `rollout`.
Ab Version 1.11 ist das `rolling-update` veraltet (Lesen Sie [CHANGELOG-1.11.md](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.11.md) für weitere Informationen), verwenden Sie stattdessen `rollout`.
```bash
kubectl set image deployment/frontend www=image:v2 # Fortlaufende Aktualisierung der "www" Container der "Frontend"-Bereitstellung, Aktualisierung des Images

View File

@ -205,7 +205,7 @@ Weitere Informationen zu unterstützten Treibern und zur Installation von Plugin
### Lokale Images durch erneute Verwendung des Docker-Daemon ausführen
Wenn Sie eine einzige Kubernetes VM verwenden, ist es sehr praktisch, den integrierten Docker-Daemon von Minikube wiederzuverwenden; Dies bedeutet, dass Sie auf Ihrem lokalen Computer keine Docker-Registy erstellen und das Image in die Registry importortieren müssen - Sie können einfach innerhalb desselben Docker-Daemons wie Minikube arbeiten, was lokale Experimente beschleunigt. Stellen Sie einfach sicher, dass Sie Ihr Docker-Image mit einem anderen Element als 'latest' versehen, und verwenden Sie dieses Tag, wenn Sie das Image laden. Andernfalls, wenn Sie keine Version Ihres Images angeben, wird es als `:latest` angenommen, mit der Pull-Image-Richtlinie von `Always` entsprechend, was schließlich zu `ErrImagePull` führen kann, da Sie möglicherweise noch keine Versionen Ihres Docker-Images in der Standard-Docker-Registry (normalerweise DockerHub) haben.
Wenn Sie eine einzige Kubernetes VM verwenden, ist es sehr praktisch, den integrierten Docker-Daemon von Minikube wiederzuverwenden; Dies bedeutet, dass Sie auf Ihrem lokalen Computer keine Docker-Registy erstellen und das Image in die Registry importieren müssen - Sie können einfach innerhalb desselben Docker-Daemons wie Minikube arbeiten, was lokale Experimente beschleunigt. Stellen Sie einfach sicher, dass Sie Ihr Docker-Image mit einem anderen Element als 'latest' versehen, und verwenden Sie dieses Tag, wenn Sie das Image laden. Andernfalls, wenn Sie keine Version Ihres Images angeben, wird es als `:latest` angenommen, mit der Pull-Image-Richtlinie von `Always` entsprechend, was schließlich zu `ErrImagePull` führen kann, da Sie möglicherweise noch keine Versionen Ihres Docker-Images in der Standard-Docker-Registry (normalerweise DockerHub) haben.
Um mit dem Docker-Daemon auf Ihrem Mac/Linux-Computer arbeiten zu können, verwenden Sie den `docker-env`-Befehl in Ihrer Shell:

View File

@ -49,7 +49,7 @@ Minikube unterstützt auch die Option `--vm-driver=none`, mit der die Kubernetes
Die einfachste Möglichkeit, Minikube unter macOS zu installieren, ist die Verwendung von [Homebrew](https://brew.sh):
```shell
brew cask install minikube
brew install minikube
```
Sie können es auch auf macOS installieren, indem Sie eine statische Binärdatei herunterladen:

View File

@ -145,7 +145,7 @@ Um den "Hallo-Welt"-Container außerhalb des virtuellen Netzwerks von Kubernetes
```
Bei Cloud-Anbietern, die Load-Balancer unterstützen, wird eine externe IP-Adresse für den Zugriff auf den Dienst bereitgestellt.
Bei Minikube ermöglicht der Typ `LoadBalancer` den Dienst über den Befehl `minikube service` verfuügbar zu machen.
Bei Minikube ermöglicht der Typ `LoadBalancer` den Dienst über den Befehl `minikube service` verfügbar zu machen.
3. Führen Sie den folgenden Befehl aus:

View File

@ -45,12 +45,12 @@ Kubernetes is open source giving you the freedom to take advantage of on-premise
<br>
<br>
<br>
<a href="https://events.linuxfoundation.org/events/kubecon-cloudnativecon-europe-2020/" button id="desktopKCButton">Attend KubeCon in Amsterdam on Mar. 30-Apr. 2, 2020</a>
<a href="https://events.linuxfoundation.org/kubecon-cloudnativecon-europe/?utm_source=kubernetes.io&utm_medium=nav&utm_campaign=kccnceu20" button id="desktopKCButton">Attend KubeCon in Amsterdam on August 13-16, 2020</a>
<br>
<br>
<br>
<br>
<a href="https://events.linuxfoundation.cn/kubecon-cloudnativecon-open-source-summit-china/" button id="desktopKCButton">Attend KubeCon in Shanghai on July 28-30, 2020</a>
<a href="https://events.linuxfoundation.org/kubecon-cloudnativecon-north-america/?utm_source=kubernetes.io&utm_medium=nav&utm_campaign=kccncna20" button id="desktopKCButton">Attend KubeCon in Boston on November 17-20, 2020</a>
</div>
<div id="videoPlayer">
<iframe data-url="https://www.youtube.com/embed/H06qrNmGqyE?autoplay=1" frameborder="0" allowfullscreen></iframe>

View File

@ -97,3 +97,11 @@ semantics to fields! It's also going to improve support for CRDs and unions!
- Some kubectl apply features are missing from diff and could be useful, like the ability
to filter by label, or to display pruned resources.
- Eventually, kubectl diff will use server-side apply!
{{< note >}}
The flag `kubectl apply --server-dry-run` is deprecated in v1.18.
Use the flag `--dry-run=server` for using server-side dry-run in
`kubectl apply` and other subcommands.
{{< /note >}}
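As a quick illustration (a sketch only; `deployment.yaml` stands in for whatever manifest you manage):

```shell
# Ask the API server to evaluate the manifest without persisting it (v1.18+ flag syntax)
kubectl apply --dry-run=server -f deployment.yaml

# Show what would change if the manifest were applied
kubectl diff -f deployment.yaml
```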

View File

@ -52,7 +52,7 @@ In this way, admission controllers and policy management help make sure that app
To illustrate how admission controller webhooks can be leveraged to establish custom security policies, lets consider an example that addresses one of the shortcomings of Kubernetes: a lot of its defaults are optimized for ease of use and reducing friction, sometimes at the expense of security. One of these settings is that containers are by default allowed to run as root (and, without further configuration and no `USER` directive in the Dockerfile, will also do so). Even though containers are isolated from the underlying host to a certain extent, running containers as root does increase the risk profile of your deployment— and should be avoided as one of many [security best practices](https://www.stackrox.com/post/2018/12/6-container-security-best-practices-you-should-be-following/). The [recently exposed runC vulnerability](https://www.stackrox.com/post/2019/02/the-runc-vulnerability-a-deep-dive-on-protecting-yourself/) ([CVE-2019-5736](https://nvd.nist.gov/vuln/detail/CVE-2019-5736)), for example, could be exploited only if the container ran as root.
You can use a custom mutating admission controller webhook to apply more secure defaults: unless explicitly requested, our webhook will ensure that pods run as a non-root user (we assign the user ID 1234 if no explicit assignment has been made). Note that this setup does not prevent you from deploying any workloads in your cluster, including those that legitimately require running as root. It only requires you to explicitly enable this risker mode of operation in the deployment configuration, while defaulting to non-root mode for all other workloads.
You can use a custom mutating admission controller webhook to apply more secure defaults: unless explicitly requested, our webhook will ensure that pods run as a non-root user (we assign the user ID 1234 if no explicit assignment has been made). Note that this setup does not prevent you from deploying any workloads in your cluster, including those that legitimately require running as root. It only requires you to explicitly enable this riskier mode of operation in the deployment configuration, while defaulting to non-root mode for all other workloads.
The full code along with deployment instructions can be found in our accompanying [GitHub repository](https://github.com/stackrox/admission-controller-webhook-demo). Here, we will highlight a few of the more subtle aspects about how webhooks work.
@ -80,7 +80,7 @@ webhooks:
resources: ["pods"]
```
This configuration defines a `webhook webhook-server.webhook-demo.svc`, and instructs the Kubernetes API server to consult the service `webhook-server` in n`amespace webhook-demo` whenever a pod is created by making a HTTP POST request to the `/mutate` URL. For this configuration to work, several prerequisites have to be met.
This configuration defines a `webhook webhook-server.webhook-demo.svc`, and instructs the Kubernetes API server to consult the service `webhook-server` in `namespace webhook-demo` whenever a pod is created by making a HTTP POST request to the `/mutate` URL. For this configuration to work, several prerequisites have to be met.
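After applying a configuration like this, a quick sanity check (a sketch, assuming kubectl access to the cluster) is to list the registered webhooks:

```shell
# List mutating admission webhook configurations known to the API server
kubectl get mutatingwebhookconfigurations
```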
## Webhook REST API

View File

@ -12,21 +12,45 @@ When APIs evolve, the old API is deprecated and eventually removed.
The **v1.16** release will stop serving the following deprecated API versions in favor of newer and more stable API versions:
* NetworkPolicy (in the **extensions/v1beta1** API group)
* Migrate to use the **networking.k8s.io/v1** API, available since v1.8.
Existing persisted data can be retrieved/updated via the **networking.k8s.io/v1** API.
* PodSecurityPolicy (in the **extensions/v1beta1** API group)
* NetworkPolicy in the **extensions/v1beta1** API version is no longer served
* Migrate to use the **networking.k8s.io/v1** API version, available since v1.8.
Existing persisted data can be retrieved/updated via the new version.
* PodSecurityPolicy in the **extensions/v1beta1** API version
* Migrate to use the **policy/v1beta1** API, available since v1.10.
Existing persisted data can be retrieved/updated via the **policy/v1beta1** API.
* DaemonSet, Deployment, StatefulSet, and ReplicaSet (in the **extensions/v1beta1** and **apps/v1beta2** API groups)
* Migrate to use the **apps/v1** API, available since v1.9.
Existing persisted data can be retrieved/updated via the **apps/v1** API.
Existing persisted data can be retrieved/updated via the new version.
* DaemonSet in the **extensions/v1beta1** and **apps/v1beta2** API versions is no longer served
* Migrate to use the **apps/v1** API version, available since v1.9.
Existing persisted data can be retrieved/updated via the new version.
* Notable changes:
* `spec.templateGeneration` is removed
* `spec.selector` is now required and immutable after creation; use the existing template labels as the selector for seamless upgrades
* `spec.updateStrategy.type` now defaults to `RollingUpdate` (the default in `extensions/v1beta1` was `OnDelete`)
* Deployment in the **extensions/v1beta1**, **apps/v1beta1**, and **apps/v1beta2** API versions is no longer served
* Migrate to use the **apps/v1** API version, available since v1.9.
Existing persisted data can be retrieved/updated via the new version.
* Notable changes:
* `spec.rollbackTo` is removed
* `spec.selector` is now required and immutable after creation; use the existing template labels as the selector for seamless upgrades
* `spec.progressDeadlineSeconds` now defaults to `600` seconds (the default in `extensions/v1beta1` was no deadline)
* `spec.revisionHistoryLimit` now defaults to `10` (the default in `apps/v1beta1` was `2`, the default in `extensions/v1beta1` was to retain all)
* `maxSurge` and `maxUnavailable` now default to `25%` (the default in `extensions/v1beta1` was `1`)
* StatefulSet in the **apps/v1beta1** and **apps/v1beta2** API versions is no longer served
* Migrate to use the **apps/v1** API version, available since v1.9.
Existing persisted data can be retrieved/updated via the new version.
* Notable changes:
* `spec.selector` is now required and immutable after creation; use the existing template labels as the selector for seamless upgrades
* `spec.updateStrategy.type` now defaults to `RollingUpdate` (the default in `apps/v1beta1` was `OnDelete`)
* ReplicaSet in the **extensions/v1beta1**, **apps/v1beta1**, and **apps/v1beta2** API versions is no longer served
* Migrate to use the **apps/v1** API version, available since v1.9.
Existing persisted data can be retrieved/updated via the new version.
* Notable changes:
* `spec.selector` is now required and immutable after creation; use the existing template labels as the selector for seamless upgrades
The **v1.20** release will stop serving the following deprecated API versions in favor of newer and more stable API versions:
The **v1.22** release will stop serving the following deprecated API versions in favor of newer and more stable API versions:
* Ingress (in the **extensions/v1beta1** API group)
* Migrate to use the **networking.k8s.io/v1beta1** API, serving Ingress since v1.14.
Existing persisted data can be retrieved/updated via the **networking.k8s.io/v1beta1** API.
* Ingress in the **extensions/v1beta1** API version will no longer be served
* Migrate to use the **networking.k8s.io/v1beta1** API version, available since v1.14.
Existing persisted data can be retrieved/updated via the new version.
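To see what your existing objects look like under the replacement API versions, you can request them with fully qualified resource.version.group names, for example (a sketch; adjust to the resources you actually use):

```shell
# Read existing objects via the newer API versions
kubectl get deployments.v1.apps --all-namespaces
kubectl get networkpolicies.v1.networking.k8s.io --all-namespaces
kubectl get ingresses.v1beta1.networking.k8s.io --all-namespaces
```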
# What To Do
@ -60,8 +84,8 @@ apiserver startup arguments:
Deprecations are announced in the Kubernetes release notes. You can see these
announcements in
[1.14](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.14.md#deprecations)
and [1.15](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.15.md#deprecations-and-removals).
[1.14](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.14.md#deprecations)
and [1.15](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.15.md#deprecations-and-removals).
You can read more [in our deprecation policy document](https://kubernetes.io/docs/reference/using-api/deprecation-policy/#deprecating-parts-of-the-api)
about the deprecation policies for Kubernetes APIs, and other Kubernetes components.

View File

@ -186,7 +186,7 @@ metadata:
spec:
containers:
- name: nginx
image: nginx:1.13-alpine
image: nginx:1.16-alpine
ports:
- containerPort: 80
volumeMounts:

View File

@ -0,0 +1,760 @@
---
layout: blog
title: "Deploying External OpenStack Cloud Provider with Kubeadm"
date: 2020-02-07
slug: Deploying-External-OpenStack-Cloud-Provider-with-Kubeadm
---
This document describes how to install a single control-plane Kubernetes cluster v1.15 with kubeadm on CentOS, and then deploy an external OpenStack cloud provider and Cinder CSI plugin to use Cinder volumes as persistent volumes in Kubernetes.
### Preparation in OpenStack
This cluster runs on OpenStack VMs, so let's create a few things in OpenStack first.
* A project/tenant for this Kubernetes cluster
* A user in this project for Kubernetes, to query node information, attach volumes, etc.
* A private network and subnet
* A router for this private network, connected to a public network for floating IPs
* A security group for all Kubernetes VMs
* A VM as a control-plane node and a few VMs as worker nodes
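If you manage OpenStack from the CLI, the network-related items above can be created with commands along these lines (a sketch only; the names, the CIDR, and the external network name `public` are placeholders for your environment):

```shell
# Private network and subnet for the cluster VMs
openstack network create k8s-net
openstack subnet create k8s-subnet --network k8s-net --subnet-range 192.168.1.0/24

# Router that connects the private subnet to a public network for floating IPs
openstack router create k8s-router
openstack router set --external-gateway public k8s-router
openstack router add subnet k8s-router k8s-subnet

# Security group shared by all Kubernetes VMs
openstack security group create k8s-sec-group
```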
The security group will have the following rules to open ports for Kubernetes.
**Control-Plane Node**
|Protocol | Port Number | Description|
|----------|-------------|------------|
|TCP |6443|Kubernetes API Server|
|TCP|2379-2380|etcd server client API|
|TCP|10250|Kubelet API|
|TCP|10251|kube-scheduler|
|TCP|10252|kube-controller-manager|
|TCP|10255|Read-only Kubelet API|
**Worker Nodes**
|Protocol | Port Number | Description|
|----------|-------------|------------|
|TCP|10250|Kubelet API|
|TCP|10255|Read-only Kubelet API|
|TCP|30000-32767|NodePort Services|
**CNI ports on both control-plane and worker nodes**
|Protocol | Port Number | Description|
|----------|-------------|------------|
|TCP|179|Calico BGP network|
|TCP|9099|Calico felix (health check)|
|UDP|8285|Flannel|
|UDP|8472|Flannel|
|TCP|6781-6784|Weave Net|
|UDP|6783-6784|Weave Net|
CNI-specific ports only need to be opened when that particular CNI plugin is used. In this guide, we will use Weave Net, so only the Weave Net ports (TCP 6781-6784 and UDP 6783-6784) need to be opened in the security group.
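For the Weave Net case, the corresponding security group rules could look like this (a sketch; `k8s-sec-group` is the placeholder group name from above, and using the group itself as the source restricts the traffic to cluster VMs):

```shell
# Allow Weave Net traffic (TCP 6781-6784, UDP 6783-6784) between cluster VMs
openstack security group rule create k8s-sec-group --protocol tcp --dst-port 6781:6784 --remote-group k8s-sec-group
openstack security group rule create k8s-sec-group --protocol udp --dst-port 6783:6784 --remote-group k8s-sec-group
```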
The control-plane node needs at least 2 cores and 4GB RAM. After the VM is launched, verify its hostname and make sure it is the same as the node name in Nova.
If the hostname is not resolvable, add it to `/etc/hosts`.
For example, if the VM is called master1 and has the internal IP 192.168.1.4, add that to `/etc/hosts` and set the hostname to master1.
```shell
echo "192.168.1.4 master1" >> /etc/hosts
hostnamectl set-hostname master1
```
### Install Docker and Kubernetes
Next, we'll follow the official documents to install docker and Kubernetes using kubeadm.
Install Docker following the steps from the [container runtime](/docs/setup/production-environment/container-runtimes/) documentation.
Note that it is a [best practice to use systemd as the cgroup driver](/docs/setup/production-environment/container-runtimes/#cgroup-drivers) for Kubernetes.
If you use an internal container registry, add it to the Docker config.
```shell
# Install Docker CE
## Set up the repository
### Install required packages.
yum install yum-utils device-mapper-persistent-data lvm2
### Add Docker repository.
yum-config-manager \
--add-repo \
https://download.docker.com/linux/centos/docker-ce.repo
## Install Docker CE.
yum update && yum install docker-ce-18.06.2.ce
## Create /etc/docker directory.
mkdir /etc/docker
# Configure the Docker daemon
cat > /etc/docker/daemon.json <<EOF
{
"exec-opts": ["native.cgroupdriver=systemd"],
"log-driver": "json-file",
"log-opts": {
"max-size": "100m"
},
"storage-driver": "overlay2",
"storage-opts": [
"overlay2.override_kernel_check=true"
]
}
EOF
mkdir -p /etc/systemd/system/docker.service.d
# Restart Docker
systemctl daemon-reload
systemctl restart docker
systemctl enable docker
```
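After Docker restarts, it is worth a quick sanity check that the systemd cgroup driver is actually in effect:

```shell
# Should print "Cgroup Driver: systemd"
docker info | grep -i "cgroup driver"
```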
Install kubeadm following the steps from the [Installing Kubeadm](/docs/setup/production-environment/tools/kubeadm/install-kubeadm/) documentation.
```shell
cat <<EOF > /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://packages.cloud.google.com/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://packages.cloud.google.com/yum/doc/yum-key.gpg https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg
EOF
# Set SELinux in permissive mode (effectively disabling it)
# Caveat: In a production environment you may not want to disable SELinux, please refer to Kubernetes documents about SELinux
setenforce 0
sed -i 's/^SELINUX=enforcing$/SELINUX=permissive/' /etc/selinux/config
yum install -y kubelet kubeadm kubectl --disableexcludes=kubernetes
systemctl enable --now kubelet
cat <<EOF > /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
EOF
sysctl --system
# check if br_netfilter module is loaded
lsmod | grep br_netfilter
# if not, load it explicitly with
modprobe br_netfilter
```
The official document about how to create a single control-plane cluster can be found in the [Creating a single control-plane cluster with kubeadm](/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/) documentation.
We'll largely follow that document, but also add some extra configuration for the cloud provider.
To make things clearer, we'll use a `kubeadm-config.yml` for the control-plane node.
In this config we specify that an external OpenStack cloud provider is used, and where to find its configuration.
We also enable the storage API in the API server's runtime config so we can use OpenStack volumes as persistent volumes in Kubernetes.
```yaml
apiVersion: kubeadm.k8s.io/v1beta1
kind: InitConfiguration
nodeRegistration:
kubeletExtraArgs:
cloud-provider: "external"
---
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
kubernetesVersion: "v1.15.1"
apiServer:
extraArgs:
enable-admission-plugins: NodeRestriction
runtime-config: "storage.k8s.io/v1=true"
controllerManager:
extraArgs:
external-cloud-volume-plugin: openstack
extraVolumes:
- name: "cloud-config"
hostPath: "/etc/kubernetes/cloud-config"
mountPath: "/etc/kubernetes/cloud-config"
readOnly: true
pathType: File
networking:
serviceSubnet: "10.96.0.0/12"
podSubnet: "10.224.0.0/16"
dnsDomain: "cluster.local"
```
Now we'll create the cloud config, `/etc/kubernetes/cloud-config`, for OpenStack.
Note that the tenant here is the one we created for all Kubernetes VMs in the beginning.
All VMs should be launched in this project/tenant.
In addition you need to create a user in this tenant for Kubernetes to do queries.
The ca-file is the CA root certificate for OpenStack's API endpoint, for example `https://openstack.cloud:5000/v3`.
At the time of writing, the cloud provider doesn't allow insecure connections (skipping the CA check).
```ini
[Global]
region=RegionOne
username=username
password=password
auth-url=https://openstack.cloud:5000/v3
tenant-id=14ba698c0aec4fd6b7dc8c310f664009
domain-id=default
ca-file=/etc/kubernetes/ca.pem
[LoadBalancer]
subnet-id=b4a9a292-ea48-4125-9fb2-8be2628cb7a1
floating-network-id=bc8a590a-5d65-4525-98f3-f7ef29c727d5
[BlockStorage]
bs-version=v2
[Networking]
public-network-name=public
ipv6-support-disabled=false
```
Next, run kubeadm to initialize the control-plane node:
```shell
kubeadm init --config=kubeadm-config.yml
```
With the initialization completed, copy the admin config to `.kube`:
```shell
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
```
At this stage, the control-plane node is created but not ready. All the nodes have the taint `node.cloudprovider.kubernetes.io/uninitialized=true:NoSchedule` and are waiting to be initialized by the cloud-controller-manager.
```console
# kubectl describe no master1
Name: master1
Roles: master
......
Taints: node-role.kubernetes.io/master:NoSchedule
node.cloudprovider.kubernetes.io/uninitialized=true:NoSchedule
node.kubernetes.io/not-ready:NoSchedule
......
```
Now deploy the OpenStack cloud controller manager into the cluster, following [using controller manager with kubeadm](https://github.com/kubernetes/cloud-provider-openstack/blob/master/docs/using-controller-manager-with-kubeadm.md).
Create a secret with the cloud-config for the OpenStack cloud provider.
```shell
kubectl create secret -n kube-system generic cloud-config --from-literal=cloud.conf="$(cat /etc/kubernetes/cloud-config)" --dry-run -o yaml > cloud-config-secret.yaml
kubectl apply -f cloud-config-secret.yaml
```
Get the CA certificate for OpenStack API endpoints and put that into `/etc/kubernetes/ca.pem`.
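The cloud controller manager manifest below mounts this certificate from an `openstack-ca-cert` secret, so you can create that secret from the file right away (the same approach is used again later for the Cinder CSI plugin):

```shell
kubectl create secret -n kube-system generic openstack-ca-cert --from-literal=ca.pem="$(cat /etc/kubernetes/ca.pem)" --dry-run -o yaml > openstack-ca-cert.yaml
kubectl apply -f openstack-ca-cert.yaml
```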
Create RBAC resources.
```shell
kubectl apply -f https://github.com/kubernetes/cloud-provider-openstack/raw/release-1.15/cluster/addons/rbac/cloud-controller-manager-roles.yaml
kubectl apply -f https://github.com/kubernetes/cloud-provider-openstack/raw/release-1.15/cluster/addons/rbac/cloud-controller-manager-role-bindings.yaml
```
We'll run the OpenStack cloud controller manager as a DaemonSet rather than a pod.
The manager will only run on the control-plane node, so if there are multiple control-plane nodes, multiple pods will be run for high availability.
Create `openstack-cloud-controller-manager-ds.yaml` containing the following manifests, then apply it.
```yaml
---
apiVersion: v1
kind: ServiceAccount
metadata:
name: cloud-controller-manager
namespace: kube-system
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: openstack-cloud-controller-manager
namespace: kube-system
labels:
k8s-app: openstack-cloud-controller-manager
spec:
selector:
matchLabels:
k8s-app: openstack-cloud-controller-manager
updateStrategy:
type: RollingUpdate
template:
metadata:
labels:
k8s-app: openstack-cloud-controller-manager
spec:
nodeSelector:
node-role.kubernetes.io/master: ""
securityContext:
runAsUser: 1001
tolerations:
- key: node.cloudprovider.kubernetes.io/uninitialized
value: "true"
effect: NoSchedule
- key: node-role.kubernetes.io/master
effect: NoSchedule
- effect: NoSchedule
key: node.kubernetes.io/not-ready
serviceAccountName: cloud-controller-manager
containers:
- name: openstack-cloud-controller-manager
image: docker.io/k8scloudprovider/openstack-cloud-controller-manager:v1.15.0
args:
- /bin/openstack-cloud-controller-manager
- --v=1
- --cloud-config=$(CLOUD_CONFIG)
- --cloud-provider=openstack
- --use-service-account-credentials=true
- --address=127.0.0.1
volumeMounts:
- mountPath: /etc/kubernetes/pki
name: k8s-certs
readOnly: true
- mountPath: /etc/ssl/certs
name: ca-certs
readOnly: true
- mountPath: /etc/config
name: cloud-config-volume
readOnly: true
- mountPath: /usr/libexec/kubernetes/kubelet-plugins/volume/exec
name: flexvolume-dir
- mountPath: /etc/kubernetes
name: ca-cert
readOnly: true
resources:
requests:
cpu: 200m
env:
- name: CLOUD_CONFIG
value: /etc/config/cloud.conf
hostNetwork: true
volumes:
- hostPath:
path: /usr/libexec/kubernetes/kubelet-plugins/volume/exec
type: DirectoryOrCreate
name: flexvolume-dir
- hostPath:
path: /etc/kubernetes/pki
type: DirectoryOrCreate
name: k8s-certs
- hostPath:
path: /etc/ssl/certs
type: DirectoryOrCreate
name: ca-certs
- name: cloud-config-volume
secret:
secretName: cloud-config
- name: ca-cert
secret:
secretName: openstack-ca-cert
```
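Apply the manifest and check that the controller manager pod starts on the control-plane node (using the labels from the manifest above):

```shell
kubectl apply -f openstack-cloud-controller-manager-ds.yaml
kubectl -n kube-system get pods -l k8s-app=openstack-cloud-controller-manager -o wide
```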
When the controller manager is running, it will query OpenStack to get information about the nodes and remove the taint. In the node info you'll see the VM's UUID in OpenStack.
```console
# kubectl describe no master1
Name: master1
Roles: master
......
Taints: node-role.kubernetes.io/master:NoSchedule
node.kubernetes.io/not-ready:NoSchedule
......
sage:docker: network plugin is not ready: cni config uninitialized
......
PodCIDR: 10.224.0.0/24
ProviderID: openstack:///548e3c46-2477-4ce2-968b-3de1314560a5
```
Now install your favourite CNI and the control-plane node will become ready.
For example, to install Weave Net, run this command:
```shell
kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')"
```
Next we'll set up the worker nodes.
Firstly, install Docker and kubeadm in the same way as on the control-plane node.
To join them to the cluster, we need the token and CA cert hash from the output of the control-plane node installation.
If the token has expired or the join command has been lost, we can recreate them using these commands.
```shell
# check if token is expired
kubeadm token list
# re-create token and show join command
kubeadm token create --print-join-command
```
Create `kubeadm-config.yml` for worker nodes with the above token and ca cert hash.
```yaml
apiVersion: kubeadm.k8s.io/v1beta2
discovery:
bootstrapToken:
apiServerEndpoint: 192.168.1.7:6443
token: 0c0z4p.dnafh6vnmouus569
caCertHashes: ["sha256:fcb3e956a6880c05fc9d09714424b827f57a6fdc8afc44497180905946527adf"]
kind: JoinConfiguration
nodeRegistration:
kubeletExtraArgs:
cloud-provider: "external"
```
`apiServerEndpoint` is the address of the control-plane node; `token` and `caCertHashes` can be taken from the join command printed by the `kubeadm token create --print-join-command` command above.
Run kubeadm and the worker nodes will be joined to the cluster.
```shell
kubeadm join --config kubeadm-config.yml
```
At this stage we'll have a working Kubernetes cluster with an external OpenStack cloud provider.
The provider tells Kubernetes about the mapping between Kubernetes nodes and OpenStack VMs.
If Kubernetes wants to attach a persistent volume to a pod, it can find out which OpenStack VM the pod is running on from the mapping, and attach the underlying OpenStack volume to the VM accordingly.
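You can inspect that mapping directly on the node objects:

```shell
# ProviderID carries the OpenStack VM UUID for each node
kubectl get node master1 -o jsonpath='{.spec.providerID}'
```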
### Deploy Cinder CSI
The integration with Cinder is provided by an external Cinder CSI plugin, as described in the [Cinder CSI](https://github.com/kubernetes/cloud-provider-openstack/blob/master/docs/using-cinder-csi-plugin.md) documentation.
We'll perform the following steps to install the Cinder CSI plugin.
Firstly, create a secret with the CA cert for OpenStack's API endpoints. It is the same cert file as the one we used for the cloud provider above.
```shell
kubectl create secret -n kube-system generic openstack-ca-cert --from-literal=ca.pem="$(cat /etc/kubernetes/ca.pem)" --dry-run -o yaml > openstack-ca-cert.yaml
kubectl apply -f openstack-ca-cert.yaml
```
Then create RBAC resources.
```shell
kubectl apply -f https://raw.githubusercontent.com/kubernetes/cloud-provider-openstack/release-1.15/manifests/cinder-csi-plugin/cinder-csi-controllerplugin-rbac.yaml
kubectl apply -f https://github.com/kubernetes/cloud-provider-openstack/raw/release-1.15/manifests/cinder-csi-plugin/cinder-csi-nodeplugin-rbac.yaml
```
The Cinder CSI plugin includes a controller plugin and a node plugin.
The controller communicates with the Kubernetes APIs and Cinder APIs to create/attach/detach/delete Cinder volumes. The node plugin, in turn, runs on each worker node to bind a storage device (attached volume) to a pod, and unbind it during deletion.
Create `cinder-csi-controllerplugin.yaml` and apply it to create the CSI controller.
```yaml
kind: Service
apiVersion: v1
metadata:
name: csi-cinder-controller-service
namespace: kube-system
labels:
app: csi-cinder-controllerplugin
spec:
selector:
app: csi-cinder-controllerplugin
ports:
- name: dummy
port: 12345
---
kind: StatefulSet
apiVersion: apps/v1
metadata:
name: csi-cinder-controllerplugin
namespace: kube-system
spec:
serviceName: "csi-cinder-controller-service"
replicas: 1
selector:
matchLabels:
app: csi-cinder-controllerplugin
template:
metadata:
labels:
app: csi-cinder-controllerplugin
spec:
serviceAccount: csi-cinder-controller-sa
containers:
- name: csi-attacher
image: quay.io/k8scsi/csi-attacher:v1.0.1
args:
- "--v=5"
- "--csi-address=$(ADDRESS)"
env:
- name: ADDRESS
value: /var/lib/csi/sockets/pluginproxy/csi.sock
imagePullPolicy: "IfNotPresent"
volumeMounts:
- name: socket-dir
mountPath: /var/lib/csi/sockets/pluginproxy/
- name: csi-provisioner
image: quay.io/k8scsi/csi-provisioner:v1.0.1
args:
- "--provisioner=csi-cinderplugin"
- "--csi-address=$(ADDRESS)"
env:
- name: ADDRESS
value: /var/lib/csi/sockets/pluginproxy/csi.sock
imagePullPolicy: "IfNotPresent"
volumeMounts:
- name: socket-dir
mountPath: /var/lib/csi/sockets/pluginproxy/
- name: csi-snapshotter
image: quay.io/k8scsi/csi-snapshotter:v1.0.1
args:
- "--connection-timeout=15s"
- "--csi-address=$(ADDRESS)"
env:
- name: ADDRESS
value: /var/lib/csi/sockets/pluginproxy/csi.sock
imagePullPolicy: Always
volumeMounts:
- mountPath: /var/lib/csi/sockets/pluginproxy/
name: socket-dir
- name: cinder-csi-plugin
image: docker.io/k8scloudprovider/cinder-csi-plugin:v1.15.0
args :
- /bin/cinder-csi-plugin
- "--v=5"
- "--nodeid=$(NODE_ID)"
- "--endpoint=$(CSI_ENDPOINT)"
- "--cloud-config=$(CLOUD_CONFIG)"
- "--cluster=$(CLUSTER_NAME)"
env:
- name: NODE_ID
valueFrom:
fieldRef:
fieldPath: spec.nodeName
- name: CSI_ENDPOINT
value: unix://csi/csi.sock
- name: CLOUD_CONFIG
value: /etc/config/cloud.conf
- name: CLUSTER_NAME
value: kubernetes
imagePullPolicy: "IfNotPresent"
volumeMounts:
- name: socket-dir
mountPath: /csi
- name: secret-cinderplugin
mountPath: /etc/config
readOnly: true
- mountPath: /etc/kubernetes
name: ca-cert
readOnly: true
volumes:
- name: socket-dir
hostPath:
path: /var/lib/csi/sockets/pluginproxy/
type: DirectoryOrCreate
- name: secret-cinderplugin
secret:
secretName: cloud-config
- name: ca-cert
secret:
secretName: openstack-ca-cert
```
Create `cinder-csi-nodeplugin.yaml` and apply it to create the CSI node plugin.
```yaml
kind: DaemonSet
apiVersion: apps/v1
metadata:
name: csi-cinder-nodeplugin
namespace: kube-system
spec:
selector:
matchLabels:
app: csi-cinder-nodeplugin
template:
metadata:
labels:
app: csi-cinder-nodeplugin
spec:
serviceAccount: csi-cinder-node-sa
hostNetwork: true
containers:
- name: node-driver-registrar
image: quay.io/k8scsi/csi-node-driver-registrar:v1.1.0
args:
- "--v=5"
- "--csi-address=$(ADDRESS)"
- "--kubelet-registration-path=$(DRIVER_REG_SOCK_PATH)"
lifecycle:
preStop:
exec:
command: ["/bin/sh", "-c", "rm -rf /registration/cinder.csi.openstack.org /registration/cinder.csi.openstack.org-reg.sock"]
env:
- name: ADDRESS
value: /csi/csi.sock
- name: DRIVER_REG_SOCK_PATH
value: /var/lib/kubelet/plugins/cinder.csi.openstack.org/csi.sock
- name: KUBE_NODE_NAME
valueFrom:
fieldRef:
fieldPath: spec.nodeName
imagePullPolicy: "IfNotPresent"
volumeMounts:
- name: socket-dir
mountPath: /csi
- name: registration-dir
mountPath: /registration
- name: cinder-csi-plugin
securityContext:
privileged: true
capabilities:
add: ["SYS_ADMIN"]
allowPrivilegeEscalation: true
image: docker.io/k8scloudprovider/cinder-csi-plugin:v1.15.0
args :
- /bin/cinder-csi-plugin
- "--nodeid=$(NODE_ID)"
- "--endpoint=$(CSI_ENDPOINT)"
- "--cloud-config=$(CLOUD_CONFIG)"
env:
- name: NODE_ID
valueFrom:
fieldRef:
fieldPath: spec.nodeName
- name: CSI_ENDPOINT
value: unix://csi/csi.sock
- name: CLOUD_CONFIG
value: /etc/config/cloud.conf
imagePullPolicy: "IfNotPresent"
volumeMounts:
- name: socket-dir
mountPath: /csi
- name: pods-mount-dir
mountPath: /var/lib/kubelet/pods
mountPropagation: "Bidirectional"
- name: kubelet-dir
mountPath: /var/lib/kubelet
mountPropagation: "Bidirectional"
- name: pods-cloud-data
mountPath: /var/lib/cloud/data
readOnly: true
- name: pods-probe-dir
mountPath: /dev
mountPropagation: "HostToContainer"
- name: secret-cinderplugin
mountPath: /etc/config
readOnly: true
- mountPath: /etc/kubernetes
name: ca-cert
readOnly: true
volumes:
- name: socket-dir
hostPath:
path: /var/lib/kubelet/plugins/cinder.csi.openstack.org
type: DirectoryOrCreate
- name: registration-dir
hostPath:
path: /var/lib/kubelet/plugins_registry/
type: Directory
- name: kubelet-dir
hostPath:
path: /var/lib/kubelet
type: Directory
- name: pods-mount-dir
hostPath:
path: /var/lib/kubelet/pods
type: Directory
- name: pods-cloud-data
hostPath:
path: /var/lib/cloud/data
type: Directory
- name: pods-probe-dir
hostPath:
path: /dev
type: Directory
- name: secret-cinderplugin
secret:
secretName: cloud-config
- name: ca-cert
secret:
secretName: openstack-ca-cert
```
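Before moving on, it may be worth checking that both plugins came up, for example by listing their pods using the labels from the manifests above:
```shell
kubectl get pods -n kube-system -l 'app in (csi-cinder-controllerplugin, csi-cinder-nodeplugin)'
```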
Once both the controller and node plugins are running, create a storage class for Cinder.
```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: csi-sc-cinderplugin
provisioner: csi-cinderplugin
```
Then we can create a PVC with this class.
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: myvol
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 1Gi
storageClassName: csi-sc-cinderplugin
```
When the PVC is created, a corresponding Cinder volume is created.
```console
# kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
myvol Bound pvc-14b8bc68-6c4c-4dc6-ad79-4cb29a81faad 1Gi RWO csi-sc-cinderplugin 3s
```
In OpenStack, the volume name matches the generated name of the Kubernetes persistent volume. In this example it would be: _pvc-14b8bc68-6c4c-4dc6-ad79-4cb29a81faad_
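If needed, the bound persistent volume name can also be read straight from the claim, for example:
```shell
kubectl get pvc myvol -o jsonpath='{.spec.volumeName}'
```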
Now we can create a pod with the PVC.
```yaml
apiVersion: v1
kind: Pod
metadata:
name: web
spec:
containers:
- name: web
image: nginx
ports:
- name: web
containerPort: 80
hostPort: 8081
protocol: TCP
volumeMounts:
- mountPath: "/usr/share/nginx/html"
name: mypd
volumes:
- name: mypd
persistentVolumeClaim:
claimName: myvol
```
When the pod is running, the volume is attached to the pod.
If we go back to OpenStack, we can see that the Cinder volume is mounted to the worker node where the pod is running.
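The volume ID used in the `openstack volume show` command below can be looked up by filtering on the volume name, for example:
```shell
openstack volume list --name pvc-14b8bc68-6c4c-4dc6-ad79-4cb29a81faad
```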
```console
# openstack volume show 6b5f3296-b0eb-40cd-bd4f-2067a0d6287f
+--------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Field | Value |
+--------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| attachments | [{u'server_id': u'1c5e1439-edfa-40ed-91fe-2a0e12bc7eb4', u'attachment_id': u'11a15b30-5c24-41d4-86d9-d92823983a32', u'attached_at': u'2019-07-24T05:02:34.000000', u'host_name': u'compute-6', u'volume_id': u'6b5f3296-b0eb-40cd-bd4f-2067a0d6287f', u'device': u'/dev/vdb', u'id': u'6b5f3296-b0eb-40cd-bd4f-2067a0d6287f'}] |
| availability_zone | nova |
| bootable | false |
| consistencygroup_id | None |
| created_at | 2019-07-24T05:02:18.000000 |
| description | Created by OpenStack Cinder CSI driver |
| encrypted | False |
| id | 6b5f3296-b0eb-40cd-bd4f-2067a0d6287f |
| migration_status | None |
| multiattach | False |
| name | pvc-14b8bc68-6c4c-4dc6-ad79-4cb29a81faad |
| os-vol-host-attr:host | rbd:volumes@rbd#rbd |
| os-vol-mig-status-attr:migstat | None |
| os-vol-mig-status-attr:name_id | None |
| os-vol-tenant-attr:tenant_id | 14ba698c0aec4fd6b7dc8c310f664009 |
| properties | attached_mode='rw', cinder.csi.openstack.org/cluster='kubernetes' |
| replication_status | None |
| size | 1 |
| snapshot_id | None |
| source_volid | None |
| status | in-use |
| type | rbd |
| updated_at | 2019-07-24T05:02:35.000000 |
| user_id | 5f6a7a06f4e3456c890130d56babf591 |
+--------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
```
### Summary
In this walk-through, we deployed a Kubernetes cluster on OpenStack VMs and integrated it with OpenStack using an external OpenStack cloud provider. Then, on this Kubernetes cluster, we deployed the Cinder CSI plugin, which creates Cinder volumes and exposes them to Kubernetes as persistent volumes.


@ -0,0 +1,45 @@
---
layout: blog
title: "Contributor Summit Amsterdam Schedule Announced"
date: 2020-02-18
slug: Contributor-Summit-Amsterdam-Schedule-Announced
---
**Authors:** Jeffrey Sica (Red Hat), Amanda Katona (VMware)
tl;dr [Registration is open](https://events.linuxfoundation.org/kubernetes-contributor-summit-europe/) and the [schedule is live](https://kcseu2020.sched.com/) so register now and we'll see you in Amsterdam!
## Kubernetes Contributor Summit
**Sunday, March 29, 2020**
- Evening Contributor Celebration:
[ZuidPool](https://www.zuid-pool.nl/en/)
- Address: [Europaplein 22, 1078 GZ Amsterdam, Netherlands](https://www.google.com/search?q=KubeCon+Amsterdam+2020&ie=UTF-8&ibp=htl;events&rciv=evn&sa=X&ved=2ahUKEwiZoLvQ0dvnAhVST6wKHScBBZ8Q5bwDMAB6BAgSEAE#)
- Time: 18:00 - 21:00
**Monday, March 30, 2020**
- All Day Contributor Summit:
- [Amsterdam RAI](https://www.rai.nl/en/)
- Address: [Europaplein 24, 1078 GZ Amsterdam, Netherlands](https://www.google.com/search?q=kubecon+amsterdam+2020&oq=kubecon+amste&aqs=chrome.0.35i39j69i57j0l4j69i61l2.3957j1j4&sourceid=chrome&ie=UTF-8&ibp=htl;events&rciv=evn&sa=X&ved=2ahUKEwiZoLvQ0dvnAhVST6wKHScBBZ8Q5bwDMAB6BAgSEAE#)
- Time: 09:00 - 17:00 (Breakfast at 08:00)
![Contributor Summit](/images/blog/2020-02-18-Contributor-Summit-Amsterdam-Schedule-Announced/contribsummit.jpg)
Hello everyone and Happy 2020! It's hard to believe that KubeCon EU 2020 is less than six weeks away, and with that another contributor summit! This year we have the pleasure of being in Amsterdam in early spring, so be sure to pack some warmer clothing. This summit looks to be exciting with a lot of fantastic community-driven content. We received **26** submissions from the CFP. From that, the events team selected **12** sessions. Each of the sessions falls into one of four categories:
* Community
* Contributor Improvement
* Sustainability
* In-depth Technical
On top of the presentations, there will be a dedicated Docs Sprint as well as the New Contributor Workshop 101 and 201 Sessions. All told, we will have five separate rooms of content throughout the day on Monday. Please **[see the full schedule](https://kcseu2020.sched.com/)** to see what sessions you'd be interested in. We hope between the content provided and the inevitable hallway track, everyone has a fun and enriching experience.
Speaking of fun, the social Sunday night should be a blast! We're hosting this summit's social close to the conference center, at [ZuidPool](https://www.zuid-pool.nl/en/). There will be games, bingo, and unconference sign-up throughout the evening. It should be a relaxed way to kick off the week.
[Registration is open](https://events.linuxfoundation.org/kubernetes-contributor-summit-europe/)! Space is limited so it's always a good idea to register early.
If you have any questions, reach out to the [Amsterdam Team](https://github.com/kubernetes/community/tree/master/events/2020/03-contributor-summit#team) on Slack in the [#contributor-summit](https://kubernetes.slack.com/archives/C7J893413) channel.
Hope to see you there!


@ -0,0 +1,82 @@
---
title: Bring your ideas to the world with kubectl plugins
date: 2020-02-28
---
**Author:** Cornelius Weig (TNG Technology Consulting GmbH)
`kubectl` is the most critical tool to interact with Kubernetes and has to address multiple user personas, each with their own needs and opinions.
One way to make `kubectl` do what you need is to build new functionality into `kubectl`.
## Challenges with building commands into `kubectl`
However, that's easier said than done. Being such an important cornerstone of
Kubernetes, any meaningful change to `kubectl` needs to undergo a Kubernetes
Enhancement Proposal (KEP) where the intended change is discussed beforehand.
When it comes to implementation, you'll find that `kubectl` is an ingenious and
complex piece of engineering. It might take a long time to get used to
the processes and style of the codebase before you can accomplish what you want to achieve. Next
comes the review process which may go through several rounds until it meets all
the requirements of the Kubernetes maintainers -- after all, they need to take
over ownership of this feature and maintain it from the day it's merged.
When everything goes well, you can finally rejoice. Your code will be shipped
with the next Kubernetes release. Well, that could mean you need to wait
another 3 months to ship your idea in `kubectl` if you are unlucky.
So this was the happy path where everything goes well. But there are good
reasons why your new functionality may never make it into `kubectl`. For one,
`kubectl` has a particular look and feel and violating that style will not be
acceptable by the maintainers. For example, an interactive command that
produces output with colors would be inconsistent with the rest of `kubectl`.
Also, when it comes to tools or commands useful only to a minuscule proportion
of users, the maintainers may simply reject your proposal as `kubectl` needs to
address common needs.
But this doesn't mean you can't ship your ideas to `kubectl` users.
## What if you didn't have to change `kubectl` to add functionality?
This is where `kubectl` [plugins](https://kubernetes.io/docs/tasks/extend-kubectl/kubectl-plugins/) shine.
Since `kubectl` v1.12, you can simply
drop executables into your `PATH`, which follows the naming pattern
`kubectl-myplugin`. Then you can execute this plugin as `kubectl myplugin`, and
it will just feel like a normal sub-command of `kubectl`.
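As a minimal illustration (the plugin name, path, and behaviour here are made up for the example, and `~/bin` is assumed to be on your `PATH`), a plugin can be as small as a shell script:
```shell
# A hypothetical plugin that counts pods by wrapping `kubectl get`.
cat <<'EOF' > ~/bin/kubectl-podcount
#!/usr/bin/env bash
kubectl get pods --no-headers "$@" | wc -l
EOF
chmod +x ~/bin/kubectl-podcount
# Invoke it as if it were a built-in sub-command; extra flags are passed through.
kubectl podcount
kubectl podcount -n kube-system
```
Because it only wraps existing `kubectl` functionality, this is exactly the kind of plugin that, as discussed below, is best written as a shell script.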
Plugins give you the opportunity to try out new experiences like terminal UIs,
colorful output, specialized functionality, or other innovative ideas. You can
get creative, as you're the owner of your own plugin.
Further, plugins offer a safe experimentation space for commands you'd like to
propose to `kubectl`. By pre-releasing as a plugin, you can push your
functionality faster to the end-users and quickly gather feedback. For example,
the [kubectl-debug](https://github.com/verb/kubectl-debug) plugin is proposed
to become a built-in command in `kubectl` in a
[KEP](https://github.com/kubernetes/enhancements/blob/master/keps/sig-cli/20190805-kubectl-debug.md).
In the meantime, the plugin author can ship the functionality and collect
feedback using the plugin mechanism.
## How to get started with developing plugins
If you already have an idea for a plugin, how do you best make it happen?
First you have to ask yourself if you can implement it as a wrapper around
existing `kubectl` functionality. If so, writing the plugin as a shell script
is often the best way forward: the resulting plugin will be small, work
cross-platform, and enjoy a high level of trust because it is not compiled.
On the other hand, if the plugin logic is complex, a general-purpose language
is usually better. The canonical choice here is Go, because you can use the
excellent `client-go` library to interact with the Kubernetes API. The Kubernetes
maintained [sample-cli-plugin](https://github.com/kubernetes/sample-cli-plugin)
demonstrates some best practices and can be used as a template for new plugin
projects.
When development is done, you just need to ship your plugin to
Kubernetes users. For the best plugin installation experience and discoverability,
you should consider doing so via the
[krew](https://github.com/kubernetes-sigs/krew) plugin manager. For an in-depth
discussion about the technical details around `kubectl` plugins, refer to the
documentation on [kubernetes.io](https://kubernetes.io/docs/tasks/extend-kubectl/kubectl-plugins/).
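Once a plugin is published via the centralized krew index, users who already have krew installed can typically discover and install it with commands along these lines (the plugin name is a placeholder):
```shell
kubectl krew search
kubectl krew install myplugin
kubectl myplugin
```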


@ -0,0 +1,15 @@
---
layout: blog
title: Contributor Summit Amsterdam Postponed
date: 2020-03-04
slug: Contributor-Summit-Delayed
---
**Authors:** Dawn Foster (VMware), Jorge Castro (VMware)
The CNCF has announced that [KubeCon + CloudNativeCon EU has been delayed](https://events.linuxfoundation.org/kubecon-cloudnativecon-europe/attend/novel-coronavirus-update/) until July/August of 2020. As a result, the Contributor Summit planning team is weighing options for how to proceed. Here's the current plan:
- There will be an in-person Contributor Summit as planned when KubeCon + CloudNativeCon is rescheduled.
- We are looking at options for having additional virtual contributor activities in the meantime.
We will communicate via this blog and the usual communications channels on the final plan. Please bear with us as we adapt when we get more information. Thank you for being patient as the team pivots to bring you a great Contributor Summit!


@ -0,0 +1,42 @@
---
layout: blog
title: Join SIG Scalability and Learn Kubernetes the Hard Way
date: 2020-03-19
slug: join-sig-scalability
---
**Authors:** Alex Handy
Contributing to SIG Scalability is a great way to learn Kubernetes in all its depth and breadth, and the team would love to have you [join as a contributor](https://github.com/kubernetes/community/tree/master/sig-scalability#scalability-special-interest-group). I took a look at the value of learning the hard way and interviewed the current SIG chairs to give you an idea of what contribution feels like.
## The value of Learning The Hard Way
There is a belief in the software development community that pushes for the most challenging and rigorous possible method of learning a new language or system. These tend to go by the moniker of "Learn \_\_ the Hard Way." Examples abound: Learn Code the Hard Way, Learn Python the Hard Way, and many others originating with Zed Shaw's courses in the topic.
While there are folks out there who offer you a "Learn Kubernetes the Hard Way" type experience (most notably [Kelsey Hightower's](https://github.com/kelseyhightower/kubernetes-the-hard-way)), any "Hard Way" project should attempt to cover every aspect of the core topic's principles.
Therefore, the real way to "Learn Kubernetes the Hard Way" is to join the CNCF and get involved in the project itself. And there is only one SIG that could genuinely offer a full-stack learning experience for Kubernetes: SIG Scalability.
The team behind SIG Scalability is responsible for detecting and dealing with issues that arise when Kubernetes clusters are working with upwards of a thousand nodes. According to [Wojciech Tyczynski](https://github.com/wojtek-t), a staff software engineer at Google and a member of SIG Scalability, the standard size for a test cluster in this SIG is over 5,000 nodes.
And yet, this SIG is not composed of Ph.D.'s in highly scalable systems designs. Many of the folks working with Tyczynski, for example, joined the SIG knowing very little about these types of issues, and often, very little about Kubernetes.
Working on SIG Scalability is like jumping into the deep end of the pool to learn to swim, and the SIG is inherently concerned with the entire Kubernetes project. SIG Scalability focuses on how Kubernetes functions as a whole and at scale. The SIG Scalability team members have an impetus to learn about every system and to understand how all systems interact with one another.
## A complex and rewarding contributor experience
While that may sound complicated (and it is!), that doesn't mean it's outside the reach of an average developer, tester, or administrator. Google software developer Matt Matejczyk has only been on the team since the beginning of 2019, and he's been a valued member of the team since then, ferreting out bugs.
"I am new here," said Matejczyk. "I joined the team in January [2019]. Before that, I worked on AdWords at Google in New York. Why did I join? I knew some people there, so that was one of the decisions for me to move. I thought at that time that Kubernetes is a unique, cutting edge technology. I thought it'd be cool to work on that."
Matejczyk was correct about the coolness. "It's cool," he said. "So actually, ramping up on scalability is not easy. There are many things you need to understand. You need to understand Kubernetes very well. It can use every part of Kubernetes. I am still ramping up after these 8 months. I think it took me maybe 3 months to get up to decent speed."
When Matejczyk spoke to what he had worked on during those 8 months, he answered, "An interesting example is a regression I have been working on recently. We noticed the overall slowness of Kubernetes control plane in specific scenarios, and we couldn't attribute it to any particular component. In the end, we realized that everything boiled down to the memory allocation on the golang level. It was very counterintuitive to have two completely separate pieces of code (running as a part of the same binary) affecting the performance of each other only because one of them was allocating memory too fast. But connecting all the dots and getting to the bottom of regression like this gives great satisfaction."
Tyczynski said that "It's not only debugging regressions, but it's also debugging and finding bottlenecks. In general, those can be regressions, but those can be things we can improve. The other significant area is extending what we want to guarantee to users. Extending SLA and SLO coverage of the system so users can rely on what they can expect from the system in terms of performance and scalability. Matt is doing much work in extending our tests to be more representative and cover more Kubernetes concepts."
## Give SIG Scalability a try
The SIG Scalability team is always in need of new members, and if you're the sort of developer or tester who loves taking on new complex challenges, and perhaps loves learning things the hard way, consider joining this SIG. As the team points out, adding Kubernetes expertise to your resume is never a bad idea, and this is the one SIG where you can learn it all from top to bottom.
See [the SIG's documentation](https://github.com/kubernetes/community/tree/master/sig-scalability#scalability-special-interest-group) to learn about upcoming meetings, its charter, and more. You can also join the [#sig-scalability Slack channel](https://kubernetes.slack.com/archives/C09QZTRH7) to see what it's like. We hope to see you join in to take advantage of this great opportunity to learn Kubernetes and contribute back at the same time.


@ -0,0 +1,135 @@
---
layout: blog
title: 'Kubernetes 1.18: Fit & Finish'
date: 2020-03-25
slug: kubernetes-1-18-release-announcement
---
**Authors:** [Kubernetes 1.18 Release Team](https://github.com/kubernetes/sig-release/blob/master/releases/release-1.18/release_team.md)
We're pleased to announce the delivery of Kubernetes 1.18, our first release of 2020! Kubernetes 1.18 consists of 38 enhancements: 15 enhancements are moving to stable, 11 enhancements in beta, and 12 enhancements in alpha.
Kubernetes 1.18 is a "fit and finish" release. Significant work has gone into improving beta and stable features to ensure users have a better experience. An equal effort has gone into adding new developments and exciting new features that promise to enhance the user experience even more.
Having almost as many enhancements in alpha, beta, and stable is a great achievement. It shows the tremendous effort made by the community on improving the reliability of Kubernetes as well as continuing to expand its existing functionality.
## Major Themes
### Kubernetes Topology Manager Moves to Beta - Align Up!
A beta feature of Kubernetes in release 1.18, the [Topology Manager feature](https://github.com/nolancon/website/blob/f4200307260ea3234540ef13ed80de325e1a7267/content/en/docs/tasks/administer-cluster/topology-manager.md) enables NUMA alignment of CPU and devices (such as SR-IOV VFs) that will allow your workload to run in an environment optimized for low-latency. Prior to the introduction of the Topology Manager, the CPU and Device Manager would make resource allocation decisions independent of each other. This could result in undesirable allocations on multi-socket systems, causing degraded performance on latency critical applications.
### Server-side Apply Introduces Beta 2
Server-side Apply was promoted to Beta in 1.16, but is now introducing a second Beta in 1.18. This new version will track and manage changes to fields of all new Kubernetes objects, allowing you to know what changed your resources and when.
### Extending Ingress, and replacing a deprecated annotation with IngressClass
In Kubernetes 1.18, there are two significant additions to Ingress: A new `pathType` field and a new `IngressClass` resource. The `pathType` field allows specifying how paths should be matched. In addition to the default `ImplementationSpecific` type, there are new `Exact` and `Prefix` path types.
The `IngressClass` resource is used to describe a type of Ingress within a Kubernetes cluster. Ingresses can specify the class they are associated with by using a new `ingressClassName` field on Ingresses. This new resource and field replace the deprecated `kubernetes.io/ingress.class` annotation.
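As a rough sketch of how the pieces fit together (the names and the controller value here are illustrative, not taken from any particular ingress controller), an `IngressClass` and an Ingress referencing it might look like:
```yaml
apiVersion: networking.k8s.io/v1beta1
kind: IngressClass
metadata:
  name: external-lb
spec:
  controller: example.com/ingress-controller
---
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: example-ingress
spec:
  ingressClassName: external-lb
  rules:
  - http:
      paths:
      - path: /app
        pathType: Prefix
        backend:
          serviceName: app-service
          servicePort: 80
```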
### SIG-CLI introduces kubectl alpha debug
SIG-CLI was debating the need for a debug utility for quite some time already. With the development of [ephemeral containers](https://kubernetes.io/docs/concepts/workloads/pods/ephemeral-containers/), it became more obvious how we can support developers with tooling built on top of `kubectl exec`. The addition of the [`kubectl alpha debug` command](https://github.com/kubernetes/enhancements/blob/master/keps/sig-cli/20190805-kubectl-debug.md) (it is alpha, but your feedback is more than welcome) allows developers to easily debug their Pods inside the cluster. We think this addition is invaluable. This command allows one to create a temporary container that runs next to the Pod being examined and attaches to the console for interactive troubleshooting.
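For example (the pod and container names below are placeholders, and the cluster must have the `EphemeralContainers` feature gate enabled), attaching an ephemeral debugging container to a running Pod looks roughly like:
```shell
kubectl alpha debug -it my-app-pod --image=busybox --target=my-app-container
```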
### Introducing Windows CSI support alpha for Kubernetes
The alpha version of CSI Proxy for Windows is being released with Kubernetes 1.18. CSI proxy enables CSI Drivers on Windows by allowing containers in Windows to perform privileged storage operations.
## Other Updates
### Graduated to Stable 💯
- [Taint Based Eviction](https://github.com/kubernetes/enhancements/issues/166)
- [`kubectl diff`](https://github.com/kubernetes/enhancements/issues/491)
- [CSI Block storage support](https://github.com/kubernetes/enhancements/issues/565)
- [API Server dry run](https://github.com/kubernetes/enhancements/issues/576)
- [Pass Pod information in CSI calls](https://github.com/kubernetes/enhancements/issues/603)
- [Support Out-of-Tree vSphere Cloud Provider](https://github.com/kubernetes/enhancements/issues/670)
- [Support GMSA for Windows workloads](https://github.com/kubernetes/enhancements/issues/689)
- [Skip attach for non-attachable CSI volumes](https://github.com/kubernetes/enhancements/issues/770)
- [PVC cloning](https://github.com/kubernetes/enhancements/issues/989)
- [Moving kubectl package code to staging](https://github.com/kubernetes/enhancements/issues/1020)
- [RunAsUserName for Windows](https://github.com/kubernetes/enhancements/issues/1043)
- [AppProtocol for Services and Endpoints](https://github.com/kubernetes/enhancements/issues/1507)
- [Extending Hugepage Feature](https://github.com/kubernetes/enhancements/issues/1539)
- [client-go signature refactor to standardize options and context handling](https://github.com/kubernetes/enhancements/issues/1601)
- [Node-local DNS cache](https://github.com/kubernetes/enhancements/issues/1024)
### Major Changes
- [EndpointSlice API](https://github.com/kubernetes/enhancements/issues/752)
- [Moving kubectl package code to staging](https://github.com/kubernetes/enhancements/issues/1020)
- [CertificateSigningRequest API](https://github.com/kubernetes/enhancements/issues/1513)
- [Extending Hugepage Feature](https://github.com/kubernetes/enhancements/issues/1539)
- [client-go signature refactor to standardize options and context handling](https://github.com/kubernetes/enhancements/issues/1601)
### Release Notes
Check out the full details of the Kubernetes 1.18 release in our [release notes](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.18.md).
### Availability
Kubernetes 1.18 is available for download on [GitHub](https://github.com/kubernetes/kubernetes/releases/tag/v1.18.0). To get started with Kubernetes, check out these [interactive tutorials](https://kubernetes.io/docs/tutorials/) or run local Kubernetes clusters using Docker container “nodes” with [kind](https://kind.sigs.k8s.io/). You can also easily install 1.18 using [kubeadm](https://kubernetes.io/docs/setup/independent/create-cluster-kubeadm/).
### Release Team
This release is made possible through the efforts of hundreds of individuals who contributed both technical and non-technical content. Special thanks to the [release team](https://github.com/kubernetes/sig-release/blob/master/releases/release-1.18/release_team.md) led by Jorge Alarcon Ochoa, Site Reliability Engineer at Searchable AI. The 34 release team members coordinated many aspects of the release, from documentation to testing, validation, and feature completeness.
As the Kubernetes community has grown, our release process represents an amazing demonstration of collaboration in open source software development. Kubernetes continues to gain new users at a rapid pace. This growth creates a positive feedback cycle where more contributors commit code creating a more vibrant ecosystem. Kubernetes has had over [40,000 individual contributors](https://k8s.devstats.cncf.io/d/24/overall-project-statistics?orgId=1) to date and an active community of more than 3,000 people.
### Release Logo
![Kubernetes 1.18 Release Logo](/images/blog/2020-03-25-kubernetes-1.18-release-announcement/release-logo.png)
#### Why the LHC?
The LHC is the world's largest and most powerful particle accelerator. It is the result of the collaboration of thousands of scientists from around the world, all for the advancement of science. In a similar manner, Kubernetes has been a project that has united thousands of contributors from hundreds of organizations, all working towards the same goal of improving cloud computing in all aspects! The release name "A Bit Quarky" is meant to remind us that unconventional ideas can bring about great change, and that keeping an open mind to diversity will help us innovate.
#### About the designer
Maru Lango is a designer currently based in Mexico City. While her area of expertise is Product Design, she also enjoys branding, illustration and visual experiments using CSS + JS and contributing to diversity efforts within the tech and design communities. You may find her in most social media as @marulango or check her website: https://marulango.com
### User Highlights
- Ericsson is using Kubernetes and other cloud native technology to deliver a [highly demanding 5G network](https://www.cncf.io/case-study/ericsson/) that resulted in up to 90 percent CI/CD savings.
- Zendesk is using Kubernetes to [run around 70% of its existing applications](https://www.cncf.io/case-study/zendesk/). It's also building all new applications to run on Kubernetes, which has brought time savings, greater flexibility, and increased velocity to its application development.
- LifeMiles has [reduced infrastructure spending by 50%](https://www.cncf.io/case-study/lifemiles/) because of its move to Kubernetes. It has also allowed them to double its available resource capacity.
### Ecosystem Updates
- The CNCF published the results of its [annual survey](https://www.cncf.io/blog/2020/03/04/2019-cncf-survey-results-are-here-deployments-are-growing-in-size-and-speed-as-cloud-native-adoption-becomes-mainstream/) showing that Kubernetes usage in production is skyrocketing. The survey found that 78% of respondents are using Kubernetes in production compared to 58% last year.
- The “Introduction to Kubernetes” course hosted by the CNCF [surpassed 100,000 registrations](https://www.cncf.io/announcement/2020/01/28/cloud-native-computing-foundation-announces-introduction-to-kubernetes-course-surpasses-100000-registrations/).
### Project Velocity
The CNCF has continued refining DevStats, an ambitious project to visualize the myriad contributions that go into the project. [K8s DevStats](https://k8s.devstats.cncf.io/d/12/dashboards?orgId=1) illustrates the breakdown of contributions from major company contributors, as well as an impressive set of preconfigured reports on everything from individual contributors to pull request lifecycle times.
This past quarter, 641 different companies and over 6,409 individuals contributed to Kubernetes. [Check out DevStats](https://k8s.devstats.cncf.io/d/11/companies-contributing-in-repository-groups?orgId=1&var-period=m&var-repogroup_name=All) to learn more about the overall velocity of the Kubernetes project and community.
### Event Update
KubeCon + CloudNativeCon EU 2020 is being pushed back. For the most up-to-date information, please check the [Novel Coronavirus Update page](https://events.linuxfoundation.org/kubecon-cloudnativecon-europe/attend/novel-coronavirus-update/).
### Upcoming Release Webinar
Join members of the Kubernetes 1.18 release team on April 23rd, 2020 to learn about the major features in this release, including kubectl debug, Topology Manager, Ingress to V1 graduation, and client-go. Register here: https://www.cncf.io/webinars/kubernetes-1-18/.
### Get Involved
The simplest way to get involved with Kubernetes is by joining one of the many [Special Interest Groups](https://github.com/kubernetes/community/blob/master/sig-list.md) (SIGs) that align with your interests. Have something you'd like to broadcast to the Kubernetes community? Share your voice at our weekly [community meeting](https://github.com/kubernetes/community/tree/master/communication), and through the channels below. Thank you for your continued feedback and support.
- Follow us on Twitter [@Kubernetesio](https://twitter.com/kubernetesio) for the latest updates
- Join the community discussion on [Discuss](https://discuss.kubernetes.io/)
- Join the community on [Slack](http://slack.k8s.io/)
- Post questions (or answer questions) on [Stack Overflow](http://stackoverflow.com/questions/tagged/kubernetes)
- Share your Kubernetes [story](https://docs.google.com/a/linuxfoundation.org/forms/d/e/1FAIpQLScuI7Ye3VQHQTwBASrgkjQDSS5TP0g3AXfFhwSM9YpHgxRKFA/viewform)
- Read more about what's happening with Kubernetes on the [blog](https://kubernetes.io/blog/)
- Learn more about the [Kubernetes Release Team](https://github.com/kubernetes/sig-release/tree/master/release-team)


@ -0,0 +1,503 @@
---
layout: blog
title: "Kubernetes Topology Manager Moves to Beta - Align Up!"
date: 2020-04-01
slug: kubernetes-1-18-feature-topoloy-manager-beta
---
**Authors:** Kevin Klues (NVIDIA), Victor Pickard (Red Hat), Conor Nolan (Intel)
This blog post describes the **<code>TopologyManager</code>**, a beta feature of Kubernetes in release 1.18. The **<code>TopologyManager</code>** feature enables NUMA alignment of CPUs and peripheral devices (such as SR-IOV VFs and GPUs), allowing your workload to run in an environment optimized for low-latency.
Prior to the introduction of the **<code>TopologyManager</code>**, the CPU and Device Manager would make resource allocation decisions independent of each other. This could result in undesirable allocations on multi-socket systems, causing degraded performance on latency critical applications. With the introduction of the **<code>TopologyManager</code>**, we now have a way to avoid this.
This blog post covers:
1. A brief introduction to NUMA and why it is important
1. The policies available to end-users to ensure NUMA alignment of CPUs and devices
1. The internal details of how the **<code>TopologyManager</code>** works
1. Current limitations of the **<code>TopologyManager</code>**
1. Future directions of the **<code>TopologyManager</code>**
## So, what is NUMA and why do I care?
The term NUMA stands for Non-Uniform Memory Access. It is a technology available on multi-cpu systems that allows different CPUs to access different parts of memory at different speeds. Any memory directly connected to a CPU is considered "local" to that CPU and can be accessed very fast. Any memory not directly connected to a CPU is considered "non-local" and will have variable access times depending on how many interconnects must be passed through in order to reach it. On modern systems, the idea of having "local" vs. "non-local" memory can also be extended to peripheral devices such as NICs or GPUs. For high performance, CPUs and devices should be allocated such that they have access to the same local memory.
All memory on a NUMA system is divided into a set of "NUMA nodes", with each node representing the local memory for a set of CPUs or devices. We talk about an individual CPU as being part of a NUMA node if its local memory is associated with that NUMA node.
We talk about a peripheral device as being part of a NUMA node based on the smallest number of interconnects that must be passed through in order to reach it.
For example, in Figure 1, CPUs 0-3 are said to be part of NUMA node 0, whereas CPUs 4-7 are part of NUMA node 1. Likewise GPU 0 and NIC 0 are said to be part of NUMA node 0 because they are attached to Socket 0, whose CPUs are all part of NUMA node 0. The same is true for GPU 1 and NIC 1 on NUMA node 1.
<p align="center">
<img height="300" src="/images/blog/2020-03-25-kubernetes-1.18-release-announcement/example-numa-system.png">
</p>
**Figure 1:** An example system with 2 NUMA nodes, 2 Sockets with 4 CPUs each, 2 GPUs, and 2 NICs. CPUs on Socket 0, GPU 0, and NIC 0 are all part of NUMA node 0. CPUs on Socket 1, GPU 1, and NIC 1 are all part of NUMA node 1.
Although the example above shows a 1-1 mapping of NUMA Node to Socket, this is not necessarily true in the general case. There may be multiple sockets on a single NUMA node, or individual CPUs of a single socket may be connected to different NUMA nodes. Moreover, emerging technologies such as Sub-NUMA Clustering ([available on recent intel CPUs](https://software.intel.com/en-us/articles/intel-xeon-processor-scalable-family-technical-overview)) allow single CPUs to be associated with multiple NUMA nodes so long as their memory access times to both nodes are the same (or have a negligible difference).
The **<code>TopologyManager</code>** has been built to handle all of these scenarios.
## Align Up! It's a TeaM Effort!
As previously stated, the **<code>TopologyManager</code>** allows users to align their CPU and peripheral device allocations by NUMA node. There are several policies available for this:
* **<code>none:</code>** this policy will not attempt to do any alignment of resources. It will act the same as if the **<code>TopologyManager</code>** were not present at all. This is the default policy.
* **<code>best-effort:</code>** with this policy, the **<code>TopologyManager</code>** will attempt to align allocations on NUMA nodes as best it can, but will always allow the pod to start even if some of the allocated resources are not aligned on the same NUMA node.
* **<code>restricted:</code>** this policy is the same as the **<code>best-effort</code>** policy, except it will fail pod admission if allocated resources cannot be aligned properly. Unlike with the **<code>single-numa-node</code>** policy, some allocations may come from multiple NUMA nodes if it is impossible to _ever_ satisfy the allocation request on a single NUMA node (e.g. 2 devices are requested and the only 2 devices on the system are on different NUMA nodes).
* **<code>single-numa-node:</code>** this policy is the most restrictive and will only allow a pod to be admitted if _all_ requested CPUs and devices can be allocated from exactly one NUMA node.
It is important to note that the selected policy is applied to each container in a pod spec individually, rather than aligning resources across all containers together.
Moreover, a single policy is applied to _all_ pods on a node via a global **<code>kubelet</code>** flag, rather than allowing users to select different policies on a pod-by-pod basis (or a container-by-container basis). We hope to relax this restriction in the future.
The **<code>kubelet</code>** flag to set one of these policies can be seen below:
```
--topology-manager-policy=
[none | best-effort | restricted | single-numa-node]
```
Additionally, the **<code>TopologyManager</code>** is protected by a feature gate. This feature gate has been available since Kubernetes 1.16, but has only been enabled by default since 1.18.
The feature gate can be enabled or disabled as follows (as described in more detail [here](https://kubernetes.io/docs/reference/command-line-tools-reference/feature-gates/)):
```
--feature-gates="...,TopologyManager=<true|false>"
```
In order to trigger alignment according to the selected policy, a user must request CPUs and peripheral devices in their pod spec, according to a certain set of requirements.
For peripheral devices, this means requesting devices from the available resources provided by a device plugin (e.g. **<code>intel.com/sriov</code>**, **<code>nvidia.com/gpu</code>**, etc.). This will only work if the device plugin has been extended to integrate properly with the **<code>TopologyManager</code>**. Currently, the only plugins known to have this extension are the [Nvidia GPU device plugin](https://github.com/NVIDIA/k8s-device-plugin/blob/5cb45d52afdf5798a40f8d0de049bce77f689865/nvidia.go#L74), and the [Intel SRIOV network device plugin](https://github.com/intel/sriov-network-device-plugin/blob/30e33f1ce2fc7b45721b6de8c8207e65dbf2d508/pkg/resources/pciNetDevice.go#L80). Details on how to extend a device plugin to integrate with the **<code>TopologyManager</code>** can be found [here](https://kubernetes.io/docs/concepts/extend-kubernetes/compute-storage-net/device-plugins/#device-plugin-integration-with-the-topology-manager).
For CPUs, this requires that the **<code>CPUManager</code>** has been configured with its **<code>--static</code>** policy enabled and that the pod is running in the Guaranteed QoS class (i.e. all CPU and memory **<code>limits</code>** are equal to their respective CPU and memory **<code>requests</code>**). CPUs must also be requested in whole number values (e.g. **<code>1</code>**, **<code>2</code>**, **<code>1000m</code>**, etc). Details on how to set the **<code>CPUManager</code>** policy can be found [here](https://kubernetes.io/docs/tasks/administer-cluster/cpu-management-policies/#cpu-management-policies).
For example, assuming the **<code>CPUManager</code>** is running with its **<code>--static</code>** policy enabled and the device plugins for **<code>gpu-vendor.com</code>**, and **<code>nic-vendor.com</code>** have been extended to integrate with the **<code>TopologyManager</code>** properly, the pod spec below is sufficient to trigger the **<code>TopologyManager</code>** to run its selected policy:
```
spec:
containers:
- name: numa-aligned-container
image: alpine
resources:
limits:
cpu: 2
memory: 200Mi
gpu-vendor.com/gpu: 1
nic-vendor.com/nic: 1
```
Following Figure 1 from the previous section, this would result in one of the following aligned allocations:
```
{cpu: {0, 1}, gpu: 0, nic: 0}
{cpu: {0, 2}, gpu: 0, nic: 0}
{cpu: {0, 3}, gpu: 0, nic: 0}
{cpu: {1, 2}, gpu: 0, nic: 0}
{cpu: {1, 3}, gpu: 0, nic: 0}
{cpu: {2, 3}, gpu: 0, nic: 0}
{cpu: {4, 5}, gpu: 1, nic: 1}
{cpu: {4, 6}, gpu: 1, nic: 1}
{cpu: {4, 7}, gpu: 1, nic: 1}
{cpu: {5, 6}, gpu: 1, nic: 1}
{cpu: {5, 7}, gpu: 1, nic: 1}
{cpu: {6, 7}, gpu: 1, nic: 1}
```
And that's it! Just follow this pattern to have the **<code>TopologyManager</code>** ensure NUMA alignment across containers that request topology-aware devices and exclusive CPUs.
**NOTE:** if a pod is rejected by one of the **<code>TopologyManager</code>** policies, it will be placed in a **<code>Terminated</code>** state with a pod admission error and a reason of "**<code>TopologyAffinityError</code>**". Once a pod is in this state, the Kubernetes scheduler will not attempt to reschedule it. It is therefore recommended to use a [**<code>Deployment</code>**](https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#creating-a-deployment) with replicas to trigger a redeploy of the pod on such a failure. An [external control loop](https://kubernetes.io/docs/concepts/architecture/controller/) can also be implemented to trigger a redeployment of pods that have a **<code>TopologyAffinityError</code>**.
## This is great, so how does it work under the hood?
Pseudocode for the primary logic carried out by the **<code>TopologyManager</code>** can be seen below:
```
for container := range append(InitContainers, Containers...) {
for provider := range HintProviders {
hints += provider.GetTopologyHints(container)
}
bestHint := policy.Merge(hints)
for provider := range HintProviders {
provider.Allocate(container, bestHint)
}
}
```
The following diagram summarizes the steps taken during this loop:
<p align="center">
<img weight="200" height="200" src="/images/blog/2020-03-25-kubernetes-1.18-release-announcement/numa-steps-during-loop.png">
</p>
The steps themselves are:
1. Loop over all containers in a pod.
1. For each container, gather "**<code>TopologyHints</code>**" from a set of "**<code>HintProviders</code>**" for each topology-aware resource type requested by the container (e.g. **<code>gpu-vendor.com/gpu</code>**, **<code>nic-vendor.com/nic</code>**, **<code>cpu</code>**, etc.).
1. Using the selected policy, merge the gathered **<code>TopologyHints</code>** to find the "best" hint that aligns resource allocations across all resource types.
1. Loop back over the set of hint providers, instructing them to allocate the resources they control using the merged hint as a guide.
1. This loop runs at pod admission time and will fail to admit the pod if any of these steps fail or alignment cannot be satisfied according to the selected policy. Any resources allocated before the failure are cleaned up accordingly.
The following sections go into more detail on the exact structure of **<code>TopologyHints</code>** and **<code>HintProviders</code>**, as well as some details on the merge strategies used by each policy.
### TopologyHints
A **<code>TopologyHint</code>** encodes a set of constraints from which a given resource request can be satisfied. At present, the only constraint we consider is NUMA alignment. It is defined as follows:
```
type TopologyHint struct {
NUMANodeAffinity bitmask.BitMask
Preferred bool
}
```
The **<code>NUMANodeAffinity</code>** field contains a bitmask of NUMA nodes where a resource request can be satisfied. For example, the possible masks on a system with 2 NUMA nodes include:
```
{00}, {01}, {10}, {11}
```
The **<code>Preferred</code>** field contains a boolean that encodes whether the given hint is "preferred" or not. With the **<code>best-effort</code>** policy, preferred hints will be given preference over non-preferred hints when generating a "best" hint. With the **<code>restricted</code>** and **<code>single-numa-node</code>** policies, non-preferred hints will be rejected.
In general, **<code>HintProviders</code>** generate **<code>TopologyHints</code>** by looking at the set of currently available resources that can satisfy a resource request. More specifically, they generate one **<code>TopologyHint</code>** for every possible mask of NUMA nodes where that resource request can be satisfied. If a mask cannot satisfy the request, it is omitted. For example, a **<code>HintProvider</code>** might provide the following hints on a system with 2 NUMA nodes when being asked to allocate 2 resources. These hints encode that both resources could either come from a single NUMA node (either 0 or 1), or they could each come from different NUMA nodes (but we prefer for them to come from just one).
```
{01: True}, {10: True}, {11: False}
```
At present, all **<code>HintProviders</code>** set the **<code>Preferred</code>** field to **<code>True</code>** if and only if the **<code>NUMANodeAffinity</code>** encodes a _minimal_ set of NUMA nodes that can satisfy the resource request. Normally, this will only be **<code>True</code>** for **<code>TopologyHints</code>** with a single NUMA node set in their bitmask. However, it may also be **<code>True</code>** if the only way to _ever_ satisfy the resource request is to span multiple NUMA nodes (e.g. 2 devices are requested and the only 2 devices on the system are on different NUMA nodes):
```
{0011: True}, {0111: False}, {1011: False}, {1111: False}
```
**NOTE:** Setting of the **<code>Preferred</code>** field in this way is _not_ based on the set of currently available resources. It is based on the ability to physically allocate the number of requested resources on some minimal set of NUMA nodes.
In this way, it is possible for a **<code>HintProvider</code>** to return a list of hints with _all_ **<code>Preferred</code>** fields set to **<code>False</code>** if an actual preferred allocation cannot be satisfied until other containers release their resources. For example, consider the following scenario from the system in Figure 1:
1. All but 2 CPUs are currently allocated to containers
1. The 2 remaining CPUs are on different NUMA nodes
1. A new container comes along asking for 2 CPUs
In this case, the only generated hint would be **<code>{11: False}</code>** and not **<code>{11: True}</code>**. This happens because it _is_ possible to allocate 2 CPUs from the same NUMA node on this system (just not right now, given the current allocation state). The idea being that it is better to fail pod admission and retry the deployment when the minimal alignment can be satisfied than to allow a pod to be scheduled with sub-optimal alignment.
### HintProviders
A **<code>HintProvider</code>** is a component internal to the **<code>kubelet</code>** that coordinates aligned resource allocations with the **<code>TopologyManager</code>**. At present, the only **<code>HintProviders</code>** in Kubernetes are the **<code>CPUManager</code>** and the **<code>DeviceManager</code>**. We plan to add support for **<code>HugePages</code>** soon.
As discussed previously, the **<code>TopologyManager</code>** both gathers **<code>TopologyHints</code>** from **<code>HintProviders</code>** as well as triggers aligned resource allocations on them using a merged "best" hint. As such, **<code>HintProviders</code>** implement the following interface:
```
type HintProvider interface {
GetTopologyHints(*v1.Pod, *v1.Container) map[string][]TopologyHint
Allocate(*v1.Pod, *v1.Container) error
}
```
Notice that the call to **<code>GetTopologyHints()</code>** returns a **<code>map[string][]TopologyHint</code>**. This allows a single **<code>HintProvider</code>** to provide hints for multiple resource types instead of just one. For example, the **<code>DeviceManager</code>** requires this in order to pass hints back for every resource type registered by its plugins.
As **<code>HintProviders</code>** generate their hints, they only consider how alignment could be satisfied for _currently_ available resources on the system. Any resources already allocated to other containers are not considered.
For example, consider the system in Figure 1, with the following two containers requesting resources from it:
<table>
<tr>
<td align="center"><strong><code>Container0</code></strong>
</td>
<td align="center"><strong><code>Container1</code></strong>
</td>
</tr>
<tr>
<td>
<pre>
spec:
containers:
- name: numa-aligned-container0
image: alpine
resources:
limits:
cpu: 2
memory: 200Mi
gpu-vendor.com/gpu: 1
nic-vendor.com/nic: 1
</pre>
</td>
<td>
<pre>
spec:
containers:
- name: numa-aligned-container1
image: alpine
resources:
limits:
cpu: 2
memory: 200Mi
gpu-vendor.com/gpu: 1
nic-vendor.com/nic: 1
</pre>
</td>
</tr>
</table>
If **<code>Container0</code>** is the first container considered for allocation on the system, the following set of hints will be generated for the three topology-aware resource types in the spec.
```
cpu: {{01: True}, {10: True}, {11: False}}
gpu-vendor.com/gpu: {{01: True}, {10: True}}
nic-vendor.com/nic: {{01: True}, {10: True}}
```
With a resulting aligned allocation of:
```
{cpu: {0, 1}, gpu: 0, nic: 0}
```
<p align="center">
<img height="300" src="/images/blog/2020-03-25-kubernetes-1.18-release-announcement/numa-hint-provider1.png">
</p>
When considering **<code>Container1</code>** these resources are then presumed to be unavailable, and thus only the following set of hints will be generated:
```
cpu: {{01: True}, {10: True}, {11: False}}
gpu-vendor.com/gpu: {{10: True}}
nic-vendor.com/nic: {{10: True}}
```
With a resulting aligned allocation of:
```
{cpu: {4, 5}, gpu: 1, nic: 1}
```
<p align="center">
<img height="300" src="/images/blog/2020-03-25-kubernetes-1.18-release-announcement/numa-hint-provider2.png">
</p>
**NOTE:** Unlike the pseudocode provided at the beginning of this section, the call to **<code>Allocate()</code>** does not actually take a parameter for the merged "best" hint directly. Instead, the **<code>TopologyManager</code>** implements the following **<code>Store</code>** interface that **<code>HintProviders</code>** can query to retrieve the hint generated for a particular container once it has been generated:
```
type Store interface {
GetAffinity(podUID string, containerName string) TopologyHint
}
```
Separating this out into its own API call allows one to access this hint outside of the pod admission loop. This is useful for debugging as well as for reporting generated hints in tools such as **<code>kubectl</code>** (not yet available).
### Policy.Merge
The merge strategy defined by a given policy dictates how it combines the set of **<code>TopologyHints</code>** generated by all **<code>HintProviders</code>** into a single **<code>TopologyHint</code>** that can be used to inform aligned resource allocations.
The general merge strategy for all supported policies begins the same:
1. Take the cross-product of **<code>TopologyHints</code>** generated for each resource type
1. For each entry in the cross-product, **<code>bitwise-and</code>** the NUMA affinities of each **<code>TopologyHint</code>** together. Set this as the NUMA affinity in a resulting "merged" hint.
1. If all of the hints in an entry have **<code>Preferred</code>** set to **<code>True</code>** , set **<code>Preferred</code>** to **<code>True</code>** in the resulting "merged" hint.
1. If even one of the hints in an entry has **<code>Preferred</code>** set to **<code>False</code>** , set **<code>Preferred</code>** to **<code>False</code>** in the resulting "merged" hint. Also set **<code>Preferred</code>** to **<code>False</code>** in the "merged" hint if its NUMA affinity contains all 0s.
Following the example from the previous section with hints for **<code>Container0</code>** generated as:
```
cpu: {{01: True}, {10: True}, {11: False}}
gpu-vendor.com/gpu: {{01: True}, {10: True}}
nic-vendor.com/nic: {{01: True}, {10: True}}
```
The above algorithm results in the following set of cross-product entries and "merged" hints:
<table>
<tr>
<td align="center">cross-product entry
<p>
<strong><code>{cpu, gpu-vendor.com/gpu, nic-vendor.com/nic}</code></strong>
</p>
</td>
<td align="center">"merged" hint
</td>
</tr>
<tr>
<td align="center">
<strong><code>{{01: True}, {01: True}, {01: True}}</code></strong>
</td>
<td align="center"><strong><code>{01: True}</code></strong>
</td>
</tr>
<tr>
<td align="center">
<strong><code>{{01: True}, {01: True}, {10: True}}</code></strong>
</td>
<td align="center"><strong><code>{00: False}</code></strong>
</td>
</tr>
<tr>
<td align="center">
<strong><code>{{01: True}, {10: True}, {01: True}}</code></strong>
</td>
<td align="center"><strong><code>{00: False}</code></strong>
</td>
</tr>
<tr>
<td align="center">
<strong><code>{{01: True}, {10: True}, {10: True}}</code></strong>
</td>
<td align="center"><strong><code>{00: False}</code></strong>
</td>
</tr>
<tr>
<td>
</td>
<td>
</td>
</tr>
<tr>
<td align="center">
<strong><code>{{10: True}, {01: True}, {01: True}}</code></strong>
</td>
<td align="center"><strong><code>{00: False}</code></strong>
</td>
</tr>
<tr>
<td align="center">
<strong><code>{{10: True}, {01: True}, {10: True}}</code></strong>
</td>
<td align="center"><strong><code>{00: False}</code></strong>
</td>
</tr>
<tr>
<td align="center">
<strong><code>{{10: True}, {10: True}, {01: True}}</code></strong>
</td>
<td align="center"><strong><code>{00: False}</code></strong>
</td>
</tr>
<tr>
<td align="center">
<strong><code>{{10: True}, {10: True}, {10: True}}</code></strong>
</td>
<td align="center"><strong><code>{01: True}</code></strong>
</td>
</tr>
<tr>
<td>
</td>
<td>
</td>
</tr>
<tr>
<td align="center">
<strong><code>{{11: False}, {01: True}, {01: True}}</code></strong>
</td>
<td align="center"><strong><code>{01: False}</code></strong>
</td>
</tr>
<tr>
<td align="center">
<strong><code>{{11: False}, {01: True}, {10: True}}</code></strong>
</td>
<td align="center"><strong><code>{00: False}</code></strong>
</td>
</tr>
<tr>
<td align="center">
<strong><code>{{11: False}, {10: True}, {01: True}}</code></strong>
</td>
<td align="center"><strong><code>{00: False}</code></strong>
</td>
</tr>
<tr>
<td align="center">
<strong><code>{{11: False}, {10: True}, {10: True}}</code></strong>
</td>
<td align="center"><strong><code>{10: False}</code></strong>
</td>
</tr>
</table>
Once this list of "merged" hints has been generated, it is the job of the specific **<code>TopologyManager</code>** policy in use to decide which one to consider as the "best" hint.
In general, this involves:
1. Sorting merged hints by their "narrowness". Narrowness is defined as the number of bits set in a hint's NUMA affinity mask. The fewer bits set, the narrower the hint. For hints that have the same number of bits set in their NUMA affinity mask, the hint with the most low-order bits set is considered narrower.
1. Sorting merged hints by their **<code>Preferred</code>** field. Hints that have **<code>Preferred</code>** set to **<code>True</code>** are considered more likely candidates than hints with **<code>Preferred</code>** set to **<code>False</code>**.
1. Selecting the narrowest hint with the best possible setting for **<code>Preferred</code>**.
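As a rough sketch of this selection logic (again with the simplified **<code>TopologyHint</code>** type used above; this illustrates the described ordering, not the real kubelet code, and the numeric tie-break is only a simple proxy for the low-order-bits rule):
```go
package main

import (
	"fmt"
	"math/bits"
)

// Simplified TopologyHint, as in the earlier sketches (an assumption for
// illustration, not the real kubelet type).
type TopologyHint struct {
	NUMANodeAffinity uint64
	Preferred        bool
}

// hintIsBetter reports whether a should be chosen over b: preferred hints
// win first, then narrower masks (fewer bits set); on a tie, the numerically
// smaller mask is used as a simple proxy for "more low-order bits set".
func hintIsBetter(a, b TopologyHint) bool {
	if a.Preferred != b.Preferred {
		return a.Preferred
	}
	na, nb := bits.OnesCount64(a.NUMANodeAffinity), bits.OnesCount64(b.NUMANodeAffinity)
	if na != nb {
		return na < nb
	}
	return a.NUMANodeAffinity < b.NUMANodeAffinity
}

func main() {
	// A few of the merged hints from the example above.
	merged := []TopologyHint{
		{0b00, false}, {0b01, true}, {0b10, true}, {0b01, false},
	}
	best := merged[0]
	for _, h := range merged[1:] {
		if hintIsBetter(h, best) {
			best = h
		}
	}
	fmt.Printf("best hint: %+v\n", best) // best hint: {NUMANodeAffinity:1 Preferred:true}
}
```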
In the case of the **<code>best-effort</code>** policy this algorithm will always result in _some_ hint being selected as the "best" hint and the pod being admitted. This "best" hint is then made available to **<code>HintProviders</code>** so they can make their resource allocations based on it.
However, in the case of the **<code>restricted</code>** and **<code>single-numa-node</code>** policies, any selected hint with **<code>Preferred</code>** set to **<code>False</code>** will be rejected immediately, causing pod admission to fail and no resources to be allocated. Moreover, the **<code>single-numa-node</code>** policy will also reject a selected hint that has more than one NUMA node set in its affinity mask.
In the example above, the pod would be admitted by all policies with a hint of **<code>{01: True}</code>**.
## Upcoming enhancements
While the 1.18 release and promotion to Beta brings along some great enhancements and fixes, there are still a number of limitations, described [here](https://kubernetes.io/docs/tasks/administer-cluster/topology-manager/#known-limitations). We are already underway working to address these limitations and more.
This section walks through the set of enhancements we plan to implement for the **<code>TopologyManager</code>** in the near future. This list is not exhaustive, but it gives a good idea of the direction we are moving in. It is ordered by the timeframe in which we expect to see each enhancement completed.
If you would like to get involved in helping with any of these enhancements, please [join the weekly Kubernetes SIG-node meetings](https://github.com/kubernetes/community/tree/master/sig-node) to learn more and become part of the community effort!
### Supporting device-specific constraints
Currently, NUMA affinity is the only constraint considered by the **<code>TopologyManager</code>** for resource alignment. Moreover, the only scalable extensions that can be made to a **<code>TopologyHint</code>** involve _node-level_ constraints, such as PCIe bus alignment across device types. It would be intractable to try and add any _device-specific_ constraints to this struct (e.g. the internal NVLINK topology among a set of GPU devices).
As such, we propose an extension to the device plugin interface that will allow a plugin to state its topology-aware allocation preferences, without having to expose any device-specific topology information to the kubelet. In this way, the **<code>TopologyManager</code>** can be restricted to only deal with common node-level topology constraints, while still having a way of incorporating device-specific topology constraints into its allocation decisions.
Details of this proposal can be found [here](https://github.com/kubernetes/enhancements/pull/1121), and should be available as soon as Kubernetes 1.19.
### NUMA alignment for hugepages
As stated previously, the only two **<code>HintProviders</code>** currently available to the **<code>TopologyManager</code>** are the **<code>CPUManager</code>** and the **<code>DeviceManager</code>**. However, work is currently underway to add support for hugepages as well. With the completion of this work, the **<code>TopologyManager</code>** will finally be able to allocate memory, hugepages, CPUs and PCI devices all on the same NUMA node.
A [KEP](https://github.com/kubernetes/enhancements/blob/253f1e5bdd121872d2d0f7020a5ac0365b229e30/keps/sig-node/20200203-memory-manager.md) for this work is currently under review, and a prototype is underway to get this feature implemented very soon.
### Scheduler awareness
Currently, the **<code>TopologyManager</code>** acts as a Pod Admission controller. It is not directly involved in the scheduling decision of where a pod will be placed. Rather, when the Kubernetes scheduler (or whatever scheduler is running in the deployment) places a pod on a node to run, the **<code>TopologyManager</code>** will decide if the pod should be "admitted" or "rejected". If the pod is rejected due to a lack of available NUMA-aligned resources, things can get a little interesting. This Kubernetes [issue](https://github.com/kubernetes/kubernetes/issues/84869) highlights and discusses this situation well.
So how do we go about addressing this limitation? We have the [Kubernetes Scheduling Framework](https://github.com/kubernetes/enhancements/blob/master/keps/sig-scheduling/20180409-scheduling-framework.md) to the rescue! This framework provides a new set of plugin APIs that integrate with the existing Kubernetes Scheduler and allow scheduling features, such as NUMA alignment, to be implemented without having to resort to other, perhaps less appealing alternatives, including writing your own scheduler, or even worse, creating a fork to add your own scheduler secret sauce.
The details of how to implement these extensions for integration with the **<code>TopologyManager</code>** have not yet been worked out. We still need to answer questions like:
* Will we require duplicated logic to determine device affinity in the **<code>TopologyManager</code>** and the scheduler?
* Do we need a new API to get **<code>TopologyHints</code>** from the **<code>TopologyManager</code>** to the scheduler plugin?
Work on this feature should begin in the next couple of months, so stay tuned!
### Per-pod alignment policy
As stated previously, a single policy is applied to _all_ pods on a node via a global **<code>kubelet</code>** flag, rather than allowing users to select different policies on a pod-by-pod basis (or a container-by-container basis).
While we agree that this would be a great feature to have, there are quite a few hurdles that need to be overcome before it is achievable. The biggest hurdle being that this enhancement will require an API change to be able to express the desired alignment policy in either the Pod spec or its associated **<code>[RuntimeClass](https://kubernetes.io/docs/concepts/containers/runtime-class/)</code>**.
We are only now starting to have serious discussions around this feature, and it is, at best, still a few releases away from being available.
## Conclusion
With the promotion of the **<code>TopologyManager</code>** to Beta in 1.18, we encourage everyone to give it a try and look forward to any feedback you may have. Many fixes and enhancements have been worked on in the past several releases, greatly improving the functionality and reliability of the **<code>TopologyManager</code>** and its **<code>HintProviders</code>**. While there are still a number of limitations, we have a set of enhancements planned to address them, and look forward to providing you with a number of new features in upcoming releases.
If you have ideas for additional enhancements or a desire for certain features, don't hesitate to let us know. The team is always open to suggestions to enhance and improve the **<code>TopologyManager</code>**.
We hope you have found this blog informative and useful! Let us know if you have any questions or comments. And, happy deploying…..Align Up!

View File

@ -0,0 +1,51 @@
---
layout: blog
title: Kubernetes 1.18 Feature Server-side Apply Beta 2
date: 2020-04-01
slug: Kubernetes-1.18-Feature-Server-side-Apply-Beta-2
---
**Authors:** Antoine Pelisse (Google)
## What is Server-side Apply?
Server-side Apply is an important effort to migrate “kubectl apply” to the apiserver. It was started in 2018 by the Apply working group.
The use of kubectl to declaratively apply resources has exposed the following challenges:
- One needs to use the kubectl Go code, or shell out to kubectl.
- Strategic merge-patch, the patch format used by kubectl, grew organically and was challenging to fix while maintaining compatibility with various api-server versions.
- Some features are hard to implement directly on the client, for example, unions.
Server-side Apply is a new merging algorithm, as well as tracking of field ownership, running on the Kubernetes api-server. Server-side Apply enables new features like conflict detection, so the system knows when two actors are trying to edit the same field.
## How does it work, what's managedFields?
Server-side Apply works by keeping track of which actor of the system has changed each field of an object. It does so by diffing all updates to objects, and recording all the fields that have changed as well as the time of the operation. All this information is stored in the managedFields in the metadata of objects. Since objects can have many fields, this field can be quite large.
When someone applies, we can then use the information stored within managedFields to report relevant conflicts and help the merge algorithm to do the right thing.
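For example, a client can inspect who owns what directly from an object's metadata. Here is a minimal Go sketch using the `k8s.io/apimachinery` types (the entries below are hand-constructed for illustration; on a real cluster they come back from the API server on any object you retrieve):
```go
package main

import (
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// printFieldManagers lists which manager last touched each set of fields on
// an object, using the managedFields entries stored in its metadata.
func printFieldManagers(meta metav1.ObjectMeta) {
	for _, mf := range meta.ManagedFields {
		fmt.Printf("manager=%s operation=%s apiVersion=%s\n",
			mf.Manager, mf.Operation, mf.APIVersion)
	}
}

func main() {
	meta := metav1.ObjectMeta{
		Name: "example-config",
		ManagedFields: []metav1.ManagedFieldsEntry{
			{Manager: "kubectl", Operation: metav1.ManagedFieldsOperationApply, APIVersion: "v1"},
			{Manager: "kube-controller-manager", Operation: metav1.ManagedFieldsOperationUpdate, APIVersion: "v1"},
		},
	}
	printFieldManagers(meta)
}
```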
## Wasn't it already Beta before 1.18?
Yes, Server-side Apply has been Beta since 1.16, but it didn't track the owner for fields associated with objects that had not been applied. This means that most objects didn't have the managedFields metadata stored, and conflicts for these objects could not be resolved. With Kubernetes 1.18, all new objects will have the managedFields attached to them and provide accurate information on conflicts.
## How do I use it?
The most common way to use this is through kubectl: `kubectl apply --server-side`. This is likely to show conflicts with other actors, including client-side apply. When that happens, conflicts can be forced by using the `--force-conflicts` flag, which will grab the ownership for the fields that have changed.
## Current limitations
We have two important limitations right now, especially with sub-resources. The first is that if you apply with a status, the status is going to be ignored. We are still going to try and acquire the fields, which may lead to invalid conflicts. The other is that we do not update the managedFields on some sub-resources, including scale, so you may not see information about a horizontal pod autoscaler changing the number of replicas.
## What's next?
We are working hard to improve the experience of using server-side apply with kubectl, and we are trying to make it the default. As part of that, we want to improve the migration from client-side to server-side.
## Can I help?
Of course! The Apply working group is available on Slack in #wg-apply, through the [mailing list](https://groups.google.com/forum/#!forum/kubernetes-wg-apply), and we also meet every other Tuesday at 9:30 PT on Zoom. We have lots of exciting features to build and can use all sorts of help.
We would also like to use the opportunity to thank the hard work of all the contributors involved in making this new beta possible:
* Daniel Smith
* Jenny Buckley
* Joe Betz
* Julian Modesto
* Kevin Wiesmüller
* Maria Ntalla

View File

@ -0,0 +1,199 @@
---
layout: blog
title: 'Kong Ingress Controller and Service Mesh: Setting up Ingress to Istio on Kubernetes'
date: 2020-03-18
slug: kong-ingress-controller-and-istio-service-mesh
---
**Author:** Kevin Chen, Kong
Kubernetes has become the de facto way to orchestrate containers and the services within them. But how do we give services outside our cluster access to what is within? Kubernetes comes with the Ingress API object that manages external access to services within a cluster.
Ingress is a group of rules that will proxy inbound connections to endpoints defined by a backend. However, Kubernetes does not know what to do with Ingress resources without an Ingress controller, which is where an open source controller can come into play. In this post, we are going to use one option for this: the Kong Ingress Controller. The Kong Ingress Controller was open-sourced a year ago and recently reached one million downloads. In the recent 0.7 release, service mesh support was also added. Other features of this release include:
* **Built-In Kubernetes Admission Controller**, which validates Custom Resource Definitions (CRD) as they are created or updated and rejects any invalid configurations.
* **In-memory Mode** - Each pod's controller actively configures the Kong container in its pod, which limits the blast radius of a failure of a single Kong container or controller container to that pod only.
* **Native gRPC Routing** - gRPC traffic can now be routed via Kong Ingress Controller natively with support for method-based routing.
![K4K-gRPC](/images/blog/Kong-Ingress-Controller-and-Service-Mesh/KIC-gRPC.png)
If you would like a deeper dive into Kong Ingress Controller 0.7, please check out the [GitHub repository](https://github.com/Kong/kubernetes-ingress-controller).
But let's get back to the service mesh support since that will be the main focal point of this blog post. Service mesh allows organizations to address microservices challenges related to security, reliability, and observability by abstracting inter-service communication into a mesh layer. But what if our mesh layer sits within Kubernetes and we still need to expose certain services beyond our cluster? Then you need an Ingress controller such as the Kong Ingress Controller. In this blog post, we'll cover how to deploy Kong Ingress Controller as your Ingress layer to an Istio mesh. Let's dive right in:
![Kong Kubernetes Ingress Controller](/images/blog/Kong-Ingress-Controller-and-Service-Mesh/k4k8s.png)
### Part 0: Set up Istio on Kubernetes
This blog will assume you have Istio set up on Kubernetes. If you need to catch up to this point, please check out the [Istio documentation](https://istio.io/docs/setup/). It will walk you through setting up Istio on Kubernetes.
### 1. Install the Bookinfo Application
First, we need to label the namespaces that will host our application and Kong proxy. To label our default namespace where the bookinfo app sits, run this command:
```
$ kubectl label namespace default istio-injection=enabled
namespace/default labeled
```
Then create a new namespace that will be hosting our Kong gateway and the Ingress controller:
```
$ kubectl create namespace kong
namespace/kong created
```
Because Kong will be sitting outside the default namespace, be sure to label the Kong namespace with istio-injection enabled as well:
```
$ kubectl label namespace kong istio-injection=enabled
namespace/kong labeled
```
Having both namespaces labeled `istio-injection=enabled` is necessary; otherwise, the default configuration will not inject a sidecar container into the pods in those namespaces.
Now deploy your BookInfo application with the following command:
```
$ kubectl apply -f http://bit.ly/bookinfoapp
service/details created
serviceaccount/bookinfo-details created
deployment.apps/details-v1 created
service/ratings created
serviceaccount/bookinfo-ratings created
deployment.apps/ratings-v1 created
service/reviews created
serviceaccount/bookinfo-reviews created
deployment.apps/reviews-v1 created
deployment.apps/reviews-v2 created
deployment.apps/reviews-v3 created
service/productpage created
serviceaccount/bookinfo-productpage created
deployment.apps/productpage-v1 created
```
Let's double-check our Services and Pods to make sure that we have it all set up correctly:
```
$ kubectl get services
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
details ClusterIP 10.97.125.254 <none> 9080/TCP 29s
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 29h
productpage ClusterIP 10.97.62.68 <none> 9080/TCP 28s
ratings ClusterIP 10.96.15.180 <none> 9080/TCP 28s
reviews ClusterIP 10.104.207.136 <none> 9080/TCP 28s
```
You should see four new services: details, productpage, ratings, and reviews. None of them have an external IP so we will use the [Kong gateway](https://github.com/Kong/kong) to expose the necessary services. And to check pods, run the following command:
```
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
details-v1-c5b5f496d-9wm29 2/2 Running 0 101s
productpage-v1-7d6cfb7dfd-5mc96 2/2 Running 0 100s
ratings-v1-f745cf57b-hmkwf 2/2 Running 0 101s
reviews-v1-85c474d9b8-kqcpt 2/2 Running 0 101s
reviews-v2-ccffdd984-9jnsj 2/2 Running 0 101s
reviews-v3-98dc67b68-nzw97 2/2 Running 0 101s
```
This command outputs useful data, so let's take a second to understand it. If you examine the READY column, each pod has two containers running: the service and an Envoy sidecar injected alongside it. Another thing to highlight is that there are three review pods but only one review service. The Envoy sidecar will load balance the traffic to three different review pods that contain different versions, giving us the ability to A/B test our changes. With that said, you should now be able to access your product page!
```
$ kubectl exec -it $(kubectl get pod -l app=ratings -o jsonpath='{.items[0].metadata.name}') -c ratings -- curl productpage:9080/productpage | grep -o "<title>.*</title>"
<title>Simple Bookstore App</title>
```
### 2. Kong Kubernetes Ingress Controller Without Database
To expose your services to the world, we will deploy Kong as the north-south traffic gateway. [Kong 1.1](https://github.com/Kong/kong/releases/tag/1.1.2) was released with declarative configuration and DB-less mode. Declarative configuration allows you to specify the desired system state through a YAML or JSON file instead of a sequence of API calls. Using declarative config provides several key benefits to reduce complexity, increase automation, and enhance system performance. And with the Kong Ingress Controller, any Ingress rules you apply to the cluster will automatically be configured on the Kong proxy. Let's set up the Kong Ingress Controller and the actual Kong proxy first like this:
```
$ kubectl apply -f https://bit.ly/k4k8s
namespace/kong configured
customresourcedefinition.apiextensions.k8s.io/kongconsumers.configuration.konghq.com created
customresourcedefinition.apiextensions.k8s.io/kongcredentials.configuration.konghq.com created
customresourcedefinition.apiextensions.k8s.io/kongingresses.configuration.konghq.com created
customresourcedefinition.apiextensions.k8s.io/kongplugins.configuration.konghq.com created
serviceaccount/kong-serviceaccount created
clusterrole.rbac.authorization.k8s.io/kong-ingress-clusterrole created
clusterrolebinding.rbac.authorization.k8s.io/kong-ingress-clusterrole-nisa-binding created
configmap/kong-server-blocks created
service/kong-proxy created
service/kong-validation-webhook created
deployment.apps/ingress-kong created
```
To check if the Kong pod is up and running, run:
```
$ kubectl get pods -n kong
NAME READY STATUS RESTARTS AGE
pod/ingress-kong-8b44c9856-9s42v 3/3 Running 0 2m26s
```
There will be three containers within this pod. The first container is the Kong Gateway that will be the Ingress point to your cluster. The second container is the Ingress controller. It uses Ingress resources and updates the proxy to follow rules defined in the resource. And lastly, the third container is the Envoy proxy injected by Istio. Kong will route traffic through the Envoy sidecar proxy to the appropriate service. To send requests into the cluster via our newly deployed Kong Gateway, set up an environment variable with a URL based on the IP address at which Kong is accessible.
```
$ export PROXY_URL="$(minikube service -n kong kong-proxy --url | head -1)"
$ echo $PROXY_URL
http://192.168.99.100:32728
```
Next, we need to change some configuration so that the sidecar Envoy process can route the request correctly based on the host/authority header of the request. Run the following to stop the route from preserving the host header:
```
$ echo "
apiVersion: configuration.konghq.com/v1
kind: KongIngress
metadata:
name: do-not-preserve-host
route:
preserve_host: false
" | kubectl apply -f -
kongingress.configuration.konghq.com/do-not-preserve-host created
```
And annotate the existing productpage service to set service-upstream as true:
```
$ kubectl annotate svc productpage ingress.kubernetes.io/service-upstream="true"
service/productpage annotated
```
Now that we have everything set up, we can look at how to use the Ingress resource to help route external traffic to the services within your Istio mesh. We'll create an Ingress rule that routes all traffic with the path of `/` to our productpage service:
```
$ echo "
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
name: productpage
annotations:
configuration.konghq.com: do-not-preserve-host
spec:
rules:
- http:
paths:
- path: /
backend:
serviceName: productpage
servicePort: 9080
" | kubectl apply -f -
ingress.extensions/productpage created
```
And just like that, the Kong Ingress Controller is able to understand the rules you defined in the Ingress resource and route traffic to the productpage service! To view the productpage service's GUI, go to `$PROXY_URL/productpage` in your browser. Or to test it in your command line, try:
```
$ curl $PROXY_URL/productpage
```
That is all I have for this walk-through. If you enjoyed the technologies used in this post, please check out their repositories since they are all open source and would love to have more contributors! Here are their links for your convenience:
* Kong: [[GitHub](https://github.com/Kong/kubernetes-ingress-controller)] [[Twitter](https://twitter.com/thekonginc)]
* Kubernetes: [[GitHub](https://github.com/kubernetes/kubernetes)] [[Twitter](https://twitter.com/kubernetesio)]
* Istio: [[GitHub](https://github.com/istio/istio)] [[Twitter](https://twitter.com/IstioMesh)]
* Envoy: [[GitHub](https://github.com/envoyproxy/envoy)] [[Twitter](https://twitter.com/EnvoyProxy)]
Thank you for following along!

View File

@ -6,8 +6,8 @@ cid: community
<div class="newcommunitywrapper">
<div class="banner1">
<img src="/images/community/kubernetes-community-final-02.jpg" alt="Kubernetes Conference Gallery" style="width:100%" class="desktop">
<img src="/images/community/kubernetes-community-02-mobile.jpg" alt="Kubernetes Conference Gallery" style="width:100%" class="mobile">
<img src="/images/community/kubernetes-community-final-02.jpg" alt="Kubernetes Conference Gallery" style="width:100%;padding-left:0px" class="desktop">
<img src="/images/community/kubernetes-community-02-mobile.jpg" alt="Kubernetes Conference Gallery" style="width:100%;padding-left:0px" class="mobile">
</div>
<div class="intro">

View File

@ -24,9 +24,9 @@ Once you've set your desired state, the *Kubernetes Control Plane* makes the clu
* **[kubelet](/docs/admin/kubelet/)**, which communicates with the Kubernetes Master.
* **[kube-proxy](/docs/admin/kube-proxy/)**, a network proxy which reflects Kubernetes networking services on each node.
## Kubernetes Objects
## Kubernetes objects
Kubernetes contains a number of abstractions that represent the state of your system: deployed containerized applications and workloads, their associated network and disk resources, and other information about what your cluster is doing. These abstractions are represented by objects in the Kubernetes API. See [Understanding Kubernetes Objects](/docs/concepts/overview/working-with-objects/kubernetes-objects/) for more details.
Kubernetes contains a number of abstractions that represent the state of your system: deployed containerized applications and workloads, their associated network and disk resources, and other information about what your cluster is doing. These abstractions are represented by objects in the Kubernetes API. See [Understanding Kubernetes objects](/docs/concepts/overview/working-with-objects/kubernetes-objects/#kubernetes-objects) for more details.
The basic Kubernetes objects include:
@ -35,7 +35,7 @@ The basic Kubernetes objects include:
* [Volume](/docs/concepts/storage/volumes/)
* [Namespace](/docs/concepts/overview/working-with-objects/namespaces/)
Kubernetes also contains higher-level abstractions that rely on [Controllers](/docs/concepts/architecture/controller/) to build upon the basic objects, and provide additional functionality and convenience features. These include:
Kubernetes also contains higher-level abstractions that rely on [controllers](/docs/concepts/architecture/controller/) to build upon the basic objects, and provide additional functionality and convenience features. These include:
* [Deployment](/docs/concepts/workloads/controllers/deployment/)
* [DaemonSet](/docs/concepts/workloads/controllers/daemonset/)

View File

@ -26,7 +26,7 @@ closer to the desired state, by turning equipment on or off.
## Controller pattern
A controller tracks at least one Kubernetes resource type.
These [objects](/docs/concepts/overview/working-with-objects/kubernetes-objects/)
These [objects](/docs/concepts/overview/working-with-objects/kubernetes-objects/#kubernetes-objects)
have a spec field that represents the desired state. The
controller(s) for that resource are responsible for making the current
state come closer to that desired state.
@ -113,17 +113,15 @@ useful changes, it doesn't matter if the overall state is or is not stable.
As a tenet of its design, Kubernetes uses lots of controllers that each manage
a particular aspect of cluster state. Most commonly, a particular control loop
(controller) uses one kind of resource as its desired state, and has a different
kind of resource that it manages to make that desired state happen.
kind of resource that it manages to make that desired state happen. For example,
a controller for Jobs tracks Job objects (to discover new work) and Pod objects
(to run the Jobs, and then to see when the work is finished). In this case
something else creates the Jobs, whereas the Job controller creates Pods.
It's useful to have simple controllers rather than one, monolithic set of control
loops that are interlinked. Controllers can fail, so Kubernetes is designed to
allow for that.
For example: a controller for Jobs tracks Job objects (to discover
new work) and Pod object (to run the Jobs, and then to see when the work is
finished). In this case something else creates the Jobs, whereas the Job
controller creates Pods.
{{< note >}}
There can be several controllers that create or update the same kind of object.
Behind the scenes, Kubernetes controllers make sure that they only pay attention

View File

@ -30,7 +30,7 @@ A node's status contains the following information:
* [Capacity and Allocatable](#capacity)
* [Info](#info)
Node status and other details about a node can be displayed using below command:
Node status and other details about a node can be displayed using the following command:
```shell
kubectl describe node <insert-node-name-here>
```
@ -72,7 +72,7 @@ The node condition is represented as a JSON object. For example, the following r
]
```
If the Status of the Ready condition remains `Unknown` or `False` for longer than the `pod-eviction-timeout`, an argument is passed to the [kube-controller-manager](/docs/admin/kube-controller-manager/) and all the Pods on the node are scheduled for deletion by the Node Controller. The default eviction timeout duration is **five minutes**. In some cases when the node is unreachable, the apiserver is unable to communicate with the kubelet on the node. The decision to delete the pods cannot be communicated to the kubelet until communication with the apiserver is re-established. In the meantime, the pods that are scheduled for deletion may continue to run on the partitioned node.
If the Status of the Ready condition remains `Unknown` or `False` for longer than the `pod-eviction-timeout` (an argument passed to the [kube-controller-manager](/docs/admin/kube-controller-manager/)), all the Pods on the node are scheduled for deletion by the Node Controller. The default eviction timeout duration is **five minutes**. In some cases when the node is unreachable, the apiserver is unable to communicate with the kubelet on the node. The decision to delete the pods cannot be communicated to the kubelet until communication with the apiserver is re-established. In the meantime, the pods that are scheduled for deletion may continue to run on the partitioned node.
In versions of Kubernetes prior to 1.5, the node controller would [force delete](/docs/concepts/workloads/pods/pod/#force-deletion-of-pods)
these unreachable pods from the apiserver. However, in 1.5 and higher, the node controller does not force delete pods until it is
@ -83,8 +83,8 @@ Kubernetes causes all the Pod objects running on the node to be deleted from the
The node lifecycle controller automatically creates
[taints](/docs/concepts/configuration/taint-and-toleration/) that represent conditions.
When the scheduler is assigning a Pod to a Node, the scheduler takes the Node's taints
into account, except for any taints that the Pod tolerates.
The scheduler takes the Node's taints into consideration when assigning a Pod to a Node.
Pods can also have tolerations which let them tolerate a Node's taints.
### Capacity and Allocatable {#capacity}
@ -131,6 +131,8 @@ Kubernetes creates a node object internally (the representation), and
validates the node by health checking based on the `metadata.name` field. If the node is valid -- that is, if all necessary
services are running -- it is eligible to run a pod. Otherwise, it is
ignored for any cluster activity until it becomes valid.
The name of a Node object must be a valid
[DNS subdomain name](/docs/concepts/overview/working-with-objects/names#dns-subdomain-names).
{{< note >}}
Kubernetes keeps the object for the invalid node and keeps checking to see whether it becomes valid.
@ -157,7 +159,7 @@ controller deletes the node from its list of nodes.
The third is monitoring the nodes' health. The node controller is
responsible for updating the NodeReady condition of NodeStatus to
ConditionUnknown when a node becomes unreachable (i.e. the node controller stops
receiving heartbeats for some reason, e.g. due to the node being down), and then later evicting
receiving heartbeats for some reason, for example due to the node being down), and then later evicting
all the pods from the node (using graceful termination) if the node continues
to be unreachable. (The default timeouts are 40s to start reporting
ConditionUnknown and 5m after that to start evicting pods.) The node controller
@ -182,13 +184,13 @@ a Lease object.
timeout for unreachable nodes).
- The kubelet creates and then updates its Lease object every 10 seconds
(the default update interval). Lease updates occur independently from the
`NodeStatus` updates.
`NodeStatus` updates. If the Lease update fails, the kubelet retries with exponential backoff starting at 200 milliseconds and capped at 7 seconds.
#### Reliability
In Kubernetes 1.4, we updated the logic of the node controller to better handle
cases when a large number of nodes have problems with reaching the master
(e.g. because the master has networking problem). Starting with 1.4, the node
(e.g. because the master has networking problems). Starting with 1.4, the node
controller looks at the state of all nodes in the cluster when making a
decision about pod eviction.
@ -212,9 +214,9 @@ there is only one availability zone (the whole cluster).
A key reason for spreading your nodes across availability zones is so that the
workload can be shifted to healthy zones when one entire zone goes down.
Therefore, if all nodes in a zone are unhealthy then node controller evicts at
the normal rate `--node-eviction-rate`. The corner case is when all zones are
completely unhealthy (i.e. there are no healthy nodes in the cluster). In such
Therefore, if all nodes in a zone are unhealthy then the node controller evicts at
the normal rate of `--node-eviction-rate`. The corner case is when all zones are
completely unhealthy (i.e. there are no healthy nodes in the cluster). In such a
case, the node controller assumes that there's some problem with master
connectivity and stops all evictions until some connectivity is restored.
@ -275,6 +277,12 @@ and do not respect the unschedulable attribute on a node. This assumes that daem
the machine even if it is being drained of applications while it prepares for a reboot.
{{< /note >}}
{{< caution >}}
`kubectl cordon` marks a node as 'unschedulable', which has the side effect of the service
controller removing the node from any LoadBalancer node target lists it was previously
eligible for, effectively removing incoming load balancer traffic from the cordoned node(s).
{{< /caution >}}
### Node capacity
The capacity of the node (number of cpus and amount of memory) is part of the node object.

View File

@ -28,7 +28,7 @@ Add-ons in each section are sorted alphabetically - the ordering does not imply
* [Contiv](http://contiv.github.io) provides configurable networking (native L3 using BGP, overlay using vxlan, classic L2, and Cisco-SDN/ACI) for various use cases and a rich policy framework. Contiv project is fully [open sourced](http://github.com/contiv). The [installer](http://github.com/contiv/install) provides both kubeadm and non-kubeadm based installation options.
* [Contrail](http://www.juniper.net/us/en/products-services/sdn/contrail/contrail-networking/), based on [Tungsten Fabric](https://tungsten.io), is an open source, multi-cloud network virtualization and policy management platform. Contrail and Tungsten Fabric are integrated with orchestration systems such as Kubernetes, OpenShift, OpenStack and Mesos, and provide isolation modes for virtual machines, containers/pods and bare metal workloads.
* [Flannel](https://github.com/coreos/flannel/blob/master/Documentation/kubernetes.md) is an overlay network provider that can be used with Kubernetes.
* [Knitter](https://github.com/ZTE/Knitter/) is a network solution supporting multiple networking in Kubernetes.
* [Knitter](https://github.com/ZTE/Knitter/) is a plugin to support multiple network interfaces in a Kubernetes pod.
* [Multus](https://github.com/Intel-Corp/multus-cni) is a Multi plugin for multiple network support in Kubernetes to support all CNI plugins (e.g. Calico, Cilium, Contiv, Flannel), in addition to SRIOV, DPDK, OVS-DPDK and VPP based workloads in Kubernetes.
* [NSX-T](https://docs.vmware.com/en/VMware-NSX-T/2.0/nsxt_20_ncp_kubernetes.pdf) Container Plug-in (NCP) provides integration between VMware NSX-T and container orchestrators such as Kubernetes, as well as integration between NSX-T and container-based CaaS/PaaS platforms such as Pivotal Container Service (PKS) and OpenShift.
* [Nuage](https://github.com/nuagenetworks/nuage-kubernetes/blob/v5.1.1-1/docs/kubernetes-1-installation.rst) is an SDN platform that provides policy-based networking between Kubernetes Pods and non-Kubernetes environments with visibility and security monitoring.
@ -46,7 +46,7 @@ Add-ons in each section are sorted alphabetically - the ordering does not imply
## Infrastructure
* [KubeVirt](https://kubevirt.io/user-guide/docs/latest/administration/intro.html#cluster-side-add-on-deployment) is an add-on to run virtual machines on Kubernetes. Usually run on bare-metal clusters.
* [KubeVirt](https://kubevirt.io/user-guide/#/installation/installation) is an add-on to run virtual machines on Kubernetes. Usually run on bare-metal clusters.
## Legacy Add-ons

View File

@ -130,11 +130,11 @@ Finally, add the same parameters into the API server start parameters.
Note that you may need to adapt the sample commands based on the hardware
architecture and cfssl version you are using.
curl -L https://pkg.cfssl.org/R1.2/cfssl_linux-amd64 -o cfssl
curl -L https://github.com/cloudflare/cfssl/releases/download/v1.4.1/cfssl_1.4.1_linux_amd64 -o cfssl
chmod +x cfssl
curl -L https://pkg.cfssl.org/R1.2/cfssljson_linux-amd64 -o cfssljson
curl -L https://github.com/cloudflare/cfssl/releases/download/v1.4.1/cfssljson_1.4.1_linux_amd64 -o cfssljson
chmod +x cfssljson
curl -L https://pkg.cfssl.org/R1.2/cfssl-certinfo_linux-amd64 -o cfssl-certinfo
curl -L https://github.com/cloudflare/cfssl/releases/download/v1.4.1/cfssl-certinfo_1.4.1_linux_amd64 -o cfssl-certinfo
chmod +x cfssl-certinfo
1. Create a directory to hold the artifacts and initialize cfssl:

View File

@ -94,7 +94,7 @@ Different settings can be applied to a load balancer service in AWS using _annot
* `service.beta.kubernetes.io/aws-load-balancer-access-log-s3-bucket-prefix`: Used to specify access log s3 bucket prefix.
* `service.beta.kubernetes.io/aws-load-balancer-additional-resource-tags`: Used on the service to specify a comma-separated list of key-value pairs which will be recorded as additional tags in the ELB. For example: `"Key1=Val1,Key2=Val2,KeyNoVal1=,KeyNoVal2"`.
* `service.beta.kubernetes.io/aws-load-balancer-backend-protocol`: Used on the service to specify the protocol spoken by the backend (pod) behind a listener. If `http` (default) or `https`, an HTTPS listener that terminates the connection and parses headers is created. If set to `ssl` or `tcp`, a "raw" SSL listener is used. If set to `http` and `aws-load-balancer-ssl-cert` is not used then a HTTP listener is used.
* `service.beta.kubernetes.io/aws-load-balancer-ssl-cert`: Used on the service to request a secure listener. Value is a valid certificate ARN. For more, see [ELB Listener Config](http://docs.aws.amazon.com/ElasticLoadBalancing/latest/DeveloperGuide/elb-listener-config.html) CertARN is an IAM or CM certificate ARN, e.g. `arn:aws:acm:us-east-1:123456789012:certificate/12345678-1234-1234-1234-123456789012`.
* `service.beta.kubernetes.io/aws-load-balancer-ssl-cert`: Used on the service to request a secure listener. Value is a valid certificate ARN. For more, see [ELB Listener Config](http://docs.aws.amazon.com/ElasticLoadBalancing/latest/DeveloperGuide/elb-listener-config.html) CertARN is an IAM or CM certificate ARN, for example `arn:aws:acm:us-east-1:123456789012:certificate/12345678-1234-1234-1234-123456789012`.
* `service.beta.kubernetes.io/aws-load-balancer-connection-draining-enabled`: Used on the service to enable or disable connection draining.
* `service.beta.kubernetes.io/aws-load-balancer-connection-draining-timeout`: Used on the service to specify a connection draining timeout.
* `service.beta.kubernetes.io/aws-load-balancer-connection-idle-timeout`: Used on the service to specify the idle connection timeout.

View File

@ -20,7 +20,6 @@ See the guides in [Setup](/docs/setup/) for examples of how to plan, set up, and
Before choosing a guide, here are some considerations:
- Do you just want to try out Kubernetes on your computer, or do you want to build a high-availability, multi-node cluster? Choose distros best suited for your needs.
- **If you are designing for high-availability**, learn about configuring [clusters in multiple zones](/docs/concepts/cluster-administration/federation/).
- Will you be using **a hosted Kubernetes cluster**, such as [Google Kubernetes Engine](https://cloud.google.com/kubernetes-engine/), or **hosting your own cluster**?
- Will your cluster be **on-premises**, or **in the cloud (IaaS)**? Kubernetes does not directly support hybrid clusters. Instead, you can set up multiple clusters.
- **If you are configuring Kubernetes on-premises**, consider which [networking model](/docs/concepts/cluster-administration/networking/) fits best.
@ -44,7 +43,7 @@ Note: Not all distros are actively maintained. Choose distros which have been te
* [Certificates](/docs/concepts/cluster-administration/certificates/) describes the steps to generate certificates using different tool chains.
* [Kubernetes Container Environment](/docs/concepts/containers/container-environment-variables/) describes the environment for Kubelet managed containers on a Kubernetes node.
* [Kubernetes Container Environment](/docs/concepts/containers/container-environment/) describes the environment for Kubelet managed containers on a Kubernetes node.
* [Controlling Access to the Kubernetes API](/docs/reference/access-authn-authz/controlling-access/) describes how to set up permissions for users and service accounts.

View File

@ -1,50 +0,0 @@
---
title: Controller manager metrics
content_template: templates/concept
weight: 100
---
{{% capture overview %}}
Controller manager metrics provide important insight into the performance and health of
the controller manager.
{{% /capture %}}
{{% capture body %}}
## What are controller manager metrics
Controller manager metrics provide important insight into the performance and health of the controller manager.
These metrics include common Go language runtime metrics such as go_routine count and controller specific metrics such as
etcd request latencies or Cloudprovider (AWS, GCE, OpenStack) API latencies that can be used
to gauge the health of a cluster.
Starting from Kubernetes 1.7, detailed Cloudprovider metrics are available for storage operations for GCE, AWS, Vsphere and OpenStack.
These metrics can be used to monitor health of persistent volume operations.
For example, for GCE these metrics are called:
```
cloudprovider_gce_api_request_duration_seconds { request = "instance_list"}
cloudprovider_gce_api_request_duration_seconds { request = "disk_insert"}
cloudprovider_gce_api_request_duration_seconds { request = "disk_delete"}
cloudprovider_gce_api_request_duration_seconds { request = "attach_disk"}
cloudprovider_gce_api_request_duration_seconds { request = "detach_disk"}
cloudprovider_gce_api_request_duration_seconds { request = "list_disk"}
```
## Configuration
In a cluster, controller-manager metrics are available from `http://localhost:10252/metrics`
from the host where the controller-manager is running.
The metrics are emitted in [prometheus format](https://prometheus.io/docs/instrumenting/exposition_formats/) and are human readable.
In a production environment you may want to configure prometheus or some other metrics scraper
to periodically gather these metrics and make them available in some kind of time series database.
{{% /capture %}}

View File

@ -1,186 +0,0 @@
---
title: Federation
content_template: templates/concept
weight: 80
---
{{% capture overview %}}
{{< deprecationfilewarning >}}
{{< include "federation-deprecation-warning-note.md" >}}
{{< /deprecationfilewarning >}}
This page explains why and how to manage multiple Kubernetes clusters using
federation.
{{% /capture %}}
{{% capture body %}}
## Why federation
Federation makes it easy to manage multiple clusters. It does so by providing 2
major building blocks:
* Sync resources across clusters: Federation provides the ability to keep
resources in multiple clusters in sync. For example, you can ensure that the same deployment exists in multiple clusters.
* Cross cluster discovery: Federation provides the ability to auto-configure DNS servers and load balancers with backends from all clusters. For example, you can ensure that a global VIP or DNS record can be used to access backends from multiple clusters.
Some other use cases that federation enables are:
* High Availability: By spreading load across clusters and auto configuring DNS
servers and load balancers, federation minimises the impact of cluster
failure.
* Avoiding provider lock-in: By making it easier to migrate applications across
clusters, federation prevents cluster provider lock-in.
Federation is not helpful unless you have multiple clusters. Some of the reasons
why you might want multiple clusters are:
* Low latency: Having clusters in multiple regions minimises latency by serving
users from the cluster that is closest to them.
* Fault isolation: It might be better to have multiple small clusters rather
than a single large cluster for fault isolation (for example: multiple
clusters in different availability zones of a cloud provider).
* Scalability: There are scalability limits to a single kubernetes cluster (this
should not be the case for most users. For more details:
[Kubernetes Scaling and Performance Goals](https://git.k8s.io/community/sig-scalability/goals.md)).
* [Hybrid cloud](#hybrid-cloud-capabilities): You can have multiple clusters on different cloud providers or
on-premises data centers.
### Caveats
While there are a lot of attractive use cases for federation, there are also
some caveats:
* Increased network bandwidth and cost: The federation control plane watches all
clusters to ensure that the current state is as expected. This can lead to
significant network cost if the clusters are running in different regions on
a cloud provider or on different cloud providers.
* Reduced cross cluster isolation: A bug in the federation control plane can
impact all clusters. This is mitigated by keeping the logic in federation
control plane to a minimum. It mostly delegates to the control plane in
kubernetes clusters whenever it can. The design and implementation also errs
on the side of safety and avoiding multi-cluster outage.
* Maturity: The federation project is relatively new and is not very mature.
Not all resources are available and many are still alpha. [Issue
88](https://github.com/kubernetes/federation/issues/88) enumerates
known issues with the system that the team is busy solving.
### Hybrid cloud capabilities
Federations of Kubernetes Clusters can include clusters running in
different cloud providers (e.g. Google Cloud, AWS), and on-premises
(e.g. on OpenStack). [Kubefed](/docs/tasks/federation/set-up-cluster-federation-kubefed/) is the recommended way to deploy federated clusters.
Thereafter, your [API resources](#api-resources) can span different clusters
and cloud providers.
## Setting up federation
To be able to federate multiple clusters, you first need to set up a federation
control plane.
Follow the [setup guide](/docs/tutorials/federation/set-up-cluster-federation-kubefed/) to set up the
federation control plane.
## API resources
Once you have the control plane set up, you can start creating federation API
resources.
The following guides explain some of the resources in detail:
* [Cluster](/docs/tasks/federation/administer-federation/cluster/)
* [ConfigMap](/docs/tasks/federation/administer-federation/configmap/)
* [DaemonSets](/docs/tasks/federation/administer-federation/daemonset/)
* [Deployment](/docs/tasks/federation/administer-federation/deployment/)
* [Events](/docs/tasks/federation/administer-federation/events/)
* [Hpa](/docs/tasks/federation/administer-federation/hpa/)
* [Ingress](/docs/tasks/federation/administer-federation/ingress/)
* [Jobs](/docs/tasks/federation/administer-federation/job/)
* [Namespaces](/docs/tasks/federation/administer-federation/namespaces/)
* [ReplicaSets](/docs/tasks/federation/administer-federation/replicaset/)
* [Secrets](/docs/tasks/federation/administer-federation/secret/)
* [Services](/docs/concepts/cluster-administration/federation-service-discovery/)
The [API reference docs](/docs/reference/federation/) list all the
resources supported by federation apiserver.
## Cascading deletion
Kubernetes version 1.6 includes support for cascading deletion of federated
resources. With cascading deletion, when you delete a resource from the
federation control plane, you also delete the corresponding resources in all underlying clusters.
Cascading deletion is not enabled by default when using the REST API. To enable
it, set the option `DeleteOptions.orphanDependents=false` when you delete a
resource from the federation control plane using the REST API. Using `kubectl
delete`
enables cascading deletion by default. You can disable it by running `kubectl
delete --cascade=false`
Note: Kubernetes version 1.5 included cascading deletion support for a subset of
federation resources.
## Scope of a single cluster
On IaaS providers such as Google Compute Engine or Amazon Web Services, a VM exists in a
[zone](https://cloud.google.com/compute/docs/zones) or [availability
zone](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-regions-availability-zones.html).
We suggest that all the VMs in a Kubernetes cluster should be in the same availability zone, because:
- compared to having a single global Kubernetes cluster, there are fewer single-points of failure.
- compared to a cluster that spans availability zones, it is easier to reason about the availability properties of a
single-zone cluster.
- when the Kubernetes developers are designing the system (e.g. making assumptions about latency, bandwidth, or
correlated failures) they are assuming all the machines are in a single data center, or otherwise closely connected.
It is recommended to run fewer clusters with more VMs per availability zone; but it is possible to run multiple clusters per availability zones.
Reasons to prefer fewer clusters per availability zone are:
- improved bin packing of Pods in some cases with more nodes in one cluster (less resource fragmentation).
- reduced operational overhead (though the advantage is diminished as ops tooling and processes mature).
- reduced costs for per-cluster fixed resource costs, e.g. apiserver VMs (but small as a percentage
of overall cluster cost for medium to large clusters).
Reasons to have multiple clusters include:
- strict security policies requiring isolation of one class of work from another (but, see Partitioning Clusters
below).
- test clusters to canary new Kubernetes releases or other cluster software.
## Selecting the right number of clusters
The selection of the number of Kubernetes clusters may be a relatively static choice, only revisited occasionally.
By contrast, the number of nodes in a cluster and the number of pods in a service may change frequently according to
load and growth.
To pick the number of clusters, first, decide which regions you need to be in to have adequate latency to all your end users, for services that will run
on Kubernetes (if you use a Content Distribution Network, the latency requirements for the CDN-hosted content need not
be considered). Legal issues might influence this as well. For example, a company with a global customer base might decide to have clusters in US, EU, AP, and SA regions.
Call the number of regions to be in `R`.
Second, decide how many clusters should be able to be unavailable at the same time, while still being available. Call
the number that can be unavailable `U`. If you are not sure, then 1 is a fine choice.
If it is allowable for load-balancing to direct traffic to any region in the event of a cluster failure, then
you need at least the larger of `R` or `U + 1` clusters. If it is not (e.g. you want to ensure low latency for all
users in the event of a cluster failure), then you need to have `R * (U + 1)` clusters
(`U + 1` in each of `R` regions). In any case, try to put each cluster in a different zone.
Finally, if any of your clusters would need more than the maximum recommended number of nodes for a Kubernetes cluster, then
you may need even more clusters. Kubernetes v1.3 supports clusters up to 1000 nodes in size. Kubernetes v1.8 supports
clusters up to 5000 nodes. See [Building Large Clusters](/docs/setup/best-practices/cluster-large/) for more guidance.
{{% /capture %}}
{{% capture whatsnext %}}
* Learn more about the [Federation
proposal](https://github.com/kubernetes/community/blob/{{< param "githubbranch" >}}/contributors/design-proposals/multicluster/federation.md).
* See this [setup guide](/docs/tutorials/federation/set-up-cluster-federation-kubefed/) for cluster federation.
* See this [Kubecon2016 talk on federation](https://www.youtube.com/watch?v=pq9lbkmxpS8)
* See this [Kubecon2017 Europe update on federation](https://www.youtube.com/watch?v=kwOvOLnFYck)
* See this [Kubecon2018 Europe update on sig-multicluster](https://www.youtube.com/watch?v=vGZo5DaThQU)
* See this [Kubecon2018 Europe Federation-v2 prototype presentation](https://youtu.be/q27rbaX5Jis?t=7m20s)
* See this [Federation-v2 Userguide](https://github.com/kubernetes-sigs/federation-v2/blob/master/docs/userguide.md)
{{% /capture %}}

View File

@ -0,0 +1,377 @@
---
title: API Priority and Fairness
content_template: templates/concept
min-kubernetes-server-version: v1.18
---
{{% capture overview %}}
{{< feature-state state="alpha" for_k8s_version="v1.18" >}}
Controlling the behavior of the Kubernetes API server in an overload situation
is a key task for cluster administrators. The {{< glossary_tooltip
term_id="kube-apiserver" text="kube-apiserver" >}} has some controls available
(i.e. the `--max-requests-inflight` and `--max-mutating-requests-inflight`
command-line flags) to limit the amount of outstanding work that will be
accepted, preventing a flood of inbound requests from overloading and
potentially crashing the API server, but these flags are not enough to ensure
that the most important requests get through in a period of high traffic.
The API Priority and Fairness feature (APF) is an alternative that improves upon
the aforementioned max-inflight limitations. APF classifies
and isolates requests in a more fine-grained way. It also introduces
a limited amount of queuing, so that no requests are rejected in cases
of very brief bursts. Requests are dispatched from queues using a
fair queuing technique so that, for example, a poorly-behaved {{<
glossary_tooltip text="controller" term_id="controller" >}} need not
starve others (even at the same priority level).
{{< caution >}}
Requests classified as "long-running" — primarily watches — are not
subject to the API Priority and Fairness filter. This is also true for
the `--max-requests-inflight` flag without the API Priority and
Fairness feature enabled.
{{< /caution >}}
{{% /capture %}}
{{% capture body %}}
## Enabling API Priority and Fairness
The API Priority and Fairness feature is controlled by a feature gate
and is not enabled by default. See
[Feature Gates](/docs/reference/command-line-tools-reference/feature-gates/)
for a general explanation of feature gates and how to enable and disable them. The
name of the feature gate for APF is "APIPriorityAndFairness". This
feature also involves an {{< glossary_tooltip term_id="api-group"
text="API Group" >}} that must be enabled. You can do these
things by adding the following command-line flags to your
`kube-apiserver` invocation:
```shell
kube-apiserver \
--feature-gates=APIPriorityAndFairness=true \
--runtime-config=flowcontrol.apiserver.k8s.io/v1alpha1=true \
# …and other flags as usual
```
The command-line flag `--enable-priority-and-fairness=false` will disable the
API Priority and Fairness feature, even if other flags have enabled it.
## Concepts
There are several distinct features involved in the API Priority and Fairness
feature. Incoming requests are classified by attributes of the request using
_FlowSchemas_, and assigned to priority levels. Priority levels add a degree of
isolation by maintaining separate concurrency limits, so that requests assigned
to different priority levels cannot starve each other. Within a priority level,
a fair-queuing algorithm prevents requests from different _flows_ from starving
each other, and allows for requests to be queued to prevent bursty traffic from
causing failed requests when the average load is acceptably low.
### Priority Levels
Without APF enabled, overall concurrency in
the API server is limited by the `kube-apiserver` flags
`--max-requests-inflight` and `--max-mutating-requests-inflight`. With APF
enabled, the concurrency limits defined by these flags are summed and then the sum is divided up
among a configurable set of _priority levels_. Each incoming request is assigned
to a single priority level, and each priority level will only dispatch as many
concurrent requests as its configuration allows.
The default configuration, for example, includes separate priority levels for
leader-election requests, requests from built-in controllers, and requests from
Pods. This means that an ill-behaved Pod that floods the API server with
requests cannot prevent leader election or actions by the built-in controllers
from succeeding.
### Queuing
Even within a priority level there may be a large number of distinct sources of
traffic. In an overload situation, it is valuable to prevent one stream of
requests from starving others (in particular, in the relatively common case of a
single buggy client flooding the kube-apiserver with requests, that buggy client
would ideally not have much measurable impact on other clients at all). This is
handled by use of a fair-queuing algorithm to process requests that are assigned
the same priority level. Each request is assigned to a _flow_, identified by the
name of the matching FlowSchema plus a _flow distinguisher_ — which
is either the requesting user, the target resource's namespace, or nothing — and the
system attempts to give approximately equal weight to requests in different
flows of the same priority level.
After classifying a request into a flow, the API Priority and Fairness
feature then may assign the request to a queue. This assignment uses
a technique known as {{< glossary_tooltip term_id="shuffle-sharding"
text="shuffle sharding" >}}, which makes relatively efficient use of
queues to insulate low-intensity flows from high-intensity flows.
The details of the queuing algorithm are tunable for each priority level, and
allow administrators to trade off memory use, fairness (the property that
independent flows will all make progress when total traffic exceeds capacity),
tolerance for bursty traffic, and the added latency induced by queuing.
### Exempt requests
Some requests are considered sufficiently important that they are not subject to
any of the limitations imposed by this feature. These exemptions prevent an
improperly-configured flow control configuration from totally disabling an API
server.
## Defaults
The Priority and Fairness feature ships with a suggested configuration that
should suffice for experimentation; if your cluster is likely to
experience heavy load then you should consider what configuration will work best. The suggested configuration groups requests into five priority
classes:
* The `system` priority level is for requests from the `system:nodes` group,
i.e. Kubelets, which must be able to contact the API server in order for
workloads to be able to schedule on them.
* The `leader-election` priority level is for leader election requests from
built-in controllers (in particular, requests for `endpoints`, `configmaps`,
or `leases` coming from the `system:kube-controller-manager` or
`system:kube-scheduler` users and service accounts in the `kube-system`
namespace). These are important to isolate from other traffic because failures
in leader election cause their controllers to fail and restart, which in turn
causes more expensive traffic as the new controllers sync their informers.
* The `workload-high` priority level is for other requests from built-in
controllers.
* The `workload-low` priority level is for requests from any other service
account, which will typically include all requests from controllers running in
Pods.
* The `global-default` priority level handles all other traffic, e.g.
interactive `kubectl` commands run by nonprivileged users.
Additionally, there are two PriorityLevelConfigurations and two FlowSchemas that
are built in and may not be overwritten:
* The special `exempt` priority level is used for requests that are not subject
to flow control at all: they will always be dispatched immediately. The
special `exempt` FlowSchema classifies all requests from the `system:masters`
group into this priority level. You may define other FlowSchemas that direct
other requests to this priority level, if appropriate.
* The special `catch-all` priority level is used in combination with the special
`catch-all` FlowSchema to make sure that every request gets some kind of
classification. Typically you should not rely on this catch-all configuration,
and should create your own catch-all FlowSchema and PriorityLevelConfiguration
(or use the `global-default` configuration that is installed by default) as
appropriate. To help catch configuration errors that miss classifying some
requests, the mandatory `catch-all` priority level only allows one concurrency
share and does not queue requests, making it relatively likely that traffic
that only matches the `catch-all` FlowSchema will be rejected with an HTTP 429
error.
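To see how this suggested configuration looks on your own cluster, you can list the
objects that the API server maintains; this is a minimal sketch, assuming you have
enabled the feature gate and API group as described above:

```shell
kubectl get prioritylevelconfigurations
kubectl get flowschemas
```

The output should include the priority levels and FlowSchemas named in this section,
alongside any additional ones you have defined.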
## Resources
The flow control API involves two kinds of resources.
[PriorityLevelConfigurations](/docs/reference/generated/kubernetes-api/{{< param "version" >}}/#prioritylevelconfiguration-v1alpha1-flowcontrol)
define the available isolation classes, the share of the available concurrency
budget that each can handle, and allow for fine-tuning queuing behavior.
[FlowSchemas](/docs/reference/generated/kubernetes-api/{{< param "version" >}}/#flowschema-v1alpha1-flowcontrol)
are used to classify individual inbound requests, matching each to a single
PriorityLevelConfiguration.
### PriorityLevelConfiguration
A PriorityLevelConfiguration represents a single isolation class. Each
PriorityLevelConfiguration has an independent limit on the number of outstanding
requests, and limitations on the number of queued requests.
Concurrency limits for PriorityLevelConfigurations are not specified in absolute
number of requests, but rather in "concurrency shares." The total concurrency
limit for the API Server is distributed among the existing
PriorityLevelConfigurations in proportion with these shares. This allows a
cluster administrator to scale up or down the total amount of traffic to a
server by restarting `kube-apiserver` with a different value for
`--max-requests-inflight` (or `--max-mutating-requests-inflight`), and all
PriorityLevelConfigurations will see their maximum allowed concurrency go up (or
down) by the same fraction.
{{< caution >}}
With the Priority and Fairness feature enabled, the total concurrency limit for
the server is set to the sum of `--max-requests-inflight` and
`--max-mutating-requests-inflight`. There is no longer any distinction made
between mutating and non-mutating requests; if you want to treat them
separately for a given resource, make separate FlowSchemas that match the
mutating and non-mutating verbs respectively.
{{< /caution >}}
When the volume of inbound requests assigned to a single
PriorityLevelConfiguration is more than its permitted concurrency level, the
`type` field of its specification determines what will happen to extra requests.
A type of `Reject` means that excess traffic will immediately be rejected with
an HTTP 429 (Too Many Requests) error. A type of `Queue` means that requests
above the threshold will be queued, with the shuffle sharding and fair queuing techniques used
to balance progress between request flows.
The queuing configuration allows tuning the fair queuing algorithm for a
priority level. Details of the algorithm can be read in the [enhancement
proposal](#what-s-next), but in short:
* Increasing `queues` reduces the rate of collisions between different flows, at
the cost of increased memory usage. A value of 1 here effectively disables the
fair-queuing logic, but still allows requests to be queued.
* Increasing `queueLengthLimit` allows larger bursts of traffic to be
sustained without dropping any requests, at the cost of increased
latency and memory usage.
* Changing `handSize` allows you to adjust the probability of collisions between
different flows and the overall concurrency available to a single flow in an
overload situation.
{{< note >}}
A larger `handSize` makes it less likely for two individual flows to collide
(and therefore for one to be able to starve the other), but more likely that
a small number of flows can dominate the apiserver. A larger `handSize` also
potentially increases the amount of latency that a single high-traffic flow
can cause. The maximum number of queued requests possible from a
single flow is `handSize * queueLengthLimit`.
{{< /note >}}
The following table shows an interesting collection of shuffle
sharding configurations, showing for each the probability that a
given mouse (low-intensity flow) is squished by the elephants (high-intensity flows) for
an illustrative collection of numbers of elephants. See
https://play.golang.org/p/Gi0PLgVHiUg, which computes this table.
{{< table caption="Example Shuffle Sharding Configurations" >}}
|HandSize| Queues| 1 elephant| 4 elephants| 16 elephants|
|--------|-----------|------------|----------------|--------------------|
| 12| 32| 4.428838398950118e-09| 0.11431348830099144| 0.9935089607656024|
| 10| 32| 1.550093439632541e-08| 0.0626479840223545| 0.9753101519027554|
| 10| 64| 6.601827268370426e-12| 0.00045571320990370776| 0.49999929150089345|
| 9| 64| 3.6310049976037345e-11| 0.00045501212304112273| 0.4282314876454858|
| 8| 64| 2.25929199850899e-10| 0.0004886697053040446| 0.35935114681123076|
| 8| 128| 6.994461389026097e-13| 3.4055790161620863e-06| 0.02746173137155063|
| 7| 128| 1.0579122850901972e-11| 6.960839379258192e-06| 0.02406157386340147|
| 7| 256| 7.597695465552631e-14| 6.728547142019406e-08| 0.0006709661542533682|
| 6| 256| 2.7134626662687968e-12| 2.9516464018476436e-07| 0.0008895654642000348|
| 6| 512| 4.116062922897309e-14| 4.982983350480894e-09| 2.26025764343413e-05|
| 6| 1024| 6.337324016514285e-16| 8.09060164312957e-11| 4.517408062903668e-07|
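As an illustration of these knobs, here is a sketch of a PriorityLevelConfiguration
that queues excess requests using one of the rows from the table above. The object
name and the specific numbers are hypothetical, and the field names assume the
`flowcontrol.apiserver.k8s.io/v1alpha1` API used elsewhere on this page:

```yaml
apiVersion: flowcontrol.apiserver.k8s.io/v1alpha1
kind: PriorityLevelConfiguration
metadata:
  name: example-workload          # hypothetical name
spec:
  type: Limited                   # this level is subject to concurrency limits
  limited:
    assuredConcurrencyShares: 20  # share of the server's total concurrency limit
    limitResponse:
      type: Queue                 # queue excess requests rather than rejecting them
      queuing:
        queues: 64
        handSize: 6
        queueLengthLimit: 50
```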
### FlowSchema
A FlowSchema matches some inbound requests and assigns them to a
priority level. Every inbound request is tested against every
FlowSchema in turn, starting with those with numerically lowest ---
which we take to be the logically highest --- `matchingPrecedence` and
working onward. The first match wins.
{{< caution >}}
Only the first matching FlowSchema for a given request matters. If multiple
FlowSchemas match a single inbound request, it will be assigned based on the one
with the highest `matchingPrecedence`. If multiple FlowSchemas with equal
`matchingPrecedence` match the same request, the one with lexicographically
smaller `name` will win, but it's better not to rely on this, and instead to
ensure that no two FlowSchemas have the same `matchingPrecedence`.
{{< /caution >}}
A FlowSchema matches a given request if at least one of its `rules`
matches. A rule matches if at least one of its `subjects` *and* at least
one of its `resourceRules` or `nonResourceRules` (depending on whether the
incoming request is for a resource or non-resource URL) matches the request.
For the `name` field in subjects, and the `verbs`, `apiGroups`, `resources`,
`namespaces`, and `nonResourceURLs` fields of resource and non-resource rules,
the wildcard `*` may be specified to match all values for the given field,
effectively removing it from consideration.
A FlowSchema's `distinguisherMethod.type` determines how requests matching that
schema will be separated into flows. It may be
either `ByUser`, in which case one requesting user will not be able to starve
other users of capacity, or `ByNamespace`, in which case requests for resources
in one namespace will not be able to starve requests for resources in other
namespaces of capacity, or it may be blank (or `distinguisherMethod` may be
omitted entirely), in which case all requests matched by this FlowSchema will be
considered part of a single flow. The correct choice for a given FlowSchema
depends on the resource and your particular environment.
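As a sketch, a FlowSchema that sends read-only traffic from a hypothetical
controller's service account to the `workload-low` priority level, treating each
namespace as its own flow, might look like the following. The subject and names are
made up for illustration, and the field names assume the
`flowcontrol.apiserver.k8s.io/v1alpha1` API:

```yaml
apiVersion: flowcontrol.apiserver.k8s.io/v1alpha1
kind: FlowSchema
metadata:
  name: example-controller-reads   # hypothetical name
spec:
  matchingPrecedence: 8000         # numerically lower values are matched first
  priorityLevelConfiguration:
    name: workload-low
  distinguisherMethod:
    type: ByNamespace              # each target namespace becomes its own flow
  rules:
  - subjects:
    - kind: ServiceAccount
      serviceAccount:
        name: example-controller   # hypothetical service account
        namespace: example-system
    resourceRules:
    - verbs: ["get", "list", "watch"]
      apiGroups: ["*"]
      resources: ["*"]
      namespaces: ["*"]
```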
## Diagnostics
Every HTTP response from an API server with the priority and fairness feature
enabled has two extra headers: `X-Kubernetes-PF-FlowSchema-UID` and
`X-Kubernetes-PF-PriorityLevel-UID`, noting the flow schema that matched the request
and the priority level to which it was assigned, respectively. The API objects'
names are not included in these headers in case the requesting user does not
have permission to view them, so when debugging you can use a command like
```shell
kubectl get flowschemas -o custom-columns="uid:{metadata.uid},name:{metadata.name}"
kubectl get prioritylevelconfigurations -o custom-columns="uid:{metadata.uid},name:{metadata.name}"
```
to get a mapping of UIDs to names for both FlowSchemas and
PriorityLevelConfigurations.
## Observability
When you enable the API Priority and Fairness feature, the kube-apiserver
exports additional metrics. Monitoring these can help you determine whether your
configuration is inappropriately throttling important traffic, or find
poorly-behaved workloads that may be harming system health.
* `apiserver_flowcontrol_rejected_requests_total` counts requests that
were rejected, grouped by the name of the assigned priority level,
the name of the assigned FlowSchema, and the reason for rejection.
The reason will be one of the following:
* `queue-full`, indicating that too many requests were already
queued,
* `concurrency-limit`, indicating that the
PriorityLevelConfiguration is configured to reject rather than
queue excess requests, or
* `time-out`, indicating that the request was still in the queue
when its queuing time limit expired.
* `apiserver_flowcontrol_dispatched_requests_total` counts requests
that began executing, grouped by the name of the assigned priority
level and the name of the assigned FlowSchema.
* `apiserver_flowcontrol_current_inqueue_requests` gives the
instantaneous total number of queued (not executing) requests,
grouped by priority level and FlowSchema.
* `apiserver_flowcontrol_current_executing_requests` gives the instantaneous
total number of executing requests, grouped by priority level and FlowSchema.
* `apiserver_flowcontrol_request_queue_length_after_enqueue` gives a
histogram of queue lengths for the queues, grouped by priority level
and FlowSchema, as sampled by the enqueued requests. Each request
that gets queued contributes one sample to its histogram, reporting
the length of the queue just after the request was added. Note that
this produces different statistics than an unbiased survey would.
{{< note >}}
An outlier value in a histogram here means it is likely that a single flow
(i.e., requests by one user or for one namespace, depending on
configuration) is flooding the API server, and being throttled. By contrast,
if one priority level's histogram shows that all queues for that priority
level are longer than those for other priority levels, it may be appropriate
to increase that PriorityLevelConfiguration's concurrency shares.
{{< /note >}}
* `apiserver_flowcontrol_request_concurrency_limit` gives the computed
concurrency limit (based on the API server's total concurrency limit and PriorityLevelConfigurations'
concurrency shares) for each PriorityLevelConfiguration.
* `apiserver_flowcontrol_request_wait_duration_seconds` gives a histogram of how
long requests spent queued, grouped by the FlowSchema that matched the
request, the PriorityLevel to which it was assigned, and whether or not the
request successfully executed.
{{< note >}}
Since each FlowSchema always assigns requests to a single
PriorityLevelConfiguration, you can add the histograms for all the
FlowSchemas for one priority level to get the effective histogram for
requests assigned to that priority level.
{{< /note >}}
* `apiserver_flowcontrol_request_execution_seconds` gives a histogram of how
long requests took to actually execute, grouped by the FlowSchema that matched the
request and the PriorityLevel to which it was assigned.
{{% /capture %}}
{{% capture whatsnext %}}
For background information on design details for API priority and fairness, see
the [enhancement proposal](https://github.com/kubernetes/enhancements/blob/master/keps/sig-api-machinery/20190228-priority-and-fairness.md).
You can make suggestions and feature requests via [SIG API
Machinery](https://github.com/kubernetes/community/tree/master/sig-api-machinery).
{{% /capture %}}

View File

@ -76,7 +76,7 @@ should set up a solution to address that.
For example, in Kubernetes clusters, deployed by the `kube-up.sh` script,
there is a [`logrotate`](https://linux.die.net/man/8/logrotate)
tool configured to run each hour. You can also set up a container runtime to
rotate application's logs automatically, e.g. by using Docker's `log-opt`.
rotate application's logs automatically, for example by using Docker's `log-opt`.
In the `kube-up.sh` script, the latter approach is used for COS image on GCP,
and the former approach is used in any other environment. In both cases, by
default rotation is configured to take place when log file exceeds 10MB.

View File

@ -424,16 +424,16 @@ At some point, you'll eventually need to update your deployed application, typic
We'll guide you through how to create and update applications with Deployments.
Let's say you were running version 1.7.9 of nginx:
Let's say you were running version 1.14.2 of nginx:
```shell
kubectl run my-nginx --image=nginx:1.7.9 --replicas=3
kubectl run my-nginx --image=nginx:1.14.2 --replicas=3
```
```shell
deployment.apps/my-nginx created
```
To update to version 1.9.1, simply change `.spec.template.spec.containers[0].image` from `nginx:1.7.9` to `nginx:1.9.1`, with the kubectl commands we learned above.
To update to version 1.16.1, simply change `.spec.template.spec.containers[0].image` from `nginx:1.14.2` to `nginx:1.16.1`, with the kubectl commands we learned above.
```shell
kubectl edit deployment/my-nginx

View File

@ -0,0 +1,132 @@
---
title: Metrics For The Kubernetes Control Plane
reviewers:
- brancz
- logicalhan
- RainbowMango
content_template: templates/concept
weight: 60
aliases:
- controller-metrics.md
---
{{% capture overview %}}
System component metrics can give a better look into what is happening inside them. Metrics are particularly useful for building dashboards and alerts.
Metrics in the Kubernetes control plane are emitted in the [Prometheus format](https://prometheus.io/docs/instrumenting/exposition_formats/) and are human readable.
{{% /capture %}}
{{% capture body %}}
## Metrics in Kubernetes
In most cases, metrics are available on the `/metrics` endpoint of the component's HTTP server. For components that don't expose an endpoint by default, it can be enabled using the `--bind-address` flag.
Examples of those components:
* {{< glossary_tooltip term_id="kube-controller-manager" text="kube-controller-manager" >}}
* {{< glossary_tooltip term_id="kube-proxy" text="kube-proxy" >}}
* {{< glossary_tooltip term_id="kube-apiserver" text="kube-apiserver" >}}
* {{< glossary_tooltip term_id="kube-scheduler" text="kube-scheduler" >}}
* {{< glossary_tooltip term_id="kubelet" text="kubelet" >}}
In a production environment you may want to configure [Prometheus Server](https://prometheus.io/) or some other metrics scraper
to periodically gather these metrics and make them available in some kind of time series database.
Note that the {{< glossary_tooltip term_id="kubelet" text="kubelet" >}} also exposes metrics on the `/metrics/cadvisor`, `/metrics/resource` and `/metrics/probes` endpoints. Those metrics do not have the same lifecycle.
If your cluster uses {{< glossary_tooltip term_id="rbac" text="RBAC" >}}, reading metrics requires authorization via a user, group or ServiceAccount with a ClusterRole that allows accessing `/metrics`.
For example:
```
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: prometheus
rules:
- nonResourceURLs:
- "/metrics"
verbs:
- get
```
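With credentials bound to a role like this, you can quickly check that metrics are
being served. For example, this is a sketch that reads the API server's own metrics
endpoint through `kubectl`:

```shell
kubectl get --raw /metrics | head
```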
## Metric lifecycle
Alpha metric → Stable metric → Deprecated metric → Hidden metric → Deletion
Alpha metrics have no stability guarantees; as such they can be modified or deleted at any time.
Stable metrics are guaranteed not to change. Specifically, stability means:
* the metric itself will not be deleted (or renamed)
* the type of metric will not be modified
A deprecated metric signals that the metric will eventually be deleted; to find out in which version, you need to check its annotation, which includes the Kubernetes version from which that metric is considered deprecated.
Before deprecation:
```
# HELP some_counter this counts things
# TYPE some_counter counter
some_counter 0
```
After deprecation:
```
# HELP some_counter (Deprecated since 1.15.0) this counts things
# TYPE some_counter counter
some_counter 0
```
Once a metric is hidden, it is not published for scraping by default. To use a hidden metric, you need to override the configuration for the relevant cluster component.
Once a metric is deleted, the metric is not published. You cannot change this using an override.
## Show Hidden Metrics
As described above, admins can enable hidden metrics through a command-line flag on a specific binary. This is intended as an escape hatch for admins who missed migrating off the metrics deprecated in the last release.
The flag `show-hidden-metrics-for-version` takes the version for which you want to show metrics deprecated in that release. The version is expressed as x.y, where x is the major version and y is the minor version. The patch version is not needed, even though a metric can be deprecated in a patch release, because the metrics deprecation policy runs against minor releases.
The flag can only take the previous minor version as its value. All metrics hidden in the previous minor release will be emitted if admins set that version in `show-hidden-metrics-for-version`. Older versions are not allowed, because that would violate the metrics deprecation policy.
Take metric `A` as an example, and assume that `A` is deprecated in release `1.n`. According to the metrics deprecation policy, we can reach the following conclusion:
* In release `1.n`, the metric is deprecated, and it can be emitted by default.
* In release `1.n+1`, the metric is hidden by default, but it can be emitted by setting the command-line flag `show-hidden-metrics-for-version=1.n`.
* In release `1.n+2`, the metric should be removed from the codebase. No escape hatch anymore.
If you're upgrading from release `1.12` to `1.13`, but still depend on a metric `A` deprecated in `1.12`, you should set the hidden metrics via the command line: `--show-hidden-metrics-for-version=1.12`, and remember to remove this metric dependency before upgrading to `1.14`.
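As a sketch, that scenario would look something like the following on the command
line, assuming the component in question is the kube-controller-manager and all
other flags stay as they were:

```shell
kube-controller-manager \
--show-hidden-metrics-for-version=1.12 \
# …and other flags as usual
```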
## Component metrics
### kube-controller-manager metrics
Controller manager metrics provide important insight into the performance and health of the controller manager.
These metrics include common Go language runtime metrics such as go_routine count and controller specific metrics such as
etcd request latencies or Cloudprovider (AWS, GCE, OpenStack) API latencies that can be used
to gauge the health of a cluster.
Starting from Kubernetes 1.7, detailed Cloudprovider metrics are available for storage operations for GCE, AWS, vSphere and OpenStack.
These metrics can be used to monitor health of persistent volume operations.
For example, for GCE these metrics are called:
```
cloudprovider_gce_api_request_duration_seconds { request = "instance_list"}
cloudprovider_gce_api_request_duration_seconds { request = "disk_insert"}
cloudprovider_gce_api_request_duration_seconds { request = "disk_delete"}
cloudprovider_gce_api_request_duration_seconds { request = "attach_disk"}
cloudprovider_gce_api_request_duration_seconds { request = "detach_disk"}
cloudprovider_gce_api_request_duration_seconds { request = "list_disk"}
```
{{% /capture %}}
{{% capture whatsnext %}}
* Read about the [Prometheus text format](https://github.com/prometheus/docs/blob/master/content/docs/instrumenting/exposition_formats.md#text-based-format) for metrics
* See the list of [stable Kubernetes metrics](https://github.com/kubernetes/kubernetes/blob/master/test/instrumentation/testdata/stable-metrics-list.yaml)
* Read about the [Kubernetes deprecation policy](https://kubernetes.io/docs/reference/using-api/deprecation-policy/#deprecating-a-feature-or-behavior )
{{% /capture %}}

View File

@ -17,7 +17,7 @@ There are several ways to do this, and the recommended approaches all use
[label selectors](/docs/concepts/overview/working-with-objects/labels/) to make the selection.
Generally such constraints are unnecessary, as the scheduler will automatically do a reasonable placement
(e.g. spread your pods across nodes, not place the pod on a node with insufficient free resources, etc.)
but there are some circumstances where you may want more control on a node where a pod lands, e.g. to ensure
but there are some circumstances where you may want more control on a node where a pod lands, for example to ensure
that a pod ends up on a machine with an SSD attached to it, or to co-locate pods from two different
services that communicate a lot into the same availability zone.
@ -111,9 +111,10 @@ For example, `example.com.node-restriction.kubernetes.io/fips=true` or `example.
`nodeSelector` provides a very simple way to constrain pods to nodes with particular labels. The affinity/anti-affinity
feature, greatly expands the types of constraints you can express. The key enhancements are
1. the language is more expressive (not just "AND or exact match")
1. The affinity/anti-affinity language is more expressive. The language offers more matching rules
besides exact matches created with a logical AND operation;
2. you can indicate that the rule is "soft"/"preference" rather than a hard requirement, so if the scheduler
can't satisfy it, the pod will still be scheduled
can't satisfy it, the pod will still be scheduled;
3. you can constrain against labels on other pods running on the node (or other topological domain),
rather than against labels on the node itself, which allows rules about which pods can and cannot be co-located
@ -159,9 +160,9 @@ You can use `NotIn` and `DoesNotExist` to achieve node anti-affinity behavior, o
If you specify both `nodeSelector` and `nodeAffinity`, *both* must be satisfied for the pod
to be scheduled onto a candidate node.
If you specify multiple `nodeSelectorTerms` associated with `nodeAffinity` types, then the pod can be scheduled onto a node **if one of** the `nodeSelectorTerms` is satisfied.
If you specify multiple `nodeSelectorTerms` associated with `nodeAffinity` types, then the pod can be scheduled onto a node **only if all** `nodeSelectorTerms` can be satisfied.
If you specify multiple `matchExpressions` associated with `nodeSelectorTerms`, then the pod can be scheduled onto a node **only if all** `matchExpressions` can be satisfied.
If you specify multiple `matchExpressions` associated with `nodeSelectorTerms`, then the pod can be scheduled onto a node **if one of** the `matchExpressions` is satisfied.
If you remove or change the label of the node where the pod is scheduled, the pod won't be removed. In other words, the affinity selection works only at the time of scheduling the pod.
@ -176,7 +177,7 @@ Y is expressed as a LabelSelector with an optional associated list of namespaces
(and therefore the labels on pods are implicitly namespaced),
a label selector over pod labels must specify which namespaces the selector should apply to. Conceptually X is a topology domain
like node, rack, cloud provider zone, cloud provider region, etc. You express it using a `topologyKey` which is the
key for the node label that the system uses to denote such a topology domain, e.g. see the label keys listed above
key for the node label that the system uses to denote such a topology domain; for example, see the label keys listed above
in the section [Interlude: built-in node labels](#built-in-node-labels).
{{< note >}}
@ -186,7 +187,7 @@ not recommend using them in clusters larger than several hundred nodes.
{{< /note >}}
{{< note >}}
Pod anti-affinity requires nodes to be consistently labelled, i.e. every node in the cluster must have an appropriate label matching `topologyKey`. If some or all nodes are missing the specified `topologyKey` label, it can lead to unintended behavior.
Pod anti-affinity requires nodes to be consistently labelled, in other words every node in the cluster must have an appropriate label matching `topologyKey`. If some or all nodes are missing the specified `topologyKey` label, it can lead to unintended behavior.
{{< /note >}}
As with node affinity, there are currently two types of pod affinity and anti-affinity, called `requiredDuringSchedulingIgnoredDuringExecution` and
@ -228,7 +229,7 @@ for performance and security reasons, there are some constraints on topologyKey:
1. For affinity and for `requiredDuringSchedulingIgnoredDuringExecution` pod anti-affinity,
empty `topologyKey` is not allowed.
2. For `requiredDuringSchedulingIgnoredDuringExecution` pod anti-affinity, the admission controller `LimitPodHardAntiAffinityTopology` was introduced to limit `topologyKey` to `kubernetes.io/hostname`. If you want to make it available for custom topologies, you may modify the admission controller, or simply disable it.
3. For `preferredDuringSchedulingIgnoredDuringExecution` pod anti-affinity, empty `topologyKey` is interpreted as "all topologies" ("all topologies" here is now limited to the combination of `kubernetes.io/hostname`, `failure-domain.beta.kubernetes.io/zone` and `failure-domain.beta.kubernetes.io/region`).
3. For `preferredDuringSchedulingIgnoredDuringExecution` pod anti-affinity, empty `topologyKey` is not allowed.
4. Except for the above cases, the `topologyKey` can be any legal label-key.
In addition to `labelSelector` and `topologyKey`, you can optionally specify a list `namespaces`
@ -318,7 +319,7 @@ spec:
topologyKey: "kubernetes.io/hostname"
containers:
- name: web-app
image: nginx:1.12-alpine
image: nginx:1.16-alpine
```
If we create the above two deployments, our three node cluster should look like below.
@ -366,7 +367,7 @@ Some of the limitations of using `nodeName` to select nodes are:
some cases may be automatically deleted.
- If the named node does not have the resources to accommodate the
pod, the pod will fail and its reason will indicate why,
e.g. OutOfmemory or OutOfcpu.
for example OutOfmemory or OutOfcpu.
- Node names in cloud environments are not always predictable or
stable.

View File

@ -68,13 +68,7 @@ resource requests/limits of that type for each Container in the Pod.
## Meaning of CPU
Limits and requests for CPU resources are measured in *cpu* units.
One cpu, in Kubernetes, is equivalent to:
- 1 AWS vCPU
- 1 GCP Core
- 1 Azure vCore
- 1 IBM vCPU
- 1 *Hyperthread* on a bare-metal Intel processor with Hyperthreading
One cpu, in Kubernetes, is equivalent to **1 vCPU/Core** for cloud providers and **1 hyperthread** on bare-metal Intel processors.
Fractional requests are allowed. A Container with
`spec.containers[].resources.requests.cpu` of `0.5` is guaranteed half as much
@ -191,9 +185,10 @@ resource limits, see the
The resource usage of a Pod is reported as part of the Pod status.
If [optional monitoring](http://releases.k8s.io/{{< param "githubbranch" >}}/cluster/addons/cluster-monitoring/README.md)
is configured for your cluster, then Pod resource usage can be retrieved from
the monitoring system.
If optional [tools for monitoring](/docs/tasks/debug-application-cluster/resource-usage-monitoring/)
are available in your cluster, then Pod resource usage can be retrieved either
from the [Metrics API](/docs/tasks/debug-application-cluster/resource-metrics-pipeline/#the-metrics-api)
directly or from your monitoring tools.
## Troubleshooting
@ -391,7 +386,7 @@ spec:
### How Pods with ephemeral-storage requests are scheduled
When you create a Pod, the Kubernetes scheduler selects a node for the Pod to
run on. Each node has a maximum amount of local ephemeral storage it can provide for Pods. For more information, see ["Node Allocatable"](/docs/tasks/administer-cluster/reserve-compute-resources/#node-allocatable).
run on. Each node has a maximum amount of local ephemeral storage it can provide for Pods. For more information, see ["Node Allocatable"](/docs/tasks/administer-cluster/reserve-compute-resources/#node-allocatable).
The scheduler ensures that the sum of the resource requests of the scheduled Containers is less than the capacity of the node.

View File

@ -30,7 +30,7 @@ This is a living document. If you think of something that is not on this list bu
- Put object descriptions in annotations, to allow better introspection.
## "Naked" Pods vs ReplicaSets, Deployments, and Jobs
## "Naked" Pods versus ReplicaSets, Deployments, and Jobs {#naked-pods-vs-replicasets-deployments-and-jobs}
- Don't use naked Pods (that is, Pods not bound to a [ReplicaSet](/docs/concepts/workloads/controllers/replicaset/) or [Deployment](/docs/concepts/workloads/controllers/deployment/)) if you can avoid it. Naked Pods will not be rescheduled in the event of a node failure.
@ -87,7 +87,7 @@ The [imagePullPolicy](/docs/concepts/containers/images/#updating-images) and the
- `imagePullPolicy: Never`: the image is assumed to exist locally. No attempt is made to pull the image.
{{< note >}}
To make sure the container always uses the same version of the image, you can specify its [digest](https://docs.docker.com/engine/reference/commandline/pull/#pull-an-image-by-digest-immutable-identifier), for example `sha256:45b23dee08af5e43a7fea6c4cf9c25ccf269ee113168c19722f87876677c5cb2`. The digest uniquely identifies a specific version of the image, so it is never updated by Kubernetes unless you change the digest value.
To make sure the container always uses the same version of the image, you can specify its [digest](https://docs.docker.com/engine/reference/commandline/pull/#pull-an-image-by-digest-immutable-identifier); replace `<image-name>:<tag>` with `<image-name>@<digest>` (for example, `image@sha256:45b23dee08af5e43a7fea6c4cf9c25ccf269ee113168c19722f87876677c5cb2`). The digest uniquely identifies a specific version of the image, so it is never updated by Kubernetes unless you change the digest value.
{{< /note >}}
{{< note >}}
@ -108,4 +108,3 @@ The caching semantics of the underlying image provider make even `imagePullPolic
{{% /capture %}}

View File

@ -10,12 +10,12 @@ weight: 20
{{% capture overview %}}
{{< feature-state for_k8s_version="v1.16" state="alpha" >}}
{{< feature-state for_k8s_version="v1.18" state="beta" >}}
When you run a Pod on a Node, the Pod itself takes an amount of system resources. These
resources are additional to the resources needed to run the container(s) inside the Pod.
_Pod Overhead_ is a feature for accounting for the resources consumed by the pod infrastructure
_Pod Overhead_ is a feature for accounting for the resources consumed by the Pod infrastructure
on top of the container requests & limits.
@ -24,33 +24,169 @@ on top of the container requests & limits.
{{% capture body %}}
## Pod Overhead
In Kubernetes, the pod's overhead is set at
In Kubernetes, the Pod's overhead is set at
[admission](/docs/reference/access-authn-authz/extensible-admission-controllers/#what-are-admission-webhooks)
time according to the overhead associated with the pod's
time according to the overhead associated with the Pod's
[RuntimeClass](/docs/concepts/containers/runtime-class/).
When Pod Overhead is enabled, the overhead is considered in addition to the sum of container
resource requests when scheduling a pod. Similarly, Kubelet will include the pod overhead when sizing
the pod cgroup, and when carrying out pod eviction ranking.
resource requests when scheduling a Pod. Similarly, Kubelet will include the Pod overhead when sizing
the Pod cgroup, and when carrying out Pod eviction ranking.
### Set Up
## Enabling Pod Overhead {#set-up}
You need to make sure that the `PodOverhead`
[feature gate](/docs/reference/command-line-tools-reference/feature-gates/) is enabled (it is off by default)
across your cluster. This means:
[feature gate](/docs/reference/command-line-tools-reference/feature-gates/) is enabled (it is on by default as of 1.18)
across your cluster, and a `RuntimeClass` is utilized which defines the `overhead` field.
- in {{< glossary_tooltip text="kube-scheduler" term_id="kube-scheduler" >}}
- in {{< glossary_tooltip text="kube-apiserver" term_id="kube-apiserver" >}}
- in the {{< glossary_tooltip text="kubelet" term_id="kubelet" >}} on each Node
- in any custom API servers that use feature gates
## Usage example
{{< note >}}
Users who can write to RuntimeClass resources are able to have cluster-wide impact on
workload performance. You can limit access to this ability using Kubernetes access controls.
See [Authorization Overview](/docs/reference/access-authn-authz/authorization/) for more details.
{{< /note >}}
To use the PodOverhead feature, you need a RuntimeClass that defines the `overhead` field. As
an example, you could use the following RuntimeClass definition with a virtualizing container runtime
that uses around 120MiB per Pod for the virtual machine and the guest OS:
```yaml
---
kind: RuntimeClass
apiVersion: node.k8s.io/v1beta1
metadata:
name: kata-fc
handler: kata-fc
overhead:
podFixed:
memory: "120Mi"
cpu: "250m"
```
Workloads that specify the `kata-fc` RuntimeClass handler will take the memory and
CPU overheads into account for resource quota calculations, node scheduling, and Pod cgroup sizing.
Consider running the given example workload, test-pod:
```yaml
apiVersion: v1
kind: Pod
metadata:
name: test-pod
spec:
runtimeClassName: kata-fc
containers:
- name: busybox-ctr
image: busybox
stdin: true
tty: true
resources:
limits:
cpu: 500m
memory: 100Mi
- name: nginx-ctr
image: nginx
resources:
limits:
cpu: 1500m
memory: 100Mi
```
At admission time the RuntimeClass [admission controller](https://kubernetes.io/docs/reference/access-authn-authz/admission-controllers/)
updates the workload's PodSpec to include the `overhead` as described in the RuntimeClass. If the PodSpec already has this field defined,
the Pod will be rejected. In the given example, since only the RuntimeClass name is specified, the admission controller mutates the Pod
to include an `overhead`.
After the RuntimeClass admission controller, you can check the updated PodSpec:
```bash
kubectl get pod test-pod -o jsonpath='{.spec.overhead}'
```
The output is:
```
map[cpu:250m memory:120Mi]
```
If a ResourceQuota is defined, the sum of container requests as well as the
`overhead` field are counted.
When the kube-scheduler is deciding which node should run a new Pod, the scheduler considers that Pod's
`overhead` as well as the sum of container requests for that Pod. For this example, the scheduler adds the
requests and the overhead, then looks for a node that has 2.25 CPU and 320 MiB of memory available.
Once a Pod is scheduled to a node, the kubelet on that node creates a new {{< glossary_tooltip text="cgroup" term_id="cgroup" >}}
for the Pod. It is within this cgroup that the underlying container runtime will create containers.
If the resource has a limit defined for each container (Guaranteed QoS or Burstable QoS with limits defined),
the kubelet will set an upper limit for the pod cgroup associated with that resource (`cpu.cfs_quota_us` for CPU
and `memory.limit_in_bytes` for memory). This upper limit is based on the sum of the container limits plus the `overhead`
defined in the PodSpec.
For CPU, if the Pod is Guaranteed or Burstable QoS, the kubelet will set `cpu.shares` based on the sum of container
requests plus the `overhead` defined in the PodSpec.
Looking at our example, verify the container requests for the workload:
```bash
kubectl get pod test-pod -o jsonpath='{.spec.containers[*].resources.limits}'
```
The total container requests are 2000m CPU and 200MiB of memory:
```
map[cpu:500m memory:100Mi] map[cpu:1500m memory:100Mi]
```
Check this against what is observed by the node:
```bash
kubectl describe node | grep test-pod -B2
```
The output shows 2250m CPU and 320MiB of memory are requested, which includes PodOverhead:
```
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits AGE
--------- ---- ------------ ---------- --------------- ------------- ---
default test-pod 2250m (56%) 2250m (56%) 320Mi (1%) 320Mi (1%) 36m
```
## Verify Pod cgroup limits
Check the Pod's memory cgroups on the node where the workload is running. In the following example, [`crictl`](https://github.com/kubernetes-sigs/cri-tools/blob/master/docs/crictl.md)
is used on the node, which provides a CLI for CRI-compatible container runtimes. This is an
advanced example to show PodOverhead behavior, and it is not expected that users should need to check
cgroups directly on the node.
First, on the particular node, determine the Pod identifier:
```bash
# Run this on the node where the Pod is scheduled
POD_ID="$(sudo crictl pods --name test-pod -q)"
```
From this, you can determine the cgroup path for the Pod:
```bash
# Run this on the node where the Pod is scheduled
sudo crictl inspectp -o=json $POD_ID | grep cgroupsPath
```
The resulting cgroup path includes the Pod's `pause` container. The Pod level cgroup is one directory above.
```
"cgroupsPath": "/kubepods/podd7f4b509-cf94-4951-9417-d1087c92a5b2/7ccf55aee35dd16aca4189c952d83487297f3cd760f1bbf09620e206e7d0c27a"
```
In this specific case, the pod cgroup path is `kubepods/podd7f4b509-cf94-4951-9417-d1087c92a5b2`. Verify the Pod level cgroup setting for memory:
```bash
# Run this on the node where the Pod is scheduled.
# Also, change the name of the cgroup to match the cgroup allocated for your pod.
cat /sys/fs/cgroup/memory/kubepods/podd7f4b509-cf94-4951-9417-d1087c92a5b2/memory.limit_in_bytes
```
This is 320 MiB, as expected:
```
335544320
```
### Observability
A `kube_pod_overhead` metric is available in [kube-state-metrics](https://github.com/kubernetes/kube-state-metrics)
to help identify when PodOverhead is being utilized and to help observe stability of workloads
running with a defined Overhead. This functionality is not available in the 1.9 release of
kube-state-metrics, but is expected in a following release. Users will need to build kube-state-metrics
from source in the meantime.
{{% /capture %}}

View File

@ -16,42 +16,25 @@ importance of a Pod relative to other Pods. If a Pod cannot be scheduled, the
scheduler tries to preempt (evict) lower priority Pods to make scheduling of the
pending Pod possible.
In Kubernetes 1.9 and later, Priority also affects scheduling order of Pods and
out-of-resource eviction ordering on the Node.
Pod priority and preemption graduated to beta in Kubernetes 1.11 and to GA in
Kubernetes 1.14. They have been enabled by default since 1.11.
In Kubernetes versions where Pod priority and preemption is still an alpha-level
feature, you need to explicitly enable it. To use these features in the older
versions of Kubernetes, follow the instructions in the documentation for your
Kubernetes version, by going to the documentation archive version for your
Kubernetes version.
Kubernetes Version | Priority and Preemption State | Enabled by default
------------------ | :---------------------------: | :----------------:
1.8 | alpha | no
1.9 | alpha | no
1.10 | alpha | no
1.11 | beta | yes
1.14 | stable | yes
{{< warning >}}In a cluster where not all users are trusted, a
malicious user could create pods at the highest possible priorities, causing
other pods to be evicted/not get scheduled. To resolve this issue,
[ResourceQuota](/docs/concepts/policy/resource-quotas/) is
augmented to support Pod priority. An admin can create ResourceQuota for users
at specific priority levels, preventing them from creating pods at high
priorities. This feature is in beta since Kubernetes 1.12.
{{< /warning >}}
{{% /capture %}}
{{% capture body %}}
{{< warning >}}
In a cluster where not all users are trusted, a malicious user could create Pods
at the highest possible priorities, causing other Pods to be evicted/not get
scheduled.
An administrator can use ResourceQuota to prevent users from creating pods at
high priorities.
See [limit Priority Class consumption by default](/docs/concepts/policy/resource-quotas/#limit-priority-class-consumption-by-default)
for details.
{{< /warning >}}
## How to use priority and preemption
To use priority and preemption in Kubernetes 1.11 and later, follow these steps:
To use priority and preemption:
1. Add one or more [PriorityClasses](#priorityclass).
@ -62,6 +45,12 @@ To use priority and preemption in Kubernetes 1.11 and later, follow these steps:
Keep reading for more information about these steps.
{{< note >}}
Kubernetes already ships with two PriorityClasses:
`system-cluster-critical` and `system-node-critical`.
These are common classes and are used to [ensure that critical components are always scheduled first](/docs/tasks/administer-cluster/guaranteed-scheduling-critical-addon-pods/).
{{< /note >}}
If you try the feature and then decide to disable it, you must remove the
PodPriority command-line flag or set it to `false`, and then restart the API
server and scheduler. After the feature is disabled, the existing Pods keep
@ -71,21 +60,20 @@ Pods.
## How to disable preemption
{{< note >}}
In Kubernetes 1.12+, critical pods rely on scheduler preemption to be scheduled
when a cluster is under resource pressure. For this reason, it is not
recommended to disable preemption.
{{< /note >}}
{{< caution >}}
Critical pods rely on scheduler preemption to be scheduled when a cluster
is under resource pressure. For this reason, it is not recommended to
disable preemption.
{{< /caution >}}
{{< note >}}
In Kubernetes 1.15 and later,
if the feature `NonPreemptingPriority` is enabled,
In Kubernetes 1.15 and later, if the feature `NonPreemptingPriority` is enabled,
PriorityClasses have the option to set `preemptionPolicy: Never`.
This will prevent pods of that PriorityClass from preempting other pods.
{{< /note >}}
In Kubernetes 1.11 and later, preemption is controlled by a kube-scheduler flag
`disablePreemption`, which is set to `false` by default.
Preemption is controlled by a kube-scheduler flag `disablePreemption`, which is
set to `false` by default.
If you want to disable preemption despite the above note, you can set
`disablePreemption` to `true`.
@ -111,6 +99,9 @@ priority class name to the integer value of the priority. The name is specified
in the `name` field of the PriorityClass object's metadata. The value is
specified in the required `value` field. The higher the value, the higher the
priority.
The name of a PriorityClass object must be a valid
[DNS subdomain name](/docs/concepts/overview/working-with-objects/names#dns-subdomain-names),
and it cannot be prefixed with `system-`.
A PriorityClass object can have any 32-bit integer value smaller than or equal
to 1 billion. Larger numbers are reserved for critical system Pods that should
@ -152,12 +143,9 @@ globalDefault: false
description: "This priority class should be used for XYZ service pods only."
```
### Non-preempting PriorityClasses (alpha) {#non-preempting-priority-class}
## Non-preempting PriorityClass {#non-preempting-priority-class}
1.15 adds the `PreemptionPolicy` field as an alpha feature.
It is disabled by default in 1.15,
and requires the `NonPreemptingPriority`[feature gate](/docs/reference/command-line-tools-reference/feature-gates/
) to be enabled.
{{< feature-state for_k8s_version="1.15" state="alpha" >}}
Pods with `PreemptionPolicy: Never` will be placed in the scheduling queue
ahead of lower-priority pods,
@ -181,6 +169,10 @@ which will allow pods of that PriorityClass to preempt lower-priority pods
If `PreemptionPolicy` is set to `Never`,
pods in that PriorityClass will be non-preempting.
The use of the `PreemptionPolicy` field requires the `NonPreemptingPriority`
[feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
to be enabled.
An example use case is for data science workloads.
A user may submit a job that they want to be prioritized above other workloads,
but do not wish to discard existing work by preempting running pods.
@ -188,7 +180,7 @@ The high priority job with `PreemptionPolicy: Never` will be scheduled
ahead of other queued pods,
as soon as sufficient cluster resources "naturally" become free.
#### Example Non-preempting PriorityClass
### Example Non-preempting PriorityClass
```yaml
apiVersion: scheduling.k8s.io/v1
@ -230,12 +222,12 @@ spec:
### Effect of Pod priority on scheduling order
In Kubernetes 1.9 and later, when Pod priority is enabled, scheduler orders
pending Pods by their priority and a pending Pod is placed ahead of other
pending Pods with lower priority in the scheduling queue. As a result, the
higher priority Pod may be scheduled sooner than Pods with lower priority if its
scheduling requirements are met. If such Pod cannot be scheduled, scheduler will
continue and tries to schedule other lower priority Pods.
When Pod priority is enabled, the scheduler orders pending Pods by
their priority and a pending Pod is placed ahead of other pending Pods
with lower priority in the scheduling queue. As a result, the higher
priority Pod may be scheduled sooner than Pods with lower priority if
its scheduling requirements are met. If such a Pod cannot be scheduled, the
scheduler will continue and try to schedule other lower priority Pods.
## Preemption
@ -281,12 +273,12 @@ point that scheduler preempts victims and the time that Pod P is scheduled. In
order to minimize this gap, one can set graceful termination period of lower
priority Pods to zero or a small number.
#### PodDisruptionBudget is supported, but not guaranteed!
#### PodDisruptionBudget is supported, but not guaranteed
A [Pod Disruption Budget (PDB)](/docs/concepts/workloads/pods/disruptions/)
allows application owners to limit the number of Pods of a replicated application
that are down simultaneously from voluntary disruptions. Kubernetes 1.9 supports
PDB when preempting Pods, but respecting PDB is best effort. The Scheduler tries
that are down simultaneously from voluntary disruptions. Kubernetes supports
PDB when preempting Pods, but respecting PDB is best effort. The scheduler tries
to find victims whose PDB are not violated by preemption, but if no such victims
are found, preemption will still happen, and lower priority Pods will be removed
despite their PDBs being violated.
@ -337,28 +329,23 @@ gone, and Pod P could possibly be scheduled on Node N.
We may consider adding cross Node preemption in future versions if there is
enough demand and if we find an algorithm with reasonable performance.
## Debugging Pod Priority and Preemption
## Troubleshooting
Pod Priority and Preemption is a major feature that could potentially disrupt
Pod scheduling if it has bugs.
Pod priority and pre-emption can have unwanted side effects. Here are some
examples of potential problems and ways to deal with them.
### Potential problems caused by Priority and Preemption
The followings are some of the potential problems that could be caused by bugs
in the implementation of the feature. This list is not exhaustive.
#### Pods are preempted unnecessarily
### Pods are preempted unnecessarily
Preemption removes existing Pods from a cluster under resource pressure to make
room for higher priority pending Pods. If a user gives high priorities to
certain Pods by mistake, these unintentional high priority Pods may cause
preemption in the cluster. As mentioned above, Pod priority is specified by
setting the `priorityClassName` field of `podSpec`. The integer value of
room for higher priority pending Pods. If you give high priorities to
certain Pods by mistake, these unintentionally high priority Pods may cause
preemption in your cluster. Pod priority is specified by setting the
`priorityClassName` field in the Pod's specification. The integer value for
priority is then resolved and populated to the `priority` field of `podSpec`.
To resolve the problem, `priorityClassName` of the Pods must be changed to use
lower priority classes or should be left empty. Empty `priorityClassName` is
resolved to zero by default.
To address the problem, you can change the `priorityClassName` for those Pods
to use lower priority classes, or leave that field empty. An empty
`priorityClassName` is resolved to zero by default.
When a Pod is preempted, there will be events recorded for the preempted Pod.
Preemption should happen only when a cluster does not have enough resources for
@ -367,29 +354,31 @@ Pod (preemptor) is higher than the victim Pods. Preemption must not happen when
there is no pending Pod, or when the pending Pods have equal or lower priority
than the victims. If preemption happens in such scenarios, please file an issue.
#### Pods are preempted, but the preemptor is not scheduled
### Pods are preempted, but the preemptor is not scheduled
When pods are preempted, they receive their requested graceful termination
period, which is by default 30 seconds, but it can be any different value as
specified in the PodSpec. If the victim Pods do not terminate within this period,
they are force-terminated. Once all the victims go away, the preemptor Pod can
be scheduled.
period, which is by default 30 seconds. If the victim Pods do not terminate within
this period, they are forcibly terminated. Once all the victims go away, the
preemptor Pod can be scheduled.
While the preemptor Pod is waiting for the victims to go away, a higher priority
Pod may be created that fits on the same node. In this case, the scheduler will
Pod may be created that fits on the same Node. In this case, the scheduler will
schedule the higher priority Pod instead of the preemptor.
In the absence of such a higher priority Pod, we expect the preemptor Pod to be
scheduled after the graceful termination period of the victims is over.
This is expected behavior: the Pod with the higher priority should take the place
of a Pod with a lower priority. Other controller actions, such as
[cluster autoscaling](/docs/tasks/administer-cluster/cluster-management/#cluster-autoscaling),
may eventually provide capacity to schedule the pending Pods.
#### Higher priority Pods are preempted before lower priority pods
### Higher priority Pods are preempted before lower priority pods
The scheduler tries to find nodes that can run a pending Pod and if no node is
found, it tries to remove Pods with lower priority from one node to make room
for the pending pod. If a node with low priority Pods is not feasible to run the
pending Pod, the scheduler may choose another node with higher priority Pods
(compared to the Pods on the other node) for preemption. The victims must still
have lower priority than the preemptor Pod.
The scheduler tries to find nodes that can run a pending Pod. If no node is
found, the scheduler tries to remove Pods with lower priority from an arbitrary
node in order to make room for the pending pod.
If a node with low priority Pods is not feasible to run the pending Pod, the scheduler
may choose another node with higher priority Pods (compared to the Pods on the
other node) for preemption. The victims must still have lower priority than the
preemptor Pod.
When there are multiple nodes available for preemption, the scheduler tries to
choose the node with a set of Pods with lowest priority. However, if such Pods
@ -397,13 +386,11 @@ have PodDisruptionBudget that would be violated if they are preempted then the
scheduler may choose another node with higher priority Pods.
When multiple nodes exist for preemption and none of the above scenarios apply,
we expect the scheduler to choose a node with the lowest priority. If that is
not the case, it may indicate a bug in the scheduler.
the scheduler chooses a node with the lowest priority.
## Interactions of Pod priority and QoS
## Interactions between Pod priority and quality of service {#interactions-of-pod-priority-and-qos}
Pod priority and
[QoS](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/node/resource-qos.md)
Pod priority and {{< glossary_tooltip text="QoS class" term_id="qos-class" >}}
are two orthogonal features with few interactions and no default restrictions on
setting the priority of a Pod based on its QoS classes. The scheduler's
preemption logic does not consider QoS when choosing preemption targets.
@ -414,15 +401,20 @@ to schedule the preemptor Pod, or if the lowest priority Pods are protected by
`PodDisruptionBudget`.
The only component that considers both QoS and Pod priority is
[Kubelet out-of-resource eviction](/docs/tasks/administer-cluster/out-of-resource/).
[kubelet out-of-resource eviction](/docs/tasks/administer-cluster/out-of-resource/).
The kubelet ranks Pods for eviction first by whether or not their usage of the
starved resource exceeds requests, then by Priority, and then by the consumption
of the starved compute resource relative to the Pods scheduling requests.
See
[Evicting end-user pods](/docs/tasks/administer-cluster/out-of-resource/#evicting-end-user-pods)
for more details. Kubelet out-of-resource eviction does not evict Pods whose
[evicting end-user pods](/docs/tasks/administer-cluster/out-of-resource/#evicting-end-user-pods)
for more details.
kubelet out-of-resource eviction does not evict Pods when their
usage does not exceed their requests. If a Pod with lower priority is not
exceeding its requests, it won't be evicted. Another Pod with higher priority
that exceeds its requests may be evicted.
{{% /capture %}}
{{% capture whatsnext %}}
* Read about using ResourceQuotas in connection with PriorityClasses: [limit Priority Class consumption by default](/docs/concepts/policy/resource-quotas/#limit-priority-class-consumption-by-default)
{{% /capture %}}
File diff suppressed because it is too large
View File
@ -197,11 +197,13 @@ on the special hardware nodes. This will make sure that these special hardware
nodes are dedicated for pods requesting such hardware and you don't have to
manually add tolerations to your pods.
* **Taint based Evictions (beta feature)**: A per-pod-configurable eviction behavior
* **Taint based Evictions**: A per-pod-configurable eviction behavior
when there are node problems, which is described in the next section.
## Taint based Evictions
{{< feature-state for_k8s_version="1.18" state="stable" >}}
Earlier we mentioned the `NoExecute` taint effect, which affects pods that are already
running on the node as follows
@ -229,9 +231,9 @@ certain condition is true. The following taints are built in:
as unusable. After a controller from the cloud-controller-manager initializes
this node, the kubelet removes this taint.
In version 1.13, the `TaintBasedEvictions` feature is promoted to beta and enabled by default, hence the taints are automatically
added by the NodeController (or kubelet) and the normal logic for evicting pods from nodes
based on the Ready NodeCondition is disabled.
In case a node is to be evicted, the node controller or the kubelet adds relevant taints
with `NoExecute` effect. If the fault condition returns to normal the kubelet or node
controller can remove the relevant taint(s).
{{< note >}}
To maintain the existing [rate limiting](/docs/concepts/architecture/nodes/)
@ -240,7 +242,7 @@ in a rate-limited way. This prevents massive pod evictions in scenarios such
as the master becoming partitioned from the nodes.
{{< /note >}}
This beta feature, in combination with `tolerationSeconds`, allows a pod
The feature, in combination with `tolerationSeconds`, allows a pod
to specify how long it should stay bound to a node that has one or both of these problems.
For example, an application with a lot of local state might want to stay
@ -277,15 +279,13 @@ admission controller](https://git.k8s.io/kubernetes/plugin/pkg/admission/default
* `node.kubernetes.io/unreachable`
* `node.kubernetes.io/not-ready`
This ensures that DaemonSet pods are never evicted due to these problems,
which matches the behavior when this feature is disabled.
This ensures that DaemonSet pods are never evicted due to these problems.
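As an illustration of the `tolerationSeconds` behavior described above, a Pod might carry a toleration like the sketch below; the Pod name and the 6000 second value are arbitrary assumptions.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: stateful-app             # assumed name
spec:
  containers:
  - name: app
    image: nginx:1.14.2
  tolerations:
  - key: "node.kubernetes.io/unreachable"
    operator: "Exists"
    effect: "NoExecute"
    tolerationSeconds: 6000      # stay bound for 100 minutes after the taint is added, then be evicted
```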
## Taint Nodes by Condition
The node lifecycle controller automatically creates taints corresponding to
Node conditions.
Node conditions with `NoSchedule` effect.
Similarly the scheduler does not check Node conditions; instead the scheduler checks taints. This assures that Node conditions don't affect what's scheduled onto the Node. The user can choose to ignore some of the Node's problems (represented as Node conditions) by adding appropriate Pod tolerations.
Note that `TaintNodesByCondition` only taints nodes with `NoSchedule` effect. `NoExecute` effect is controlled by `TaintBasedEviction` which is a beta feature and enabled by default since version 1.13.
Starting in Kubernetes 1.8, the DaemonSet controller automatically adds the
following `NoSchedule` tolerations to all daemons, to prevent DaemonSets from
View File
@ -2,7 +2,7 @@
reviewers:
- mikedanese
- thockin
title: Container Environment Variables
title: Container Environment
content_template: templates/concept
weight: 20
---
View File
@ -116,7 +116,7 @@ Events:
{{% capture whatsnext %}}
* Learn more about the [Container environment](/docs/concepts/containers/container-environment-variables/).
* Learn more about the [Container environment](/docs/concepts/containers/container-environment/).
* Get hands-on experience
[attaching handlers to Container lifecycle events](/docs/tasks/configure-pod-container/attach-handler-lifecycle-event/).
View File
@ -67,6 +67,7 @@ Credentials can be provided in several ways:
- use IAM roles and policies to control access to OCIR repositories
- Using Azure Container Registry (ACR)
- Using IBM Cloud Container Registry
- use IAM roles and policies to grant access to IBM Cloud Container Registry
- Configuring Nodes to Authenticate to a Private Registry
- all pods can read any configured private registries
- requires node configuration by cluster administrator
@ -148,11 +149,11 @@ Once you have those variables filled in you can
[configure a Kubernetes Secret and use it to deploy a Pod](/docs/concepts/containers/images/#specifying-imagepullsecrets-on-a-pod).
### Using IBM Cloud Container Registry
IBM Cloud Container Registry provides a multi-tenant private image registry that you can use to safely store and share your Docker images. By default, images in your private registry are scanned by the integrated Vulnerability Advisor to detect security issues and potential vulnerabilities. Users in your IBM Cloud account can access your images, or you can create a token to grant access to registry namespaces.
IBM Cloud Container Registry provides a multi-tenant private image registry that you can use to safely store and share your images. By default, images in your private registry are scanned by the integrated Vulnerability Advisor to detect security issues and potential vulnerabilities. Users in your IBM Cloud account can access your images, or you can use IAM roles and policies to grant access to IBM Cloud Container Registry namespaces.
To install the IBM Cloud Container Registry CLI plug-in and create a namespace for your images, see [Getting started with IBM Cloud Container Registry](https://cloud.ibm.com/docs/services/Registry?topic=registry-getting-started).
To install the IBM Cloud Container Registry CLI plug-in and create a namespace for your images, see [Getting started with IBM Cloud Container Registry](https://cloud.ibm.com/docs/Registry?topic=registry-getting-started).
You can use the IBM Cloud Container Registry to deploy containers from [IBM Cloud public images](https://cloud.ibm.com/docs/services/Registry?topic=registry-public_images) and your private images into the `default` namespace of your IBM Cloud Kubernetes Service cluster. To deploy a container into other namespaces, or to use an image from a different IBM Cloud Container Registry region or IBM Cloud account, create a Kubernetes `imagePullSecret`. For more information, see [Building containers from images](https://cloud.ibm.com/docs/containers?topic=containers-images).
If you are using the same account and region, you can deploy images that are stored in IBM Cloud Container Registry into the default namespace of your IBM Cloud Kubernetes Service cluster without any additional configuration, see [Building containers from images](https://cloud.ibm.com/docs/containers?topic=containers-images). For other configuration options, see [Understanding how to authorize your cluster to pull images from a registry](https://cloud.ibm.com/docs/containers?topic=containers-registry#cluster_registry_auth).
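Whichever registry you use, referencing a pull Secret from a Pod follows the same pattern. The manifest below is a sketch; the Secret name `my-registry-secret` and the image path are assumptions and must already exist in your cluster and registry.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: private-image-app                        # assumed name
spec:
  containers:
  - name: app
    image: registry.example.com/my-team/app:1.0  # placeholder private image
  imagePullSecrets:
  - name: my-registry-secret                     # a Secret of type kubernetes.io/dockerconfigjson created beforehand
```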
### Configuring Nodes to Authenticate to a Private Registry
View File
@ -0,0 +1,45 @@
---
reviewers:
- erictune
- thockin
title: Containers overview
content_template: templates/concept
weight: 1
---
{{% capture overview %}}
Containers are a technology for packaging the (compiled) code for an
application along with the dependencies it needs at run time. Each
container that you run is repeatable; the standardization from having
dependencies included means that you get the same behavior wherever you
run it.
Containers decouple applications from underlying host infrastructure.
This makes deployment easier in different cloud or OS environments.
{{% /capture %}}
{{% capture body %}}
## Container images
A [container image](/docs/concepts/containers/images/) is a ready-to-run
software package, containing everything needed to run an application:
the code and any runtime it requires, application and system libraries,
and default values for any essential settings.
By design, a container is immutable: you cannot change the code of a
container that is already running. If you have a containerized application
and want to make changes, you need to build a new container that includes
the change, then recreate the container to start from the updated image.
## Container runtimes
{{< glossary_definition term_id="container-runtime" length="all" >}}
{{% /capture %}}
{{% capture whatsnext %}}
* Read about [container images](/docs/concepts/containers/images/)
* Read about [Pods](/docs/concepts/workloads/pods/)
{{% /capture %}}
View File
@ -13,22 +13,14 @@ weight: 20
This page describes the RuntimeClass resource and runtime selection mechanism.
{{< warning >}}
RuntimeClass includes *breaking* changes in the beta upgrade in v1.14. If you were using
RuntimeClass prior to v1.14, see [Upgrading RuntimeClass from Alpha to
Beta](#upgrading-runtimeclass-from-alpha-to-beta).
{{< /warning >}}
RuntimeClass is a feature for selecting the container runtime configuration. The container runtime
configuration is used to run a Pod's containers.
{{% /capture %}}
{{% capture body %}}
## Runtime Class
RuntimeClass is a feature for selecting the container runtime configuration. The container runtime
configuration is used to run a Pod's containers.
## Motivation
You can set a different RuntimeClass between different Pods to provide a balance of
@ -41,7 +33,7 @@ additional overhead.
You can also use RuntimeClass to run different Pods with the same container runtime
but with different settings.
### Set Up
## Setup
Ensure the RuntimeClass feature gate is enabled (it is by default). See [Feature
Gates](/docs/reference/command-line-tools-reference/feature-gates/) for an explanation of enabling
@ -50,7 +42,7 @@ feature gates. The `RuntimeClass` feature gate must be enabled on apiservers _an
1. Configure the CRI implementation on nodes (runtime dependent)
2. Create the corresponding RuntimeClass resources
#### 1. Configure the CRI implementation on nodes
### 1. Configure the CRI implementation on nodes
The configurations available through RuntimeClass are Container Runtime Interface (CRI)
implementation dependent. See the corresponding documentation ([below](#cri-configuration)) for your
@ -65,7 +57,7 @@ heterogenous node configurations, see [Scheduling](#scheduling) below.
The configurations have a corresponding `handler` name, referenced by the RuntimeClass. The
handler must be a valid DNS 1123 label (alpha-numeric + `-` characters).
#### 2. Create the corresponding RuntimeClass resources
### 2. Create the corresponding RuntimeClass resources
The configurations setup in step 1 should each have an associated `handler` name, which identifies
the configuration. For each handler, create a corresponding RuntimeClass object.
@ -82,13 +74,16 @@ metadata:
handler: myconfiguration # The name of the corresponding CRI configuration
```
The name of a RuntimeClass object must be a valid
[DNS subdomain name](/docs/concepts/overview/working-with-objects/names#dns-subdomain-names).
{{< note >}}
It is recommended that RuntimeClass write operations (create/update/patch/delete) be
restricted to the cluster administrator. This is typically the default. See [Authorization
Overview](/docs/reference/access-authn-authz/authorization/) for more details.
{{< /note >}}
### Usage
## Usage
Once RuntimeClasses are configured for the cluster, using them is very simple. Specify a
`runtimeClassName` in the Pod spec. For example:
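The Pod below is a minimal sketch; `mypod` and `myclass` are assumed names, and `myclass` must match the `metadata.name` of a RuntimeClass created in step 2.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: mypod                    # assumed name
spec:
  runtimeClassName: myclass      # selects the RuntimeClass, and therefore the CRI handler, for this Pod
  containers:
  - name: app
    image: nginx:1.14.2
```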
@ -147,14 +142,14 @@ See CRI-O's [config documentation][100] for more details.
[100]: https://raw.githubusercontent.com/cri-o/cri-o/9f11d1d/docs/crio.conf.5.md
### Scheduling
## Scheduling
{{< feature-state for_k8s_version="v1.16" state="beta" >}}
As of Kubernetes v1.16, RuntimeClass includes support for heterogenous clusters through its
`scheduling` fields. Through the use of these fields, you can ensure that pods running with this
RuntimeClass are scheduled to nodes that support it. To use the scheduling support, you must have
the RuntimeClass [admission controller][] enabled (the default, as of 1.16).
the [RuntimeClass admission controller][] enabled (the default, as of 1.16).
To ensure pods land on nodes supporting a specific RuntimeClass, that set of nodes should have a
common label which is then selected by the `runtimeclass.scheduling.nodeSelector` field. The
@ -170,50 +165,23 @@ by each.
To learn more about configuring the node selector and tolerations, see [Assigning Pods to
Nodes](/docs/concepts/configuration/assign-pod-node/).
[admission controller]: /docs/reference/access-authn-authz/admission-controllers/
[RuntimeClass admission controller]: /docs/reference/access-authn-authz/admission-controllers/#runtimeclass
### Pod Overhead
{{< feature-state for_k8s_version="v1.16" state="alpha" >}}
{{< feature-state for_k8s_version="v1.18" state="beta" >}}
As of Kubernetes v1.16, RuntimeClass includes support for specifying overhead associated with
running a pod, as part of the [`PodOverhead`](/docs/concepts/configuration/pod-overhead/) feature.
To use `PodOverhead`, you must have the PodOverhead [feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
enabled (it is off by default).
You can specify _overhead_ resources that are associated with running a Pod. Declaring overhead allows
the cluster (including the scheduler) to account for it when making decisions about Pods and resources.
To use Pod overhead, you must have the PodOverhead [feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
enabled (it is on by default).
Pod overhead is defined in RuntimeClass through the `Overhead` fields. Through the use of these fields,
Pod overhead is defined in RuntimeClass through the `overhead` fields. Through the use of these fields,
you can specify the overhead of running pods utilizing this RuntimeClass and ensure these overheads
are accounted for in Kubernetes.
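Putting these pieces together, a RuntimeClass that declares both overhead and scheduling constraints could look like the sketch below; the handler name, node label, and resource values are assumptions that depend on how your CRI runtime and nodes are configured.

```yaml
apiVersion: node.k8s.io/v1beta1
kind: RuntimeClass
metadata:
  name: sandboxed                          # assumed name
handler: mysandboxhandler                  # assumed CRI handler configured on matching nodes
overhead:
  podFixed:                                # added on top of the container requests of every Pod using this class
    cpu: "250m"
    memory: "120Mi"
scheduling:
  nodeSelector:
    sandbox.example.com/enabled: "true"    # assumed label carried by nodes that support the handler
```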
### Upgrading RuntimeClass from Alpha to Beta
The RuntimeClass Beta feature includes the following changes:
- The `node.k8s.io` API group and `runtimeclasses.node.k8s.io` resource have been migrated to a
built-in API from a CustomResourceDefinition.
- The `spec` has been inlined in the RuntimeClass definition (i.e. there is no more
RuntimeClassSpec).
- The `runtimeHandler` field has been renamed `handler`.
- The `handler` field is now required in all API versions. This means the `runtimeHandler` field in
the Alpha API is also required.
- The `handler` field must be a valid DNS label ([RFC 1123](https://tools.ietf.org/html/rfc1123)),
meaning it can no longer contain `.` characters (in all versions). Valid handlers match the
following regular expression: `^[a-z0-9]([-a-z0-9]*[a-z0-9])?$`.
**Action Required:** The following actions are required to upgrade from the alpha version of the
RuntimeClass feature to the beta version:
- RuntimeClass resources must be recreated *after* upgrading to v1.14, and the
`runtimeclasses.node.k8s.io` CRD should be manually deleted:
```
kubectl delete customresourcedefinitions.apiextensions.k8s.io runtimeclasses.node.k8s.io
```
- Alpha RuntimeClasses with an unspecified or empty `runtimeHandler` or those using a `.` character
in the handler are no longer valid, and must be migrated to a valid handler configuration (see
above).
### Further Reading
{{% /capture %}}
{{% capture whatsnext %}}
- [RuntimeClass Design](https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/runtime-class.md)
- [RuntimeClass Scheduling Design](https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/runtime-class-scheduling.md)
View File
@ -5,30 +5,34 @@ reviewers:
- cheftako
- chenopis
content_template: templates/concept
weight: 10
weight: 20
---
{{% capture overview %}}
The aggregation layer allows Kubernetes to be extended with additional APIs, beyond what is offered by the core Kubernetes APIs.
The aggregation layer allows Kubernetes to be extended with additional APIs, beyond what is offered by the core Kubernetes APIs.
The additional APIs can either be ready-made solutions such as [service-catalog](/docs/concepts/extend-kubernetes/service-catalog/), or APIs that you develop yourself.
The aggregation layer is different from [Custom Resources](/docs/concepts/extend-kubernetes/api-extension/custom-resources/), which are a way to make the {{< glossary_tooltip term_id="kube-apiserver" text="kube-apiserver" >}} recognise new kinds of object.
{{% /capture %}}
{{% capture body %}}
## Overview
## Aggregation layer
The aggregation layer enables installing additional Kubernetes-style APIs in your cluster. These can either be pre-built, existing 3rd party solutions, such as [service-catalog](https://github.com/kubernetes-incubator/service-catalog/blob/master/README.md), or user-created APIs like [apiserver-builder](https://github.com/kubernetes-incubator/apiserver-builder/blob/master/README.md), which can get you started.
The aggregation layer runs in-process with the kube-apiserver. Until an extension resource is registered, the aggregation layer will do nothing. To register an API, you add an _APIService_ object, which "claims" the URL path in the Kubernetes API. At that point, the aggregation layer will proxy anything sent to that API path (e.g. `/apis/myextension.mycompany.io/v1/…`) to the registered APIService.
The aggregation layer runs in-process with the kube-apiserver. Until an extension resource is registered, the aggregation layer will do nothing. To register an API, users must add an APIService object, which "claims" the URL path in the Kubernetes API. At that point, the aggregation layer will proxy anything sent to that API path (e.g. /apis/myextension.mycompany.io/v1/…) to the registered APIService.
The most common way to implement the APIService is to run an *extension API server* in Pod(s) that run in your cluster. If you're using the extension API server to manage resources in your cluster, the extension API server (also written as "extension-apiserver") is typically paired with one or more {{< glossary_tooltip text="controllers" term_id="controller" >}}. The apiserver-builder library provides a skeleton for both extension API servers and the associated controller(s).
Ordinarily, the APIService will be implemented by an *extension-apiserver* in a pod running in the cluster. This extension-apiserver will normally need to be paired with one or more controllers if active management of the added resources is needed. As a result, the apiserver-builder will actually provide a skeleton for both. As another example, when the service-catalog is installed, it provides both the extension-apiserver and controller for the services it provides.
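For illustration, an APIService registration might look like the following sketch; the group, version, Service name, and namespace are assumptions for an imaginary extension.

```yaml
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  name: v1.myextension.mycompany.io    # must be <version>.<group>
spec:
  group: myextension.mycompany.io
  version: v1
  groupPriorityMinimum: 1000
  versionPriority: 15
  service:
    name: my-extension-apiserver       # assumed Service in front of the extension API server
    namespace: my-extension            # assumed namespace
  insecureSkipTLSVerify: true          # sketch only; set caBundle instead in a real deployment
```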
### Response latency
Extension-apiservers should have low latency connections to and from the kube-apiserver.
In particular, discovery requests are required to round-trip from the kube-apiserver in five seconds or less.
If your deployment cannot achieve this, you should consider how to change it. For now, setting the
`EnableAggregatedDiscoveryTimeout=false` feature gate on the kube-apiserver
will disable the timeout restriction. It will be removed in a future release.
Extension API servers should have low latency networking to and from the kube-apiserver.
Discovery requests are required to round-trip from the kube-apiserver in five seconds or less.
If your extension API server cannot achieve that latency requirement, consider making changes that let you meet it. You can also set the
`EnableAggregatedDiscoveryTimeout=false` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/) on the kube-apiserver
to disable the timeout restriction. This deprecated feature gate will be removed in a future release.
{{% /capture %}}
@ -37,7 +41,6 @@ will disable the timeout restriction. It will be removed in a future release.
* To get the aggregator working in your environment, [configure the aggregation layer](/docs/tasks/access-kubernetes-api/configure-aggregation-layer/).
* Then, [setup an extension api-server](/docs/tasks/access-kubernetes-api/setup-extension-api-server/) to work with the aggregation layer.
* Also, learn how to [extend the Kubernetes API using Custom Resource Definitions](/docs/tasks/access-kubernetes-api/extend-api-custom-resource-definitions/).
* Read the specification for [APIService](/docs/reference/generated/kubernetes-api/{{< param "version" >}}/#apiservice-v1-apiregistration-k8s-io)
{{% /capture %}}
View File
@ -4,7 +4,7 @@ reviewers:
- enisoc
- deads2k
content_template: templates/concept
weight: 20
weight: 10
---
{{% capture overview %}}
@ -37,7 +37,7 @@ On their own, custom resources simply let you store and retrieve structured data
When you combine a custom resource with a *custom controller*, custom resources
provide a true _declarative API_.
A [declarative API](/docs/concepts/overview/working-with-objects/kubernetes-objects/#understanding-kubernetes-objects)
A [declarative API](/docs/concepts/overview/kubernetes-api/)
allows you to _declare_ or specify the desired state of your resource and tries to
keep the current state of Kubernetes objects in sync with the desired state.
The controller interprets the structured data as a record of the user's
@ -128,7 +128,12 @@ Regardless of how they are installed, the new resources are referred to as Custo
## CustomResourceDefinitions
The [CustomResourceDefinition](/docs/tasks/access-kubernetes-api/custom-resources/custom-resource-definitions/) API resource allows you to define custom resources. Defining a CRD object creates a new custom resource with a name and schema that you specify. The Kubernetes API serves and handles the storage of your custom resource.
The [CustomResourceDefinition](/docs/tasks/access-kubernetes-api/custom-resources/custom-resource-definitions/)
API resource allows you to define custom resources.
Defining a CRD object creates a new custom resource with a name and schema that you specify.
The Kubernetes API serves and handles the storage of your custom resource.
The name of a CRD object must be a valid
[DNS subdomain name](/docs/concepts/overview/working-with-objects/names#dns-subdomain-names).
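For illustration only, a minimal CRD might look like the sketch below; the group, kind, and field names are assumptions.

```yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: crontabs.stable.example.com    # must be <plural>.<group>
spec:
  group: stable.example.com
  scope: Namespaced
  names:
    plural: crontabs
    singular: crontab
    kind: CronTab
  versions:
  - name: v1
    served: true
    storage: true
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            properties:
              cronSpec:
                type: string
```

Creating a CRD like this makes a new namespaced endpoint available, for example `/apis/stable.example.com/v1/namespaces/*/crontabs`.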
This frees you from writing your own API server to handle the custom resource,
but the generic nature of the implementation means you have less flexibility than with
@ -162,7 +167,7 @@ CRDs are easier to create than Aggregated APIs.
| CRDs | Aggregated API |
| --------------------------- | -------------- |
| Do not require programming. Users can choose any language for a CRD controller. | Requires programming in Go and building binary and image. Users can choose any language for a CRD controller. |
| Do not require programming. Users can choose any language for a CRD controller. | Requires programming in Go and building binary and image. |
| No additional service to run; CRs are handled by the API server. | An additional service to create, and one that could fail. |
| No ongoing support once the CRD is created. Any bug fixes are picked up as part of normal Kubernetes Master upgrades. | May need to periodically pickup bug fixes from upstream and rebuild and update the Aggregated APIserver. |
| No need to handle multiple versions of your API. For example: when you control the client for this resource, you can upgrade it in sync with the API. | You need to handle multiple versions of your API, for example: when developing an extension to share with the world. |
@ -179,7 +184,7 @@ Aggregated APIs offer more advanced API features and customization of other feat
| Custom Storage | If you need storage with a different performance mode (for example, time-series database instead of key-value store) or isolation for security (for example, encryption secrets or different | No | Yes |
| Custom Business Logic | Perform arbitrary checks or actions when creating, reading, updating or deleting an object | Yes, using [Webhooks](/docs/reference/access-authn-authz/extensible-admission-controllers/#admission-webhooks). | Yes |
| Scale Subresource | Allows systems like HorizontalPodAutoscaler and PodDisruptionBudget interact with your new resource | [Yes](/docs/tasks/access-kubernetes-api/extend-api-custom-resource-definitions/#scale-subresource) | Yes |
| Status Subresource | <ul><li>Finer-grained access control: user writes spec section, controller writes status section.</li><li>Allows incrementing object Generation on custom resource data mutation (requires separate spec and status sections in the resource)</li></ul> | [Yes](/docs/tasks/access-kubernetes-api/extend-api-custom-resource-definitions/#status-subresource) | Yes |
| Status Subresource | Allows fine-grained access control where user writes the spec section and the controller writes the status section. Allows incrementing object Generation on custom resource data mutation (requires separate spec and status sections in the resource) | [Yes](/docs/tasks/access-kubernetes-api/extend-api-custom-resource-definitions/#status-subresource) | Yes |
| Other Subresources | Add operations other than CRUD, such as "logs" or "exec". | No | Yes |
| strategic-merge-patch | The new endpoints support PATCH with `Content-Type: application/strategic-merge-patch+json`. Useful for updating objects that may be modified both locally, and by the server. For more information, see ["Update API Objects in Place Using kubectl patch"](/docs/tasks/run-application/update-api-object-kubectl-patch/) | No | Yes |
| Protocol Buffers | The new resource supports clients that want to use Protocol Buffers | No | Yes |
@ -202,7 +207,7 @@ When you create a custom resource, either via a CRDs or an AA, you get many feat
| Finalizers | Block deletion of extension resources until external cleanup happens. |
| Admission Webhooks | Set default values and validate extension resources during any create/update/delete operation. |
| UI/CLI Display | kubectl and dashboard can display extension resources. |
| Unset vs Empty | Clients can distinguish unset fields from zero-valued fields. |
| Unset versus Empty | Clients can distinguish unset fields from zero-valued fields. |
| Client Libraries Generation | Kubernetes provides generic client libraries, as well as tools to generate type-specific client libraries. |
| Labels and annotations | Common metadata across objects that tools know how to edit for core and custom resources. |
View File
@ -12,7 +12,7 @@ weight: 10
{{% capture overview %}}
{{< feature-state state="alpha" >}}
{{< warning >}}Alpha features change rapidly. {{< /warning >}}
{{< caution >}}Alpha features can change rapidly. {{< /caution >}}
Network plugins in Kubernetes come in a few flavors:
@ -154,7 +154,7 @@ most network plugins.
Where needed, you can specify the MTU explicitly with the `network-plugin-mtu` kubelet option. For example,
on AWS the `eth0` MTU is typically 9001, so you might specify `--network-plugin-mtu=9001`. If you're using IPSEC you
might reduce it to allow for encapsulation overhead e.g. `--network-plugin-mtu=8873`.
might reduce it to allow for encapsulation overhead; for example: `--network-plugin-mtu=8873`.
This option is provided to the network-plugin; currently **only kubenet supports `network-plugin-mtu`**.
View File
@ -1,117 +1,111 @@
---
title: Poseidon-Firmament - An alternate scheduler
title: Poseidon-Firmament Scheduler
content_template: templates/concept
weight: 80
---
{{% capture overview %}}
**Current release of Poseidon-Firmament scheduler is an <code> alpha </code> release.**
{{< feature-state for_k8s_version="v1.6" state="alpha" >}}
Poseidon-Firmament scheduler is an alternate scheduler that can be deployed alongside the default Kubernetes scheduler.
The Poseidon-Firmament scheduler is an alternate scheduler that can be deployed alongside the default Kubernetes scheduler.
{{% /capture %}}
{{% capture body %}}
## Introduction
## Introduction
Poseidon is a service that acts as the integration glue for the [Firmament scheduler](https://github.com/Huawei-PaaS/firmament) with Kubernetes. Poseidon-Firmament scheduler augments the current Kubernetes scheduling capabilities. It incorporates novel flow network graph based scheduling capabilities alongside the default Kubernetes Scheduler. Firmament scheduler models workloads and clusters as flow networks and runs min-cost flow optimizations over these networks to make scheduling decisions.
Poseidon is a service that acts as the integration glue between the [Firmament scheduler](https://github.com/Huawei-PaaS/firmament) and Kubernetes. Poseidon-Firmament augments the current Kubernetes scheduling capabilities. It incorporates novel flow network graph based scheduling capabilities alongside the default Kubernetes scheduler. The Firmament scheduler models workloads and clusters as flow networks and runs min-cost flow optimizations over these networks to make scheduling decisions.
It models the scheduling problem as a constraint-based optimization over a flow network graph. This is achieved by reducing scheduling to a min-cost max-flow optimization problem. The Poseidon-Firmament scheduler dynamically refines the workload placements.
Firmament models the scheduling problem as a constraint-based optimization over a flow network graph. This is achieved by reducing scheduling to a min-cost max-flow optimization problem. The Poseidon-Firmament scheduler dynamically refines the workload placements.
Poseidon-Firmament scheduler runs alongside the default Kubernetes Scheduler as an alternate scheduler, so multiple schedulers run simultaneously.
Poseidon-Firmament scheduler runs alongside the default Kubernetes scheduler as an alternate scheduler. You can simultaneously run multiple, different schedulers.
## Key Advantages
Flow graph scheduling with the Poseidon-Firmament scheduler provides the following advantages:
### Flow graph scheduling based Poseidon-Firmament scheduler provides the following key advantages:
- Workloads (pods) are bulk scheduled to enable scheduling at massive scale..
- Based on the extensive performance test results, Poseidon-Firmament scales much better than the Kubernetes default scheduler as the number of nodes increase in a cluster. This is due to the fact that Poseidon-Firmament is able to amortize more and more work across workloads.
- Poseidon-Firmament Scheduler outperforms the Kubernetes default scheduler by a wide margin when it comes to throughput performance numbers for scenarios where compute resource requirements are somewhat uniform across jobs (Replicasets/Deployments/Jobs). Poseidon-Firmament scheduler end-to-end throughput performance numbers, including bind time, consistently get better as the number of nodes in a cluster increase. For example, for a 2,700 node cluster (shown in the graphs [here](https://github.com/kubernetes-sigs/poseidon/blob/master/docs/benchmark/README.md)), Poseidon-Firmament scheduler achieves a 7X or greater end-to-end throughput than the Kubernetes default scheduler, which includes bind time.
- Workloads (Pods) are bulk scheduled to enable scheduling at massive scale.
The Poseidon-Firmament scheduler outperforms the Kubernetes default scheduler by a wide margin when it comes to throughput performance for scenarios where compute resource requirements are somewhat uniform across your workload (Deployments, ReplicaSets, Jobs).
- The Poseidon-Firmament scheduler's end-to-end throughput performance and bind time improve as the number of nodes in a cluster increases. As you scale out, the Poseidon-Firmament scheduler is able to amortize more and more work across workloads.
- Scheduling in Poseidon-Firmament is dynamic; it keeps cluster resources in a global optimal state during every scheduling run.
- The Poseidon-Firmament scheduler supports scheduling complex rule constraints.
- Availability of complex rule constraints.
- Scheduling in Poseidon-Firmament is dynamic; it keeps cluster resources in a global optimal state during every scheduling run.
- Highly efficient resource utilizations.
## How the Poseidon-Firmament scheduler works
## Poseidon-Firmament Scheduler - How it works
Kubernetes supports [using multiple schedulers](/docs/tasks/administer-cluster/configure-multiple-schedulers/). You can specify, for a particular Pod, that it is scheduled by a custom scheduler (“poseidon” for this case), by setting the `schedulerName` field in the PodSpec at the time of pod creation. The default scheduler will ignore that Pod and allow Poseidon-Firmament scheduler to schedule the Pod on a relevant node.
As part of the Kubernetes multiple schedulers support, each new pod is typically scheduled by the default scheduler. Kubernetes can be instructed to use another scheduler by specifying the name of another custom scheduler (“poseidon” in our case) in the **schedulerName** field of the PodSpec at the time of pod creation. In this case, the default scheduler will ignore that Pod and allow Poseidon scheduler to schedule the Pod on a relevant node.
For example:
```yaml
apiVersion: v1
kind: Pod
...
spec:
schedulerName: poseidon
```
schedulerName: poseidon
...
```
{{< note >}}
For details about the design of this project see the [design document](https://github.com/kubernetes-sigs/poseidon/blob/master/docs/design/README.md).
{{< /note >}}
## Possible Use Case Scenarios - When to use it
## Batch scheduling
As mentioned earlier, Poseidon-Firmament scheduler enables an extremely high throughput scheduling environment at scale due to its bulk scheduling approach versus Kubernetes pod-at-a-time approach. In our extensive tests, we have observed substantial throughput benefits as long as resource requirements (CPU/Memory) for incoming Pods are uniform across jobs (Replicasets/Deployments/Jobs), mainly due to efficient amortization of work across jobs.
Although the Poseidon-Firmament scheduler is capable of scheduling various types of workloads, such as service and batch workloads, the following are a few use cases where it excels the most:
1. For “Big Data/AI” jobs consisting of large number of tasks, throughput benefits are tremendous.
2. Service or batch jobs where workload resource requirements are uniform across jobs (Replicasets/Deployments/Jobs).
1. For “Big Data/AI” jobs consisting of large number of tasks, throughput benefits are tremendous.
2. Service or batch jobs where workload resource requirements are uniform across jobs (Replicasets/Deployments/Jobs).
## Current Project Stage
## Feature state
- **Alpha Release - Incubation repo.** at https://github.com/kubernetes-sigs/poseidon.
- Currently, Poseidon-Firmament scheduler **does not provide support for high availability**, our implementation assumes that the scheduler cannot fail. The [design document](https://github.com/kubernetes-sigs/poseidon/blob/master/docs/design/README.md) describes possible ways to enable high availability, but we leave this to future work.
- We are **not aware of any production deployment** of Poseidon-Firmament scheduler at this time.
- Poseidon-Firmament is supported from Kubernetes release 1.6 and works with all subsequent releases.
- Release process for Poseidon and Firmament repos are in lock step. The current Poseidon release can be found [here](https://github.com/kubernetes-sigs/poseidon/releases) and the corresponding Firmament release can be found [here](https://github.com/Huawei-PaaS/firmament/releases).
Poseidon-Firmament is designed to work with Kubernetes release 1.6 and all subsequent releases.
## Features Comparison Matrix
{{< caution >}}
Poseidon-Firmament scheduler does not provide support for high availability; its implementation assumes that the scheduler cannot fail.
{{< /caution >}}
## Feature comparison {#feature-comparison-matrix}
{{< table caption="Feature comparison of Kubernetes and Poseidon-Firmament schedulers." >}}
|Feature|Kubernetes Default Scheduler|Poseidon-Firmament Scheduler|Notes|
|--- |--- |--- |--- |
|Node Affinity/Anti-Affinity|Y|Y||
|Pod Affinity/Anti-Affinity - including support for pod anti-affinity symmetry|Y|Y|Currently, the default scheduler outperforms the Poseidon-Firmament scheduler pod affinity/anti-affinity functionality. We are working towards resolving this.|
|Pod Affinity/Anti-Affinity - including support for pod anti-affinity symmetry|Y|Y|The default scheduler outperforms the Poseidon-Firmament scheduler pod affinity/anti-affinity functionality.|
|Taints & Tolerations|Y|Y||
|Baseline Scheduling capability in accordance to available compute resources (CPU & Memory) on a node|Y|Y**|Not all Predicates & Priorities are supported at this time.|
|Extreme Throughput at scale|Y**|Y|Bulk scheduling approach scales or increases workload placement. Substantial throughput benefits using Firmament scheduler as long as resource requirements (CPU/Memory) for incoming Pods is uniform across Replicasets/Deployments/Jobs. This is mainly due to efficient amortization of work across Replicasets/Deployments/Jobs . 1) For “Big Data/AI” jobs consisting of large no. of tasks, throughput benefits are tremendous. 2) Substantial throughput benefits also for service or batch job scenarios where workload resource requirements are uniform across Replicasets/Deployments/Jobs.|
|Optimal Scheduling|Pod-by-Pod scheduler, processes one pod at a time (may result into sub-optimal scheduling)|Bulk Scheduling (Optimal scheduling)|Pod-by-Pod Kubernetes default scheduler may assign tasks to a sub-optimal machine. By contrast, Firmament considers all unscheduled tasks at the same time together with their soft and hard constraints.|
|Colocation Interference Avoidance|N|N**|Planned in Poseidon-Firmament.|
|Priority Pre-emption|Y|N**|Partially exists in Poseidon-Firmament versus extensive support in Kubernetes default scheduler.|
|Inherent Re-Scheduling|N|Y**|Poseidon-Firmament scheduler supports workload re-scheduling. In each scheduling run it considers all the pods, including running pods, and as a result can migrate or evict pods a globally optimal scheduling environment.|
|Baseline Scheduling capability in accordance to available compute resources (CPU & Memory) on a node|Y|Y†|**†** Not all Predicates & Priorities are supported with Poseidon-Firmament.|
|Extreme Throughput at scale|Y†|Y|**†** Bulk scheduling approach scales or increases workload placement. Firmament scheduler offers high throughput when resource requirements (CPU/Memory) for incoming Pods are uniform across ReplicaSets/Deployments/Jobs.|
|Colocation Interference Avoidance|N|N||
|Priority Preemption|Y|N†|**†** Partially exists in Poseidon-Firmament versus extensive support in Kubernetes default scheduler.|
|Inherent Rescheduling|N|Y†|**†** The Poseidon-Firmament scheduler supports workload re-scheduling. In each scheduling run, Poseidon-Firmament considers all Pods, including running Pods, and as a result can migrate or evict Pods to achieve a globally optimal scheduling environment.|
|Gang Scheduling|N|Y||
|Support for Pre-bound Persistence Volume Scheduling|Y|Y||
|Support for Local Volume & Dynamic Persistence Volume Binding Scheduling|Y|N**|Planned.|
|High Availability|Y|N**|Planned.|
|Real-time metrics based scheduling|N|Y**|Initially supported using Heapster (now deprecated) for placing pods using actual cluster utilization statistics rather than reservations. Plans to switch over to "metric server".|
|Support for Local Volume & Dynamic Persistence Volume Binding Scheduling|Y|N||
|High Availability|Y|N||
|Real-time metrics based scheduling|N|Y†|**†** Partially supported in Poseidon-Firmament using Heapster (now deprecated) for placing Pods using actual cluster utilization statistics rather than reservations.|
|Support for Max-Pod per node|Y|Y|Poseidon-Firmament scheduler seamlessly co-exists with Kubernetes default scheduler.|
|Support for Ephemeral Storage, in addition to CPU/Memory|Y|Y||
{{< /table >}}
## Installation
## Installation
The [Poseidon-Firmament installation guide](https://github.com/kubernetes-sigs/poseidon/blob/master/docs/install/README.md#Installation) explains how to deploy Poseidon-Firmament to your cluster.
For in-cluster installation of Poseidon, please start at the [Installation instructions](https://github.com/kubernetes-sigs/poseidon/blob/master/docs/install/README.md).
## Development
For developers, please refer to the [Developer Setup instructions](https://github.com/kubernetes-sigs/poseidon/blob/master/docs/devel/README.md).
## Latest Throughput Performance Testing Results
Pod-by-pod schedulers, such as the Kubernetes default scheduler, typically process one pod at a time. These schedulers have the following crucial drawbacks:
1. The scheduler commits to a pod placement early and restricts the choices for other pods that wait to be placed.
2. There are limited opportunities for amortizing work across pods because they are considered for placement individually.
These downsides of pod-by-pod schedulers are addressed by batching or bulk scheduling in Poseidon-Firmament scheduler. Processing several pods in a batch allows the scheduler to jointly consider their placement, and thus to find the best trade-off for the whole batch instead of one pod. At the same time it amortizes work across pods resulting in much higher throughput.
## Performance comparison
{{< note >}}
Please refer to the [latest benchmark results](https://github.com/kubernetes-sigs/poseidon/blob/master/docs/benchmark/README.md) for detailed throughput performance comparison test results between Poseidon-Firmament scheduler and the Kubernetes default scheduler.
{{< /note >}}
Pod-by-pod schedulers, such as the Kubernetes default scheduler, process Pods in small batches (typically one at a time). These schedulers have the following crucial drawbacks:
1. The scheduler commits to a pod placement early and restricts the choices for other pods that wait to be placed.
2. There are limited opportunities for amortizing work across pods because they are considered for placement individually.
These downsides of pod-by-pod schedulers are addressed by batching or bulk scheduling in Poseidon-Firmament scheduler. Processing several pods in a batch allows the scheduler to jointly consider their placement, and thus to find the best trade-off for the whole batch instead of one pod. At the same time it amortizes work across pods resulting in much higher throughput.
{{% /capture %}}
{{% capture whatsnext %}}
* See [Poseidon-Firmament](https://github.com/kubernetes-sigs/poseidon#readme) on GitHub for more information.
* See the [design document](https://github.com/kubernetes-sigs/poseidon/blob/master/docs/design/README.md) for Poseidon.
* Read [Firmament: Fast, Centralized Cluster Scheduling at Scale](https://www.usenix.org/system/files/conference/osdi16/osdi16-gog.pdf), the academic paper on the Firmament scheduling design.
* If you'd like to contribute to Poseidon-Firmament, refer to the [developer setup instructions](https://github.com/kubernetes-sigs/poseidon/blob/master/docs/devel/README.md).
{{% /capture %}}
View File
@ -121,21 +121,22 @@ There are two supported paths to extending the API with [custom resources](/docs
to make it seamless for clients.
## Enabling API groups
## Enabling or disabling API groups
Certain resources and API groups are enabled by default. They can be enabled or disabled by setting `--runtime-config`
on apiserver. `--runtime-config` accepts comma separated values. For ex: to disable batch/v1, set
on apiserver. `--runtime-config` accepts comma separated values. For example: to disable batch/v1, set
`--runtime-config=batch/v1=false`, to enable batch/v2alpha1, set `--runtime-config=batch/v2alpha1`.
The flag accepts comma separated set of key=value pairs describing runtime configuration of the apiserver.
IMPORTANT: Enabling or disabling groups or resources requires restarting apiserver and controller-manager
to pick up the `--runtime-config` changes.
{{< note >}}Enabling or disabling groups or resources requires restarting apiserver and controller-manager
to pick up the `--runtime-config` changes.{{< /note >}}
## Enabling resources in the groups
## Enabling specific resources in the extensions/v1beta1 group
DaemonSets, Deployments, HorizontalPodAutoscalers, Ingresses, Jobs and ReplicaSets are enabled by default.
Other extensions resources can be enabled by setting `--runtime-config` on
apiserver. `--runtime-config` accepts comma separated values. For example: to disable deployments and ingress, set
`--runtime-config=extensions/v1beta1/deployments=false,extensions/v1beta1/ingresses=false`
DaemonSets, Deployments, StatefulSets, NetworkPolicies, PodSecurityPolicies, and ReplicaSets in the `extensions/v1beta1` API group are disabled by default.
For example: to enable deployments and daemonsets, set
`--runtime-config=extensions/v1beta1/deployments=true,extensions/v1beta1/daemonsets=true`.
{{< note >}}Individual resource enablement/disablement is only supported in the `extensions/v1beta1` API group for legacy reasons.{{< /note >}}
{{% /capture %}}
View File
@ -2,7 +2,9 @@
reviewers:
- bgrant0607
- mikedanese
title: What is Kubernetes
title: What is Kubernetes?
description: >
Kubernetes is a portable, extensible, open-source platform for managing containerized workloads and services, that facilitates both declarative configuration and automation. It has a large, rapidly growing ecosystem. Kubernetes services, support, and tools are widely available.
content_template: templates/concept
weight: 10
card:
@ -17,9 +19,10 @@ This page is an overview of Kubernetes.
{{% capture body %}}
Kubernetes is a portable, extensible, open-source platform for managing containerized workloads and services, that facilitates both declarative configuration and automation. It has a large, rapidly growing ecosystem. Kubernetes services, support, and tools are widely available.
The name Kubernetes originates from Greek, meaning helmsman or pilot. Google open-sourced the Kubernetes project in 2014. Kubernetes builds upon a [decade and a half of experience that Google has with running production workloads at scale](https://ai.google/research/pubs/pub43438), combined with best-of-breed ideas and practices from the community.
The name Kubernetes originates from Greek, meaning helmsman or pilot. Google open-sourced the Kubernetes project in 2014. Kubernetes combines [over 15 years of Google's experience](/blog/2015/04/borg-predecessor-to-kubernetes/) running production workloads at scale with best-of-breed ideas and practices from the community.
## Going back in time
Let's take a look at why Kubernetes is so useful by going back in time.
![Deployment evolution](/images/docs/Container_Evolution.svg)
@ -42,13 +45,13 @@ Containers have become popular because they provide extra benefits, such as:
* Dev and Ops separation of concerns: create application container images at build/release time rather than deployment time, thereby decoupling applications from infrastructure.
* Observability not only surfaces OS-level information and metrics, but also application health and other signals.
* Environmental consistency across development, testing, and production: Runs the same on a laptop as it does in the cloud.
* Cloud and OS distribution portability: Runs on Ubuntu, RHEL, CoreOS, on-prem, Google Kubernetes Engine, and anywhere else.
* Cloud and OS distribution portability: Runs on Ubuntu, RHEL, CoreOS, on-premises, on major public clouds, and anywhere else.
* Application-centric management: Raises the level of abstraction from running an OS on virtual hardware to running an application on an OS using logical resources.
* Loosely coupled, distributed, elastic, liberated micro-services: applications are broken into smaller, independent pieces and can be deployed and managed dynamically not a monolithic stack running on one big single-purpose machine.
* Resource isolation: predictable application performance.
* Resource utilization: high efficiency and density.
## Why you need Kubernetes and what can it do
## Why you need Kubernetes and what it can do {#why-you-need-kubernetes-and-what-can-it-do}
Containers are a good way to bundle and run your applications. In a production environment, you need to manage the containers that run the applications and ensure that there is no downtime. For example, if a container goes down, another container needs to start. Wouldn't it be easier if this behavior was handled by a system?
View File
@ -82,7 +82,7 @@ metadata:
spec:
containers:
- name: nginx
image: nginx:1.7.9
image: nginx:1.14.2
ports:
- containerPort: 80
View File
@ -12,9 +12,9 @@ This page explains how Kubernetes objects are represented in the Kubernetes API,
{{% /capture %}}
{{% capture body %}}
## Understanding Kubernetes Objects
## Understanding Kubernetes objects {#kubernetes-objects}
*Kubernetes Objects* are persistent entities in the Kubernetes system. Kubernetes uses these entities to represent the state of your cluster. Specifically, they can describe:
*Kubernetes objects* are persistent entities in the Kubernetes system. Kubernetes uses these entities to represent the state of your cluster. Specifically, they can describe:
* What containerized applications are running (and on which nodes)
* The resources available to those applications
@ -26,14 +26,31 @@ To work with Kubernetes objects--whether to create, modify, or delete them--you'
### Object Spec and Status
Every Kubernetes object includes two nested object fields that govern the object's configuration: the object *spec* and the object *status*. The *spec*, which you must provide, describes your desired state for the object--the characteristics that you want the object to have. The *status* describes the *actual state* of the object, and is supplied and updated by the Kubernetes system. At any given time, the Kubernetes Control Plane actively manages an object's actual state to match the desired state you supplied.
Almost every Kubernetes object includes two nested object fields that govern
the object's configuration: the object *`spec`* and the object *`status`*.
For objects that have a `spec`, you have to set this when you create the object,
providing a description of the characteristics you want the resource to have:
its _desired state_.
The `status` describes the _current state_ of the object, supplied and updated
by the Kubernetes system and its components. The Kubernetes
{{< glossary_tooltip text="control plane" term_id="control-plane" >}} continually
and actively manages every object's actual state to match the desired state you
supplied.
For example, a Kubernetes Deployment is an object that can represent an application running on your cluster. When you create the Deployment, you might set the Deployment spec to specify that you want three replicas of the application to be running. The Kubernetes system reads the Deployment spec and starts three instances of your desired application--updating the status to match your spec. If any of those instances should fail (a status change), the Kubernetes system responds to the difference between spec and status by making a correction--in this case, starting a replacement instance.
For example: in Kubernetes, a Deployment is an object that can represent an
application running on your cluster. When you create the Deployment, you
might set the Deployment `spec` to specify that you want three replicas of
the application to be running. The Kubernetes system reads the Deployment
spec and starts three instances of your desired application--updating
the status to match your spec. If any of those instances should fail
(a status change), the Kubernetes system responds to the difference
between spec and status by making a correction--in this case, starting
a replacement instance.
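A sketch of such a Deployment `spec` is shown below; the names and the container image are only examples of what you might choose.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment         # assumed name
spec:
  replicas: 3                    # desired state: keep three replicas running
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.14.2
```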
For more information on the object spec, status, and metadata, see the [Kubernetes API Conventions](https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md).
### Describing a Kubernetes Object
### Describing a Kubernetes object
When you create an object in Kubernetes, you must provide the object spec that describes its desired state, as well as some basic information about the object (such as a name). When you use the Kubernetes API to create the object (either directly or via `kubectl`), that API request must include that information as JSON in the request body. **Most often, you provide the information to `kubectl` in a .yaml file.** `kubectl` converts the information to JSON when making the API request.
@ -51,7 +68,7 @@ kubectl apply -f https://k8s.io/examples/application/deployment.yaml --record
The output is similar to this:
```shell
```
deployment.apps/nginx-deployment created
```
@ -65,14 +82,15 @@ In the `.yaml` file for the Kubernetes object you want to create, you'll need to
* `spec` - What state you desire for the object
The precise format of the object `spec` is different for every Kubernetes object, and contains nested fields specific to that object. The [Kubernetes API Reference](/docs/reference/generated/kubernetes-api/{{< param "version" >}}/) can help you find the spec format for all of the objects you can create using Kubernetes.
For example, the `spec` format for a `Pod` can be found
[here](/docs/reference/generated/kubernetes-api/{{< param "version" >}}/#podspec-v1-core),
and the `spec` format for a `Deployment` can be found
[here](/docs/reference/generated/kubernetes-api/{{< param "version" >}}/#deploymentspec-v1-apps).
For example, the `spec` format for a Pod can be found in
[PodSpec v1 core](/docs/reference/generated/kubernetes-api/{{< param "version" >}}/#podspec-v1-core),
and the `spec` format for a Deployment can be found in
[DeploymentSpec v1 apps](/docs/reference/generated/kubernetes-api/{{< param "version" >}}/#deploymentspec-v1-apps).
{{% /capture %}}
{{% capture whatsnext %}}
* [Kubernetes API overview](/docs/reference/using-api/api-overview/) explains some more API concepts
* Learn about the most important basic Kubernetes objects, such as [Pod](/docs/concepts/workloads/pods/pod-overview/).
* Learn about [controllers](/docs/concepts/architecture/controller/) in Kubernetes
{{% /capture %}}
View File
@ -69,10 +69,10 @@ metadata:
spec:
containers:
- name: nginx
image: nginx:1.7.9
image: nginx:1.14.2
ports:
- containerPort: 80
```
## Label selectors
@ -92,7 +92,7 @@ them.
For some API types, such as ReplicaSets, the label selectors of two instances must not overlap within a namespace, or the controller can see that as conflicting instructions and fail to determine how many replicas should be present.
{{< /note >}}
{{< caution >}}
{{< caution >}}
For both equality-based and set-based conditions there is no logical _OR_ (`||`) operator. Ensure your filter statements are structured accordingly.
{{< /caution >}}
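To make the distinction concrete, the sketch below combines an equality-based requirement with set-based requirements in a single `selector` stanza; the label keys and values are assumptions, and all of the requirements must be satisfied together (logical _AND_).

```yaml
selector:
  matchLabels:
    component: redis                                    # equality-based requirement
  matchExpressions:
  - {key: tier, operator: In, values: [cache]}          # set-based requirement
  - {key: environment, operator: NotIn, values: [dev]}  # set-based requirement
```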
@ -210,7 +210,7 @@ this selector (respectively in `json` or `yaml` format) is equivalent to `compon
#### Resources that support set-based requirements
Newer resources, such as [`Job`](/docs/concepts/jobs/run-to-completion-finite-workloads/), [`Deployment`](/docs/concepts/workloads/controllers/deployment/), [`Replica Set`](/docs/concepts/workloads/controllers/replicaset/), and [`Daemon Set`](/docs/concepts/workloads/controllers/daemonset/), support _set-based_ requirements as well.
Newer resources, such as [`Job`](/docs/concepts/workloads/controllers/jobs-run-to-completion/), [`Deployment`](/docs/concepts/workloads/controllers/deployment/), [`ReplicaSet`](/docs/concepts/workloads/controllers/replicaset/), and [`DaemonSet`](/docs/concepts/workloads/controllers/daemonset/), support _set-based_ requirements as well.
```yaml
selector:
View File
@ -2,7 +2,7 @@
reviewers:
- mikedanese
- thockin
title: Names
title: Object Names and IDs
content_template: templates/concept
weight: 20
---
@ -18,14 +18,41 @@ For non-unique user-provided attributes, Kubernetes provides [labels](/docs/conc
{{% /capture %}}
{{% capture body %}}
## Names
{{< glossary_definition term_id="name" length="all" >}}
Kubernetes resources can have names up to 253 characters long. The characters allowed in names are: digits (0-9), lower case letters (a-z), `-`, and `.`.
Below are three types of commonly used name constraints for resources.
### DNS Subdomain Names
Most resource types require a name that can be used as a DNS subdomain name
as defined in [RFC 1123](https://tools.ietf.org/html/rfc1123).
This means the name must:
- contain no more than 253 characters
- contain only lowercase alphanumeric characters, '-' or '.'
- start with an alphanumeric character
- end with an alphanumeric character
### DNS Label Names
Some resource types require their names to follow the DNS
label standard as defined in [RFC 1123](https://tools.ietf.org/html/rfc1123).
This means the name must:
- contain at most 63 characters
- contain only lowercase alphanumeric characters or '-'
- start with an alphanumeric character
- end with an alphanumeric character
### Path Segment Names
Some resource types require their names to be able to be safely encoded as a
path segment. In other words, the name may not be "." or ".." and the name may
not contain "/" or "%".
Here's an example manifest for a Pod named `nginx-demo`.
@ -37,11 +64,12 @@ metadata:
spec:
containers:
- name: nginx
image: nginx:1.7.9
image: nginx:1.14.2
ports:
- containerPort: 80
```
{{< note >}}
Some resource types have additional restrictions on their names.
{{< /note >}}


@ -9,56 +9,58 @@ weight: 10
{{% capture overview %}}
By default, containers run with unbounded [compute resources](/docs/user-guide/compute-resources) on a Kubernetes cluster.
With Resource quotas, cluster administrators can restrict the resource consumption and creation on a namespace basis.
Within a namespace, a Pod or Container can consume as much CPU and memory as defined by the namespace's resource quota. There is a concern that one Pod or Container could monopolize all of the resources. Limit Range is a policy to constrain resource by Pod or Container in a namespace.
With resource quotas, cluster administrators can restrict resource consumption and creation on a namespace basis.
Within a namespace, a Pod or Container can consume as much CPU and memory as defined by the namespace's resource quota. There is a concern that one Pod or Container could monopolize all available resources. A LimitRange is a policy to constrain resource allocations (to Pods or Containers) in a namespace.
{{% /capture %}}
{{% capture body %}}
A limit range, defined by a `LimitRange` object, provides constraints that can:
A _LimitRange_ provides constraints that can:
- Enforce minimum and maximum compute resources usage per Pod or Container in a namespace.
- Enforce minimum and maximum storage request per PersistentVolumeClaim in a namespace.
- Enforce a ratio between request and limit for a resource in a namespace.
- Set default request/limit for compute resources in a namespace and automatically inject them to Containers at runtime.
## Enabling Limit Range
## Enabling LimitRange
Limit Range support is enabled by default for many Kubernetes distributions. It is
LimitRange support is enabled by default for many Kubernetes distributions. It is
enabled when the apiserver `--enable-admission-plugins=` flag has `LimitRanger` admission controller as
one of its arguments.
A limit range is enforced in a particular namespace when there is a
`LimitRange` object in that namespace.
A LimitRange is enforced in a particular namespace when there is a
LimitRange object in that namespace.
### Overview of Limit Range:
The name of a LimitRange object must be a valid
[DNS subdomain name](/docs/concepts/overview/working-with-objects/names#dns-subdomain-names).
### Overview of Limit Range
- The administrator creates one `LimitRange` in one namespace.
- Users create resources like Pods, Containers, and PersistentVolumeClaims in the namespace.
- The `LimitRanger` admission controller enforces defaults limits for all Pods and Container that do not set compute resource requirements and tracks usage to ensure it does not exceed resource minimum , maximum and ratio defined in any `LimitRange` present in the namespace.
- If creating or updating a resource (Pod, Container, PersistentVolumeClaim) violates a limit range constraint, the request to the API server will fail with HTTP status code `403 FORBIDDEN` and a message explaining the constraint that would have been violated.
- If limit range is activated in a namespace for compute resources like `cpu` and `memory`, users must specify
requests or limits for those values; otherwise, the system may reject pod creation.
- LimitRange validations occurs only at Pod Admission stage, not on Running pods.
- The `LimitRanger` admission controller enforces defaults and limits for all Pods and Containers that do not set compute resource requirements and tracks usage to ensure it does not exceed resource minimum, maximum and ratio defined in any LimitRange present in the namespace.
- If creating or updating a resource (Pod, Container, PersistentVolumeClaim) that violates a LimitRange constraint, the request to the API server will fail with an HTTP status code `403 FORBIDDEN` and a message explaining the constraint that has been violated.
- If a LimitRange is activated in a namespace for compute resources like `cpu` and `memory`, users must specify
requests or limits for those values. Otherwise, the system may reject Pod creation.
- LimitRange validation occurs only at Pod admission stage, not on running Pods.
Examples of policies that could be created using limit range are:
- In a 2 node cluster with a capacity of 8 GiB RAM, and 16 cores, constrain Pods in a namespace to request 100m and not exceeds 500m for CPU , request 200Mi and not exceed 600Mi
- Define default CPU limits and request to 150m and Memory default request to 300Mi for containers started with no cpu and memory requests in their spec.
- In a 2 node cluster with a capacity of 8 GiB RAM and 16 cores, constrain Pods in a namespace to request 100m of CPU with a max limit of 500m for CPU and request 200Mi for Memory with a max limit of 600Mi for Memory.
- Define default CPU limit and request to 150m and memory default request to 300Mi for Containers started with no cpu and memory requests in their specs.
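For example, the second policy above could be expressed with a LimitRange roughly like this sketch (the object name is illustrative):

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: cpu-memory-defaults   # illustrative name
spec:
  limits:
    - type: Container
      default:                # default limits injected when a Container sets none
        cpu: 150m
      defaultRequest:         # default requests injected when a Container sets none
        cpu: 150m
        memory: 300Mi
```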
In the case where the total limits of the namespace is less than the sum of the limits of the Pods/Containers,
there may be contention for resources; The Containers or Pods will not be created.
there may be contention for resources. In this case, the Containers or Pods will not be created.
Neither contention nor changes to limitrange will affect already created resources.
Neither contention nor changes to a LimitRange will affect already created resources.
## Limiting Container compute resources
The following section discusses the creation of a LimitRange acting at Container Level.
A Pod with 04 containers is first created; each container within the Pod has a specific `spec.resource` configuration
each container within the pod is handled differently by the LimitRanger admission controller.
A Pod with four Containers is first created. Each Container within the Pod has a specific `spec.resources` configuration.
Each Container within the Pod is handled differently by the `LimitRanger` admission controller.
Create a namespace `limitrange-demo` using the following kubectl command:
@ -75,16 +77,16 @@ kubectl config set-context --current --namespace=limitrange-demo
Here is the configuration file for a LimitRange object:
{{< codenew file="admin/resource/limit-mem-cpu-container.yaml" >}}
This object defines minimum and maximum Memory/CPU limits, default cpu/Memory requests and default limits for CPU/Memory resources to be apply to containers.
This object defines minimum and maximum CPU/Memory limits, default CPU/Memory requests, and default limits for CPU/Memory resources to be applied to Containers.
Create the `limit-mem-cpu-per-container` LimitRange in the `limitrange-demo` namespace with the following kubectl command:
Create the `limit-mem-cpu-per-container` LimitRange with the following kubectl command:
```shell
kubectl create -f https://k8s.io/examples/admin/resource/limit-mem-cpu-container.yaml -n limitrange-demo
kubectl create -f https://k8s.io/examples/admin/resource/limit-mem-cpu-container.yaml
```
```shell
kubectl describe limitrange/limit-mem-cpu-per-container -n limitrange-demo
kubectl describe limitrange/limit-mem-cpu-per-container
```
```shell
@ -94,13 +96,13 @@ Container cpu 100m 800m 110m 700m -
Container memory 99Mi 1Gi 111Mi 900Mi -
```
Here is the configuration file for a Pod with 04 containers to demonstrate LimitRange features :
Here is the configuration file for a Pod with four Containers to demonstrate LimitRange features:
{{< codenew file="admin/resource/limit-range-pod-1.yaml" >}}
Create the `busybox1` Pod:
```shell
kubectl apply -f https://k8s.io/examples/admin/resource/limit-range-pod-1.yaml -n limitrange-demo
kubectl apply -f https://k8s.io/examples/admin/resource/limit-range-pod-1.yaml
```
### Container spec with valid CPU/Memory requests and limits
@ -108,7 +110,7 @@ kubectl apply -f https://k8s.io/examples/admin/resource/limit-range-pod-1.yaml -
View the `busybox-cnt01` resource configuration:
```shell
kubectl get po/busybox1 -n limitrange-demo -o json | jq ".spec.containers[0].resources"
kubectl get po/busybox1 -o json | jq ".spec.containers[0].resources"
```
```json
@ -125,9 +127,9 @@ kubectl get po/busybox1 -n limitrange-demo -o json | jq ".spec.containers[0].res
```
- The `busybox-cnt01` Container inside the `busybox1` Pod defined `requests.cpu=100m` and `requests.memory=100Mi`.
- `100m <= 500m <= 800m` , The container cpu limit (500m) falls inside the authorized CPU limit range.
- `99Mi <= 200Mi <= 1Gi` , The container memory limit (200Mi) falls inside the authorized Memory limit range.
- No request/limits ratio validation for CPU/Memory , thus the container is valid and created.
- `100m <= 500m <= 800m` , The Container cpu limit (500m) falls inside the authorized CPU LimitRange.
- `99Mi <= 200Mi <= 1Gi` , The Container memory limit (200Mi) falls inside the authorized Memory LimitRange.
- No request/limits ratio validation for CPU/Memory, so the Container is valid and created.
### Container spec with valid CPU/Memory requests but no limits
@ -135,7 +137,7 @@ kubectl get po/busybox1 -n limitrange-demo -o json | jq ".spec.containers[0].res
View the `busybox-cnt02` resource configuration
```shell
kubectl get po/busybox1 -n limitrange-demo -o json | jq ".spec.containers[1].resources"
kubectl get po/busybox1 -o json | jq ".spec.containers[1].resources"
```
```json
@ -151,17 +153,18 @@ kubectl get po/busybox1 -n limitrange-demo -o json | jq ".spec.containers[1].res
}
```
- The `busybox-cnt02` Container inside the `busybox1` Pod defined `requests.cpu=100m` and `requests.memory=100Mi` but no limits for cpu and memory.
- The container do not have a limits section, the default limits defined in the limit-mem-cpu-per-container LimitRange object are injected to this container `limits.cpu=700mi` and `limits.memory=900Mi`.
- `100m <= 700m <= 800m` , The container cpu limit (700m) falls inside the authorized CPU limit range.
- `99Mi <= 900Mi <= 1Gi` , The container memory limit (900Mi) falls inside the authorized Memory limit range.
- No request/limits ratio set , thus the container is valid and created.
- The Container does not have a limits section. The default limits defined in the `limit-mem-cpu-per-container` LimitRange object are injected into this Container: `limits.cpu=700m` and `limits.memory=900Mi`.
- `100m <= 700m <= 800m` , The Container cpu limit (700m) falls inside the authorized CPU limit range.
- `99Mi <= 900Mi <= 1Gi` , The Container memory limit (900Mi) falls inside the authorized Memory limit range.
- No request/limits ratio set, so the Container is valid and created.
### Container spec with a valid CPU/Memory limits but no requests
View the `busybox-cnt03` resource configuration
### Container spec with valid CPU/Memory limits but no requests
View the `busybox-cnt03` resource configuration:
```shell
kubectl get po/busybox1 -n limitrange-demo -o json | jq ".spec.containers[2].resources"
kubectl get po/busybox1 -o json | jq ".spec.containers[2].resources"
```
```json
{
@ -177,17 +180,17 @@ kubectl get po/busybox1 -n limitrange-demo -o json | jq ".spec.containers[2].res
```
- The `busybox-cnt03` Container inside `busybox1` Pod defined `limits.cpu=500m` and `limits.memory=200Mi` but no `requests` for cpu and memory.
- The container do not define a request section, the defaultRequest defined in the limit-mem-cpu-per-container LimitRange is not used to fill its limits section but the limits defined by the container are set as requests `limits.cpu=500m` and `limits.memory=200Mi`.
- `100m <= 500m <= 800m` , The container cpu limit (500m) falls inside the authorized CPU limit range.
- `99Mi <= 200Mi <= 1Gi` , The container memory limit (200Mi) falls inside the authorized Memory limit range.
- No request/limits ratio set , thus the container is valid and created.
- The Container does not define a requests section. The default request defined in the `limit-mem-cpu-per-container` LimitRange is not used to fill its requests section; instead, the limits defined by the Container (`limits.cpu=500m` and `limits.memory=200Mi`) are copied and used as its requests.
- `100m <= 500m <= 800m` , The Container cpu limit (500m) falls inside the authorized CPU limit range.
- `99Mi <= 200Mi <= 1Gi` , The Container memory limit (200Mi) falls inside the authorized Memory limit range.
- No request/limits ratio set, so the Container is valid and created.
### Container spec with no CPU/Memory requests/limits
View the `busybox-cnt04` resource configuration:
```shell
kubectl get po/busybox1 -n limitrange-demo -o json | jq ".spec.containers[3].resources"
kubectl get po/busybox1 -o json | jq ".spec.containers[3].resources"
```
```json
@ -204,27 +207,27 @@ kubectl get po/busybox1 -n limitrange-demo -o json | jq ".spec.containers[3].res
```
- The `busybox-cnt04` Container inside `busybox1` defines neither `limits` nor `requests`.
- The container do not define a limit section, the default limit defined in the limit-mem-cpu-per-container LimitRange is used to fill its request
- The Container does not define a limits section; the default limit defined in the `limit-mem-cpu-per-container` LimitRange is used to fill its limits section: `limits.cpu=700m` and `limits.memory=900Mi`.
- The container do not define a request section, the defaultRequest defined in the limit-mem-cpu-per-container LimitRange is used to fill its request section requests.cpu=110m and requests.memory=111Mi
- `100m <= 700m <= 800m` , The container cpu limit (700m) falls inside the authorized CPU limit range.
- `99Mi <= 900Mi <= 1Gi` , The container memory limit (900Mi) falls inside the authorized Memory limitrange .
- No request/limits ratio set , thus the container is valid and created.
- The Container does not define a requests section; the defaultRequest defined in the `limit-mem-cpu-per-container` LimitRange is used to fill its requests section: `requests.cpu=110m` and `requests.memory=111Mi`.
- `100m <= 700m <= 800m` , The Container cpu limit (700m) falls inside the authorized CPU limit range.
- `99Mi <= 900Mi <= 1Gi` , The Container memory limit (900Mi) falls inside the authorized Memory limit range.
- No request/limits ratio set, so the Container is valid and created.
All containers defined in the `busybox` Pod passed LimitRange validations, this the Pod is valid and create in the namespace.
All Containers defined in the `busybox1` Pod passed LimitRange validations, so the Pod is valid and created in the namespace.
## Limiting Pod compute resources
The following section discusses how to constrain resources at Pod level.
The following section discusses how to constrain resources at the Pod level.
{{< codenew file="admin/resource/limit-mem-cpu-pod.yaml" >}}
Without having to delete `busybox1` Pod, create the `limit-mem-cpu-pod` LimitRange in the `limitrange-demo` namespace:
Without having to delete the `busybox1` Pod, create the `limit-mem-cpu-pod` LimitRange in the `limitrange-demo` namespace:
```shell
kubectl apply -f https://k8s.io/examples/admin/resource/limit-mem-cpu-pod.yaml -n limitrange-demo
kubectl apply -f https://k8s.io/examples/admin/resource/limit-mem-cpu-pod.yaml
```
The limitrange is created and limits CPU to 2 Core and Memory to 2Gi per Pod:
The LimitRange is created and limits CPU to 2 Core and Memory to 2Gi per Pod:
```shell
limitrange/limit-mem-cpu-per-pod created
@ -250,36 +253,36 @@ Now create the `busybox2` Pod:
{{< codenew file="admin/resource/limit-range-pod-2.yaml" >}}
```shell
kubectl apply -f https://k8s.io/examples/admin/resource/limit-range-pod-2.yaml -n limitrange-demo
kubectl apply -f https://k8s.io/examples/admin/resource/limit-range-pod-2.yaml
```
The `busybox2` Pod definition is identical to `busybox1` but an error is reported since Pod's resources are now limited:
The `busybox2` Pod definition is identical to `busybox1`, but an error is reported since the Pod's resources are now limited:
```shell
Error from server (Forbidden): error when creating "limit-range-pod-2.yaml": pods "busybox2" is forbidden: [maximum cpu usage per Pod is 2, but limit is 2400m., maximum memory usage per Pod is 2Gi, but limit is 2306867200.]
```
```shell
kubectl get po/busybox1 -n limitrange-demo -o json | jq ".spec.containers[].resources.limits.memory"
kubectl get po/busybox1 -o json | jq ".spec.containers[].resources.limits.memory"
"200Mi"
"900Mi"
"200Mi"
"900Mi"
```
`busybox2` Pod will not be admitted on the cluster since the total memory limit of its container is greater than the limit defined in the LimitRange.
`busybox2` Pod will not be admitted on the cluster since the total memory limit of its Containers is greater than the limit defined in the LimitRange.
`busybox1` will not be evicted since it was created and admitted on the cluster before the LimitRange creation.
## Limiting Storage resources
You can enforce minimum and maximum size of [storage resources](/docs/concepts/storage/persistent-volumes/) that can be requested by each PersistentVolumeClaim in a namespace using a LimitRange:
You can enforce minimum and maximum size of [storage resources](/docs/concepts/storage/persistent-volumes/) that can be requested by each PersistentVolumeClaim in a namespace using a LimitRange:
{{< codenew file="admin/resource/storagelimits.yaml" >}}
Apply the YAML using `kubectl create`:
```shell
kubectl create -f https://k8s.io/examples/admin/resource/storagelimits.yaml -n limitrange-demo
kubectl create -f https://k8s.io/examples/admin/resource/storagelimits.yaml
```
```shell
@ -305,7 +308,7 @@ PersistentVolumeClaim storage 1Gi 2Gi - - -
{{< codenew file="admin/resource/pvc-limit-lower.yaml" >}}
```shell
kubectl create -f https://k8s.io/examples/admin/resource/pvc-limit-lower.yaml -n limitrange-demo
kubectl create -f https://k8s.io/examples/admin/resource/pvc-limit-lower.yaml
```
While creating a PVC with `requests.storage` lower than the Min value in the LimitRange, an error is thrown by the server:
@ -319,7 +322,7 @@ Same behaviour is noted if the `requests.storage` is greater than the Max value
{{< codenew file="admin/resource/pvc-limit-greater.yaml" >}}
```shell
kubectl create -f https://k8s.io/examples/admin/resource/pvc-limit-greater.yaml -n limitrange-demo
kubectl create -f https://k8s.io/examples/admin/resource/pvc-limit-greater.yaml
```
```shell
@ -328,9 +331,9 @@ Error from server (Forbidden): error when creating "pvc-limit-greater.yaml": per
## Limits/Requests Ratio
If `LimitRangeItem.maxLimitRequestRatio` is specified in the `LimitRangeSpec`, the named resource must have a request and limit that are both non-zero where limit divided by request is less than or equal to the enumerated value
If `LimitRangeItem.maxLimitRequestRatio` is specified in the `LimitRangeSpec`, the named resource must have a request and limit that are both non-zero where limit divided by request is less than or equal to the enumerated value.
The following `LimitRange` enforces memory limit to be at most twice the amount of the memory request for any pod in the namespace.
The following LimitRange enforces memory limit to be at most twice the amount of the memory request for any Pod in the namespace:
{{< codenew file="admin/resource/limit-memory-ratio-pod.yaml" >}}
@ -352,7 +355,7 @@ Type Resource Min Max Default Request Default Limit Max Limit/Reques
Pod memory - - - - 2
```
Let's create a pod with `requests.memory=100Mi` and `limits.memory=300Mi`:
Create a pod with `requests.memory=100Mi` and `limits.memory=300Mi`:
{{< codenew file="admin/resource/limit-range-pod-3.yaml" >}}
@ -360,19 +363,24 @@ Let's create a pod with `requests.memory=100Mi` and `limits.memory=300Mi`:
kubectl apply -f https://k8s.io/examples/admin/resource/limit-range-pod-3.yaml
```
The pod creation failed as the ratio here (`3`) is greater than the enforced limit (`2`) in `limit-memory-ratio-pod` LimitRange
The pod creation failed as the ratio here (`3`) is greater than the enforced limit (`2`) in `limit-memory-ratio-pod` LimitRange:
```shell
```
Error from server (Forbidden): error when creating "limit-range-pod-3.yaml": pods "busybox3" is forbidden: memory max limit to request ratio per Pod is 2, but provided ratio is 3.000000.
```
### Clean up
## Clean up
Delete the `limitrange-demo` namespace to free all resources:
```shell
kubectl delete ns limitrange-demo
```
Change your context to `default` namespace with the following command:
```shell
kubectl config set-context --current --namespace=default
```
## Examples


@ -197,6 +197,8 @@ alias kubectl-user='kubectl --as=system:serviceaccount:psp-example:fake-user -n
Define the example PodSecurityPolicy object in a file. This is a policy that
simply prevents the creation of privileged pods.
The name of a PodSecurityPolicy object must be a valid
[DNS subdomain name](/docs/concepts/overview/working-with-objects/names#dns-subdomain-names).
{{< codenew file="policy/example-psp.yaml" >}}
@ -419,8 +421,10 @@ The **recommended minimum set** of allowed volumes for new PSPs are:
- projected
{{< warning >}}
PodSecurityPolicy does not limit the types of `PersistentVolume` objects that may be referenced by a `PersistentVolumeClaim`.
Only trusted users should be granted permission to create `PersistentVolume` objects.
PodSecurityPolicy does not limit the types of `PersistentVolume` objects that
may be referenced by a `PersistentVolumeClaim`, and hostPath type
`PersistentVolumes` do not support read-only access mode. Only trusted users
should be granted permission to create `PersistentVolume` objects.
{{< /warning >}}
**FSGroup** - Controls the supplemental group applied to some volumes.


@ -37,6 +37,9 @@ Resource quotas work like this:
the `LimitRanger` admission controller to force defaults for pods that make no compute resource requirements.
See the [walkthrough](/docs/tasks/administer-cluster/quota-memory-cpu-namespace/) for an example of how to avoid this problem.
The name of a `ResourceQuota` object must be a valid
[DNS subdomain name](/docs/concepts/overview/working-with-objects/names#dns-subdomain-names).
Examples of policies that could be created using namespaces and quotas are:
- In a cluster with a capacity of 32 GiB RAM, and 16 cores, let team A use 20 GiB and 10 cores,
@ -376,7 +379,7 @@ pods 0 10
* `Exists`
* `DoesNotExist`
## Requests vs Limits
## Requests compared to Limits {#requests-vs-limits}
When allocating compute resources, each container may specify a request and a limit value for either CPU or memory.
The quota can be configured to quota either value.
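For example, a ResourceQuota that places quota on both requests and limits might look like this sketch (the name and values are illustrative):

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-resources    # illustrative name
spec:
  hard:
    requests.cpu: "1"        # total CPU requests allowed in the namespace
    requests.memory: 1Gi     # total memory requests allowed in the namespace
    limits.cpu: "2"          # total CPU limits allowed in the namespace
    limits.memory: 2Gi       # total memory limits allowed in the namespace
```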


@ -1,7 +1,7 @@
---
title: Kubernetes Scheduler
content_template: templates/concept
weight: 60
weight: 50
---
{{% capture overview %}}
@ -54,14 +54,12 @@ individual and collective resource requirements, hardware / software /
policy constraints, affinity and anti-affinity specifications, data
locality, inter-workload interference, and so on.
## Scheduling with kube-scheduler {#kube-scheduler-implementation}
### Node selection in kube-scheduler {#kube-scheduler-implementation}
kube-scheduler selects a node for the pod in a 2-step operation:
1. Filtering
2. Scoring
1. Scoring
The _filtering_ step finds the set of Nodes where it's feasible to
schedule the Pod. For example, the PodFitsResources filter checks whether a
@ -78,105 +76,15 @@ Finally, kube-scheduler assigns the Pod to the Node with the highest ranking.
If there is more than one node with equal scores, kube-scheduler selects
one of these at random.
There are two supported ways to configure the filtering and scoring behavior
of the scheduler:
### Default policies
kube-scheduler has a default set of scheduling policies.
### Filtering
- `PodFitsHostPorts`: Checks if a Node has free ports (the network protocol kind)
for the Pod ports the Pod is requesting.
- `PodFitsHost`: Checks if a Pod specifies a specific Node by its hostname.
- `PodFitsResources`: Checks if the Node has free resources (eg, CPU and Memory)
to meet the requirement of the Pod.
- `PodMatchNodeSelector`: Checks if a Pod's Node {{< glossary_tooltip term_id="selector" >}}
matches the Node's {{< glossary_tooltip text="label(s)" term_id="label" >}}.
- `NoVolumeZoneConflict`: Evaluate if the {{< glossary_tooltip text="Volumes" term_id="volume" >}}
that a Pod requests are available on the Node, given the failure zone restrictions for
that storage.
- `NoDiskConflict`: Evaluates if a Pod can fit on a Node due to the volumes it requests,
and those that are already mounted.
- `MaxCSIVolumeCount`: Decides how many {{< glossary_tooltip text="CSI" term_id="csi" >}}
volumes should be attached, and whether that's over a configured limit.
- `CheckNodeMemoryPressure`: If a Node is reporting memory pressure, and there's no
configured exception, the Pod won't be scheduled there.
- `CheckNodePIDPressure`: If a Node is reporting that process IDs are scarce, and
there's no configured exception, the Pod won't be scheduled there.
- `CheckNodeDiskPressure`: If a Node is reporting storage pressure (a filesystem that
is full or nearly full), and there's no configured exception, the Pod won't be
scheduled there.
- `CheckNodeCondition`: Nodes can report that they have a completely full filesystem,
that networking isn't available or that kubelet is otherwise not ready to run Pods.
If such a condition is set for a Node, and there's no configured exception, the Pod
won't be scheduled there.
- `PodToleratesNodeTaints`: checks if a Pod's {{< glossary_tooltip text="tolerations" term_id="toleration" >}}
can tolerate the Node's {{< glossary_tooltip text="taints" term_id="taint" >}}.
- `CheckVolumeBinding`: Evaluates if a Pod can fit due to the volumes it requests.
This applies for both bound and unbound
{{< glossary_tooltip text="PVCs" term_id="persistent-volume-claim" >}}.
### Scoring
- `SelectorSpreadPriority`: Spreads Pods across hosts, considering Pods that
belong to the same {{< glossary_tooltip text="Service" term_id="service" >}},
{{< glossary_tooltip term_id="statefulset" >}} or
{{< glossary_tooltip term_id="replica-set" >}}.
- `InterPodAffinityPriority`: Computes a sum by iterating through the elements
of weightedPodAffinityTerm and adding “weight” to the sum if the corresponding
PodAffinityTerm is satisfied for that node; the node(s) with the highest sum
are the most preferred.
- `LeastRequestedPriority`: Favors nodes with fewer requested resources. In other
words, the more Pods that are placed on a Node, and the more resources those
Pods use, the lower the ranking this policy will give.
- `MostRequestedPriority`: Favors nodes with most requested resources. This policy
will fit the scheduled Pods onto the smallest number of Nodes needed to run your
overall set of workloads.
- `RequestedToCapacityRatioPriority`: Creates a requestedToCapacity based ResourceAllocationPriority using default resource scoring function shape.
- `BalancedResourceAllocation`: Favors nodes with balanced resource usage.
- `NodePreferAvoidPodsPriority`: Prioritizes nodes according to the node annotation
`scheduler.alpha.kubernetes.io/preferAvoidPods`. You can use this to hint that
two different Pods shouldn't run on the same Node.
- `NodeAffinityPriority`: Prioritizes nodes according to node affinity scheduling
preferences indicated in PreferredDuringSchedulingIgnoredDuringExecution.
You can read more about this in [Assigning Pods to Nodes](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/).
- `TaintTolerationPriority`: Prepares the priority list for all the nodes, based on
the number of intolerable taints on the node. This policy adjusts a node's rank
taking that list into account.
- `ImageLocalityPriority`: Favors nodes that already have the
{{< glossary_tooltip text="container images" term_id="image" >}} for that
Pod cached locally.
- `ServiceSpreadingPriority`: For a given Service, this policy aims to make sure that
the Pods for the Service run on different nodes. It favours scheduling onto nodes
that don't have Pods for the service already assigned there. The overall outcome is
that the Service becomes more resilient to a single Node failure.
- `CalculateAntiAffinityPriorityMap`: This policy helps implement
[pod anti-affinity](/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity).
- `EqualPriorityMap`: Gives an equal weight of one to all nodes.
1. [Scheduling Policies](/docs/reference/scheduling/policies) allow you to
configure _Predicates_ for filtering and _Priorities_ for scoring.
1. [Scheduling Profiles](/docs/reference/scheduling/profiles) allow you to
configure Plugins that implement different scheduling stages, including:
`QueueSort`, `Filter`, `Score`, `Bind`, `Reserve`, `Permit`, and others. You
can also configure the kube-scheduler to run different profiles.
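As a rough sketch of the second approach, assuming the `kubescheduler.config.k8s.io/v1alpha2` API available in v1.18 (the plugin names below are assumptions for illustration, not an authoritative list), a profile that swaps one scoring plugin for another could look like:

```yaml
apiVersion: kubescheduler.config.k8s.io/v1alpha2
kind: KubeSchedulerConfiguration
profiles:
  - schedulerName: default-scheduler
    plugins:
      score:
        disabled:
          - name: NodeResourcesLeastAllocated   # assumed default spreading plugin
        enabled:
          - name: NodeResourcesMostAllocated    # prefer packing Pods onto fewer Nodes
```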
{{% /capture %}}
{{% capture whatsnext %}}


@ -3,14 +3,14 @@ reviewers:
- ahg-g
title: Scheduling Framework
content_template: templates/concept
weight: 70
weight: 60
---
{{% capture overview %}}
{{< feature-state for_k8s_version="1.15" state="alpha" >}}
The scheduling framework is a new pluggable architecture for Kubernetes Scheduler
The scheduling framework is a pluggable architecture for Kubernetes Scheduler
that makes scheduler customizations easy. It adds a new set of "plugin" APIs to
the existing scheduler. Plugins are compiled into the scheduler. The APIs
allow most scheduling features to be implemented as plugins, while keeping the
@ -56,16 +56,16 @@ stateful tasks.
{{< figure src="/images/docs/scheduling-framework-extensions.png" title="scheduling framework extension points" >}}
### Queue sort
### QueueSort {#queue-sort}
These plugins are used to sort Pods in the scheduling queue. A queue sort plugin
essentially will provide a "less(Pod1, Pod2)" function. Only one queue sort
essentially provides a `Less(Pod1, Pod2)` function. Only one queue sort
plugin may be enabled at a time.
### Pre-filter
### PreFilter {#pre-filter}
These plugins are used to pre-process info about the Pod, or to check certain
conditions that the cluster or the Pod must meet. If a pre-filter plugin returns
conditions that the cluster or the Pod must meet. If a PreFilter plugin returns
an error, the scheduling cycle is aborted.
### Filter
@ -75,28 +75,25 @@ node, the scheduler will call filter plugins in their configured order. If any
filter plugin marks the node as infeasible, the remaining plugins will not be
called for that node. Nodes may be evaluated concurrently.
### Post-filter
### PreScore {#pre-score}
This is an informational extension point. Plugins will be called with a list of
nodes that passed the filtering phase. A plugin may use this data to update
internal state or to generate logs/metrics.
These plugins are used to perform "pre-scoring" work, which generates a sharable
state for Score plugins to use. If a PreScore plugin returns an error, the
scheduling cycle is aborted.
**Note:** Plugins wishing to perform "pre-scoring" work should use the
post-filter extension point.
### Scoring
### Score {#scoring}
These plugins are used to rank nodes that have passed the filtering phase. The
scheduler will call each scoring plugin for each node. There will be a well
defined range of integers representing the minimum and maximum scores. After the
[normalize scoring](#normalize-scoring) phase, the scheduler will combine node
[NormalizeScore](#normalize-scoring) phase, the scheduler will combine node
scores from all plugins according to the configured plugin weights.
### Normalize scoring
### NormalizeScore {#normalize-scoring}
These plugins are used to modify scores before the scheduler computes a final
ranking of Nodes. A plugin that registers for this extension point will be
called with the [scoring](#scoring) results from the same plugin. This is called
called with the [Score](#scoring) results from the same plugin. This is called
once per plugin per scheduling cycle.
For example, suppose a plugin `BlinkingLightScorer` ranks Nodes based on how
@ -104,7 +101,7 @@ many blinking lights they have.
```go
func ScoreNode(_ *v1.pod, n *v1.Node) (int, error) {
return getBlinkingLightCount(n)
return getBlinkingLightCount(n)
}
```
@ -114,21 +111,23 @@ extension point.
```go
func NormalizeScores(scores map[string]int) {
highest := 0
for _, score := range scores {
highest = max(highest, score)
}
for node, score := range scores {
scores[node] = score*NodeScoreMax/highest
}
highest := 0
for _, score := range scores {
highest = max(highest, score)
}
for node, score := range scores {
scores[node] = score*NodeScoreMax/highest
}
}
```
If any normalize-scoring plugin returns an error, the scheduling cycle is
If any NormalizeScore plugin returns an error, the scheduling cycle is
aborted.
**Note:** Plugins wishing to perform "pre-reserve" work should use the
normalize-scoring extension point.
{{< note >}}
Plugins wishing to perform "pre-reserve" work should use the
NormalizeScore extension point.
{{< /note >}}
### Reserve
@ -140,53 +139,53 @@ to prevent race conditions while the scheduler waits for the bind to succeed.
This is the last step in a scheduling cycle. Once a Pod is in the reserved
state, it will either trigger [Unreserve](#unreserve) plugins (on failure) or
[Post-bind](#post-bind) plugins (on success) at the end of the binding cycle.
*Note: This concept used to be referred to as "assume".*
[PostBind](#post-bind) plugins (on success) at the end of the binding cycle.
### Permit
These plugins are used to prevent or delay the binding of a Pod. A permit plugin
can do one of three things.
_Permit_ plugins are invoked at the end of the scheduling cycle for each Pod, to
prevent or delay the binding to the candidate node. A permit plugin can do one of
three things:
1. **approve** \
Once all permit plugins approve a Pod, it is sent for binding.
Once all Permit plugins approve a Pod, it is sent for binding.
1. **deny** \
If any permit plugin denies a Pod, it is returned to the scheduling queue.
If any Permit plugin denies a Pod, it is returned to the scheduling queue.
This will trigger [Unreserve](#unreserve) plugins.
1. **wait** (with a timeout) \
If a permit plugin returns "wait", then the Pod is kept in the permit phase
until a [plugin approves it](#frameworkhandle). If a timeout occurs, **wait**
becomes **deny** and the Pod is returned to the scheduling queue, triggering
[Unreserve](#unreserve) plugins.
If a Permit plugin returns "wait", then the Pod is kept in an internal "waiting"
Pods list, and the binding cycle of this Pod starts but directly blocks until it
gets [approved](#frameworkhandle). If a timeout occurs, **wait** becomes **deny**
and the Pod is returned to the scheduling queue, triggering [Unreserve](#unreserve)
plugins.
**Approving a Pod binding**
{{< note >}}
While any plugin can access the list of "waiting" Pods and approve them
(see [`FrameworkHandle`](#frameworkhandle)), we expect only the permit
plugins to approve binding of reserved Pods that are in "waiting" state. Once a Pod
is approved, it is sent to the [PreBind](#pre-bind) phase.
{{< /note >}}
While any plugin can access the list of "waiting" Pods from the cache and
approve them (see [`FrameworkHandle`](#frameworkhandle)) we expect only the permit
plugins to approve binding of reserved Pods that are in "waiting" state. Once a
Pod is approved, it is sent to the pre-bind phase.
### Pre-bind
### PreBind {#pre-bind}
These plugins are used to perform any work required before a Pod is bound. For
example, a pre-bind plugin may provision a network volume and mount it on the
target node before allowing the Pod to run there.
If any pre-bind plugin returns an error, the Pod is [rejected](#unreserve) and
If any PreBind plugin returns an error, the Pod is [rejected](#unreserve) and
returned to the scheduling queue.
### Bind
These plugins are used to bind a Pod to a Node. Bind plugins will not be called
until all pre-bind plugins have completed. Each bind plugin is called in the
until all PreBind plugins have completed. Each bind plugin is called in the
configured order. A bind plugin may choose whether or not to handle the given
Pod. If a bind plugin chooses to handle a Pod, **the remaining bind plugins are
skipped**.
### Post-bind
### PostBind {#post-bind}
This is an informational extension point. Post-bind plugins are called after a
Pod is successfully bound. This is the end of a binding cycle, and can be used
@ -209,88 +208,35 @@ interfaces have the following form.
```go
type Plugin interface {
Name() string
Name() string
}
type QueueSortPlugin interface {
Plugin
Less(*v1.pod, *v1.pod) bool
Plugin
Less(*v1.pod, *v1.pod) bool
}
type PreFilterPlugin interface {
Plugin
PreFilter(PluginContext, *v1.pod) error
Plugin
PreFilter(context.Context, *framework.CycleState, *v1.pod) error
}
// ...
```
# Plugin Configuration
## Plugin configuration
Plugins can be enabled in the scheduler configuration. Also, default plugins can
be disabled in the configuration. In 1.15, there are no default plugins for the
scheduling framework.
You can enable or disable plugins in the scheduler configuration. If you are using
Kubernetes v1.18 or later, most scheduling
[plugins](/docs/reference/scheduling/profiles/#scheduling-plugins) are in use and
enabled by default.
The scheduler configuration can include configuration for plugins as well. Such
configurations are passed to the plugins at the time the scheduler initializes
them. The configuration is an arbitrary value. The receiving plugin should
decode and process the configuration.
In addition to default plugins, you can also implement your own scheduling
plugins and get them configured along with default plugins. You can visit
[scheduler-plugins](https://github.com/kubernetes-sigs/scheduler-plugins) for more details.
The following example shows a scheduler configuration that enables some
plugins at `reserve` and `preBind` extension points and disables a plugin. It
also provides a configuration to plugin `foo`.
```yaml
apiVersion: kubescheduler.config.k8s.io/v1alpha1
kind: KubeSchedulerConfiguration
...
plugins:
reserve:
enabled:
- name: foo
- name: bar
disabled:
- name: baz
preBind:
enabled:
- name: foo
disabled:
- name: baz
pluginConfig:
- name: foo
args: >
Arbitrary set of args to plugin foo
```
When an extension point is omitted from the configuration default plugins for
that extension points are used. When an extension point exists and `enabled` is
provided, the `enabled` plugins are called in addition to default plugins.
Default plugins are called first and then the additional enabled plugins are
called in the same order specified in the configuration. If a different order of
calling default plugins is desired, default plugins must be `disabled` and
`enabled` in the desired order.
Assuming there is a default plugin called `foo` at `reserve` and we are adding
plugin `bar` that we want to be invoked before `foo`, we should disable `foo`
and enable `bar` and `foo` in order. The following example shows the
configuration that achieves this:
```yaml
apiVersion: kubescheduler.config.k8s.io/v1alpha1
kind: KubeSchedulerConfiguration
...
plugins:
reserve:
enabled:
- name: bar
- name: foo
disabled:
- name: foo
```
If you are using Kubernetes v1.18 or later, you can configure a set of plugins as
a scheduler profile and then define multiple profiles to fit various kinds of workload.
Learn more at [multiple profiles](/docs/reference/scheduling/profiles/#multiple-profiles).
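As a minimal sketch, assuming the `kubescheduler.config.k8s.io/v1alpha2` API from v1.18, two profiles could be defined like this (the second profile name is illustrative; Pods select it via `spec.schedulerName`):

```yaml
apiVersion: kubescheduler.config.k8s.io/v1alpha2
kind: KubeSchedulerConfiguration
profiles:
  - schedulerName: default-scheduler
  - schedulerName: no-scoring-scheduler   # illustrative profile name
    plugins:
      preScore:
        disabled:
          - name: '*'                     # turn off all PreScore plugins
      score:
        disabled:
          - name: '*'                     # turn off all Score plugins
```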
{{% /capture %}}


@ -142,7 +142,7 @@ Area of Concern for Code | Recommendation |
--------------------------------------------- | ------------ |
Access over TLS only | If your code needs to communicate via TCP, ideally it would be performing a TLS handshake with the client ahead of time. With the exception of a few cases, the default behavior should be to encrypt everything in transit. Going one step further, even "behind the firewall" in our VPC's it's still a good idea to encrypt network traffic between services. This can be done through a process known as mutual or [mTLS](https://en.wikipedia.org/wiki/Mutual_authentication) which performs a two sided verification of communication between two certificate holding services. There are numerous tools that can be used to accomplish this in Kubernetes such as [Linkerd](https://linkerd.io/) and [Istio](https://istio.io/). |
Limiting port ranges of communication | This recommendation may be a bit self-explanatory, but wherever possible you should only expose the ports on your service that are absolutely essential for communication or metric gathering. |
3rd Party Dependency Security | Since our applications tend to have dependencies outside of our own codebases, it is a good practice to ensure that a regular scan of the code's dependencies are still secure with no CVE's currently filed against them. Each language has a tool for performing this check automatically. |
3rd Party Dependency Security | Since our applications tend to have dependencies outside of our own codebases, it is a good practice to regularly scan the code's dependencies to ensure that they are still secure with no vulnerabilities currently filed against them. Each language has a tool for performing this check automatically. |
Static Code Analysis | Most languages provide a way for a snippet of code to be analyzed for any potentially unsafe coding practices. Whenever possible you should perform checks using automated tooling that can scan codebases for common security errors. Some of the tools can be found here: https://www.owasp.org/index.php/Source_Code_Analysis_Tools |
Dynamic probing attacks | There are a few automated tools that are able to be run against your service to try some of the well known attacks that commonly befall services. These include SQL injection, CSRF, and XSS. One of the most popular dynamic analysis tools is the OWASP Zed Attack proxy https://www.owasp.org/index.php/OWASP_Zed_Attack_Proxy_Project |


@ -422,10 +422,8 @@ LoadBalancer Ingress: a320587ffd19711e5a37606cf4a74574-1142138393.us-east-1.el
{{% capture whatsnext %}}
Kubernetes also supports Federated Services, which can span multiple
clusters and cloud providers, to provide increased availability,
better fault tolerance and greater scalability for your services. See
the [Federated Services User Guide](/docs/concepts/cluster-administration/federation-service-discovery/)
for further information.
* Learn more about [Using a Service to Access an Application in a Cluster](/docs/tasks/access-application-cluster/service-access-application-cluster/)
* Learn more about [Connecting a Front End to a Back End Using a Service](/docs/tasks/access-application-cluster/connecting-frontend-backend/)
* Learn more about [Creating an External Load Balancer](/docs/tasks/access-application-cluster/create-external-load-balancer/)
{{% /capture %}}


@ -38,14 +38,16 @@ For more up-to-date specification, see
## Services
### A records
### A/AAAA records
"Normal" (not headless) Services are assigned a DNS A record for a name of the
form `my-svc.my-namespace.svc.cluster-domain.example`. This resolves to the cluster IP
"Normal" (not headless) Services are assigned a DNS A or AAAA record,
depending on the IP family of the service, for a name of the form
`my-svc.my-namespace.svc.cluster-domain.example`. This resolves to the cluster IP
of the Service.
"Headless" (without a cluster IP) Services are also assigned a DNS A record for
a name of the form `my-svc.my-namespace.svc.cluster-domain.example`. Unlike normal
"Headless" (without a cluster IP) Services are also assigned a DNS A or AAAA record,
depending on the IP family of the service, for a name of the form
`my-svc.my-namespace.svc.cluster-domain.example`. Unlike normal
Services, this resolves to the set of IPs of the pods selected by the Service.
Clients are expected to consume the set or else use standard round-robin
selection from the set.
@ -128,22 +130,22 @@ spec:
```
If there exists a headless service in the same namespace as the pod and with
the same name as the subdomain, the cluster's KubeDNS Server also returns an A
the same name as the subdomain, the cluster's DNS Server also returns an A or AAAA
record for the Pod's fully qualified hostname.
For example, given a Pod with the hostname set to "`busybox-1`" and the subdomain set to
"`default-subdomain`", and a headless Service named "`default-subdomain`" in
the same namespace, the pod will see its own FQDN as
"`busybox-1.default-subdomain.my-namespace.svc.cluster-domain.example`". DNS serves an
A record at that name, pointing to the Pod's IP. Both pods "`busybox1`" and
"`busybox2`" can have their distinct A records.
A or AAAA record at that name, pointing to the Pod's IP. Both pods "`busybox1`" and
"`busybox2`" can have their distinct A or AAAA records.
The Endpoints object can specify the `hostname` for any endpoint addresses,
along with its IP.
{{< note >}}
Because A records are not created for Pod names, `hostname` is required for the Pod's A
Because A or AAAA records are not created for Pod names, `hostname` is required for the Pod's A or AAAA
record to be created. A Pod with no `hostname` but with `subdomain` will only create the
A record for the headless service (`default-subdomain.my-namespace.svc.cluster-domain.example`),
A or AAAA record for the headless service (`default-subdomain.my-namespace.svc.cluster-domain.example`),
pointing to the Pod's IP address. Also, the Pod needs to become ready in order to have a
record unless `publishNotReadyAddresses=True` is set on the Service.
{{< /note >}}


@ -55,7 +55,7 @@ To enable IPv4/IPv6 dual-stack, enable the `IPv6DualStack` [feature gate](/docs/
* `--feature-gates="IPv6DualStack=true"`
* kube-proxy:
* `--proxy-mode=ipvs`
* `--cluster-cidrs=<IPv4 CIDR>,<IPv6 CIDR>`
* `--cluster-cidr=<IPv4 CIDR>,<IPv6 CIDR>`
* `--feature-gates="IPv6DualStack=true"`
{{< caution >}}


@ -24,6 +24,21 @@ Endpoints.
{{% capture body %}}
## Motivation
The Endpoints API has provided a simple and straightforward way of
tracking network endpoints in Kubernetes. Unfortunately as Kubernetes clusters
and Services have gotten larger, limitations of that API became more visible.
Most notably, those included challenges with scaling to larger numbers of
network endpoints.
Since all network endpoints for a Service were stored in a single Endpoints
resource, those resources could get quite large. That affected the performance
of Kubernetes components (notably the master control plane) and resulted in
significant amounts of network traffic and processing when Endpoints changed.
EndpointSlices help you mitigate those issues as well as provide an extensible
platform for additional features such as topological routing.
## EndpointSlice resources {#endpointslice-resource}
In Kubernetes, an EndpointSlice contains references to a set of network
@ -32,6 +47,8 @@ for a Kubernetes Service when a {{< glossary_tooltip text="selector"
term_id="selector" >}} is specified. These EndpointSlices will include
references to any Pods that match the Service selector. EndpointSlices group
network endpoints together by unique Service and Port combinations.
The name of an EndpointSlice object must be a valid
[DNS subdomain name](/docs/concepts/overview/working-with-objects/names#dns-subdomain-names).
As an example, here's a sample EndpointSlice resource for the `example`
Kubernetes Service.
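A rough sketch of what such an EndpointSlice might contain, assuming the `discovery.k8s.io/v1beta1` API (the name, addresses, zone, and node names are illustrative):

```yaml
apiVersion: discovery.k8s.io/v1beta1
kind: EndpointSlice
metadata:
  name: example-abc                        # illustrative name
  labels:
    kubernetes.io/service-name: example    # ties the slice to the Service
addressType: IPv4
ports:
  - name: http
    protocol: TCP
    port: 80
endpoints:
  - addresses:
      - "10.1.2.3"                         # illustrative Pod IP
    conditions:
      ready: true
    topology:
      kubernetes.io/hostname: node-1
      topology.kubernetes.io/zone: us-west2-a
```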
@ -163,21 +180,6 @@ necessary soon anyway. Rolling updates of Deployments also provide a natural
repacking of EndpointSlices with all pods and their corresponding endpoints
getting replaced.
## Motivation
The Endpoints API has provided a simple and straightforward way of
tracking network endpoints in Kubernetes. Unfortunately as Kubernetes clusters
and Services have gotten larger, limitations of that API became more visible.
Most notably, those included challenges with scaling to larger numbers of
network endpoints.
Since all network endpoints for a Service were stored in a single Endpoints
resource, those resources could get quite large. That affected the performance
of Kubernetes components (notably the master control plane) and resulted in
significant amounts of network traffic and processing when Endpoints changed.
EndpointSlices help you mitigate those issues as well as provide an extensible
platform for additional features such as topological routing.
{{% /capture %}}
{{% capture whatsnext %}}


@ -17,24 +17,15 @@ weight: 40
For clarity, this guide defines the following terms:
Node
: A worker machine in Kubernetes, part of a cluster.
Cluster
: A set of Nodes that run containerized applications managed by Kubernetes. For this example, and in most common Kubernetes deployments, nodes in the cluster are not part of the public internet.
Edge router
: A router that enforces the firewall policy for your cluster. This could be a gateway managed by a cloud provider or a physical piece of hardware.
Cluster network
: A set of links, logical or physical, that facilitate communication within a cluster according to the Kubernetes [networking model](/docs/concepts/cluster-administration/networking/).
Service
: A Kubernetes {{< glossary_tooltip term_id="service" >}} that identifies a set of Pods using {{< glossary_tooltip text="label" term_id="label" >}} selectors. Unless mentioned otherwise, Services are assumed to have virtual IPs only routable within the cluster network.
* Node: A worker machine in Kubernetes, part of a cluster.
* Cluster: A set of Nodes that run containerized applications managed by Kubernetes. For this example, and in most common Kubernetes deployments, nodes in the cluster are not part of the public internet.
* Edge router: A router that enforces the firewall policy for your cluster. This could be a gateway managed by a cloud provider or a physical piece of hardware.
* Cluster network: A set of links, logical or physical, that facilitate communication within a cluster according to the Kubernetes [networking model](/docs/concepts/cluster-administration/networking/).
* Service: A Kubernetes {{< glossary_tooltip term_id="service" >}} that identifies a set of Pods using {{< glossary_tooltip text="label" term_id="label" >}} selectors. Unless mentioned otherwise, Services are assumed to have virtual IPs only routable within the cluster network.
## What is Ingress?
Ingress exposes HTTP and HTTPS routes from outside the cluster to
[Ingress](/docs/reference/generated/kubernetes-api/{{< param "version" >}}/#ingress-v1beta1-networking-k8s-io) exposes HTTP and HTTPS routes from outside the cluster to
{{< link text="services" url="/docs/concepts/services-networking/service/" >}} within the cluster.
Traffic routing is controlled by rules defined on the Ingress resource.
@ -46,7 +37,7 @@ Traffic routing is controlled by rules defined on the Ingress resource.
[ Services ]
```
An Ingress can be configured to give Services externally-reachable URLs, load balance traffic, terminate SSL / TLS, and offer name based virtual hosting. An [Ingress controller](/docs/concepts/services-networking/ingress-controllers) is responsible for fulfilling the Ingress, usually with a load balancer, though it may also configure your edge router or additional frontends to help handle the traffic.
An Ingress may be configured to give Services externally-reachable URLs, load balance traffic, terminate SSL / TLS, and offer name based virtual hosting. An [Ingress controller](/docs/concepts/services-networking/ingress-controllers) is responsible for fulfilling the Ingress, usually with a load balancer, though it may also configure your edge router or additional frontends to help handle the traffic.
An Ingress does not expose arbitrary ports or protocols. Exposing services other than HTTP and HTTPS to the internet typically
uses a service of type [Service.Type=NodePort](/docs/concepts/services-networking/service/#nodeport) or
@ -82,16 +73,19 @@ spec:
- http:
paths:
- path: /testpath
pathType: Prefix
backend:
serviceName: test
servicePort: 80
```
As with all other Kubernetes resources, an Ingress needs `apiVersion`, `kind`, and `metadata` fields.
For general information about working with config files, see [deploying applications](/docs/tasks/run-application/run-stateless-application-deployment/), [configuring containers](/docs/tasks/configure-pod-container/configure-pod-configmap/), [managing resources](/docs/concepts/cluster-administration/manage-deployment/).
As with all other Kubernetes resources, an Ingress needs `apiVersion`, `kind`, and `metadata` fields.
The name of an Ingress object must be a valid
[DNS subdomain name](/docs/concepts/overview/working-with-objects/names#dns-subdomain-names).
For general information about working with config files, see [deploying applications](/docs/tasks/run-application/run-stateless-application-deployment/), [configuring containers](/docs/tasks/configure-pod-container/configure-pod-configmap/), [managing resources](/docs/concepts/cluster-administration/manage-deployment/).
Ingress frequently uses annotations to configure some options depending on the Ingress controller, an example of which
is the [rewrite-target annotation](https://github.com/kubernetes/ingress-nginx/blob/master/docs/examples/rewrite/README.md).
Different [Ingress controller](/docs/concepts/services-networking/ingress-controllers) support different annotations. Review the documentation for
Different [Ingress controllers](/docs/concepts/services-networking/ingress-controllers) support different annotations. Review the documentation for
your choice of Ingress controller to learn which annotations are supported.
The Ingress [spec](https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#spec-and-status)
@ -124,6 +118,84 @@ backend is typically a configuration option of the [Ingress controller](/docs/co
If none of the hosts or paths match the HTTP request in the Ingress objects, the traffic is
routed to your default backend.
### Path Types
Each path in an Ingress has a corresponding path type. There are three supported
path types:
* _`ImplementationSpecific`_ (default): With this path type, matching is up to
the IngressClass. Implementations can treat this as a separate `pathType` or
treat it identically to `Prefix` or `Exact` path types.
* _`Exact`_: Matches the URL path exactly and with case sensitivity.
* _`Prefix`_: Matches based on a URL path prefix split by `/`. Matching is case
sensitive and done on a path element by element basis. A path element refers
to the list of labels in the path split by the `/` separator. A request is a
match for path _p_ if every element of _p_ is an element-wise prefix of the
request path.
{{< note >}}
If the last element of the path is a substring of the
last element of the request path, it is not a match (for example:
`/foo/bar` matches `/foo/bar/baz`, but does not match `/foo/barbaz`).
{{< /note >}}
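To make the path types concrete, here is a sketch of an Ingress that mixes `Exact` and `Prefix` paths (the host, paths, and Service names are illustrative):

```yaml
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: path-type-example        # illustrative name
spec:
  rules:
    - host: example.com
      http:
        paths:
          - path: /exact/match
            pathType: Exact      # only /exact/match matches
            backend:
              serviceName: service1
              servicePort: 80
          - path: /prefix
            pathType: Prefix     # /prefix and /prefix/anything match
            backend:
              serviceName: service2
              servicePort: 80
```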
#### Multiple Matches
In some cases, multiple paths within an Ingress will match a request. In those
cases precedence will be given first to the longest matching path. If two paths
are still equally matched, precedence will be given to paths with an exact path
type over prefix path type.
## Ingress Class
Ingresses can be implemented by different controllers, often with different
configuration. Each Ingress should specify a class, a reference to an
IngressClass resource that contains additional configuration including the name
of the controller that should implement the class.
```yaml
apiVersion: networking.k8s.io/v1beta1
kind: IngressClass
metadata:
name: external-lb
spec:
controller: example.com/ingress-controller
parameters:
apiGroup: k8s.example.com/v1alpha
kind: IngressParameters
name: external-lb
```
IngressClass resources contain an optional parameters field. This can be used to
reference additional configuration for this class.
### Deprecated Annotation
Before the IngressClass resource and `ingressClassName` field were added in
Kubernetes 1.18, Ingress classes were specified with a
`kubernetes.io/ingress.class` annotation on the Ingress. This annotation was
never formally defined, but was widely supported by Ingress controllers.
The newer `ingressClassName` field on Ingresses is a replacement for that
annotation, but is not a direct equivalent. While the annotation was generally
used to reference the name of the Ingress controller that should implement the
Ingress, the field is a reference to an IngressClass resource that contains
additional Ingress configuration, including the name of the Ingress controller.
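For instance, an Ingress that references the `external-lb` IngressClass shown above through the new field might look like this sketch (the backend Service name is illustrative):

```yaml
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: example-ingress          # illustrative name
spec:
  ingressClassName: external-lb  # references the IngressClass by name
  rules:
    - http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              serviceName: test
              servicePort: 80
```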
### Default Ingress Class
You can mark a particular IngressClass as default for your cluster. Setting the
`ingressclass.kubernetes.io/is-default-class` annotation to `true` on an
IngressClass resource will ensure that new Ingresses without an
`ingressClassName` field specified will be assigned this default IngressClass.
{{< caution >}}
If you have more than one IngressClass marked as the default for your cluster,
the admission controller prevents creating new Ingress objects that don't have
an `ingressClassName` specified. You can resolve this by ensuring that at most one
IngressClass is marked as default in your cluster.
{{< /caution >}}
## Types of Ingress
### Single Service Ingress
@ -143,10 +215,10 @@ kubectl get ingress test-ingress
```
NAME HOSTS ADDRESS PORTS AGE
test-ingress * 107.178.254.228 80 59s
test-ingress * 203.0.113.123 80 59s
```
Where `107.178.254.228` is the IP allocated by the Ingress controller to satisfy
Where `203.0.113.123` is the IP allocated by the Ingress controller to satisfy
this Ingress.
{{< note >}}
@ -345,7 +417,7 @@ spec:
{{< note >}}
There is a gap between TLS features supported by various Ingress
controllers. Please refer to documentation on
[nginx](https://git.k8s.io/ingress-nginx/README.md#https),
[nginx](https://kubernetes.github.io/ingress-nginx/user-guide/tls/),
[GCE](https://git.k8s.io/ingress-gce/README.md#frontend-https), or any other
platform specific Ingress controller to understand how TLS works in your environment.
{{< /note >}}
@ -474,6 +546,7 @@ You can expose a Service in multiple ways that don't directly involve the Ingres
{{% /capture %}}
{{% capture whatsnext %}}
* Learn about [ingress controllers](/docs/concepts/services-networking/ingress-controllers/)
* Learn about the [Ingress API](/docs/reference/generated/kubernetes-api/{{< param "version" >}}/#ingress-v1beta1-networking-k8s-io)
* Learn about [Ingress Controllers](/docs/concepts/services-networking/ingress-controllers/)
* [Set up Ingress on Minikube with the NGINX Controller](/docs/tasks/access-application-cluster/ingress-minikube)
{{% /capture %}}

View File

@ -11,16 +11,16 @@ weight: 50
{{< toc >}}
{{% capture overview %}}
A network policy is a specification of how groups of pods are allowed to communicate with each other and other network endpoints.
A network policy is a specification of how groups of {{< glossary_tooltip text="pods" term_id="pod">}} are allowed to communicate with each other and other network endpoints.
`NetworkPolicy` resources use labels to select pods and define rules which specify what traffic is allowed to the selected pods.
NetworkPolicy resources use {{< glossary_tooltip text="labels" term_id="label">}} to select pods and define rules which specify what traffic is allowed to the selected pods.
{{% /capture %}}
{{% capture body %}}
## Prerequisites
Network policies are implemented by the network plugin, so you must be using a networking solution which supports `NetworkPolicy` - simply creating the resource without a controller to implement it will have no effect.
Network policies are implemented by the [network plugin](/docs/concepts/extend-kubernetes/compute-storage-net/network-plugins/). To use network policies, you must be using a networking solution which supports NetworkPolicy. Creating a NetworkPolicy resource without a controller that implements it will have no effect.
## Isolated and Non-isolated Pods
@ -30,11 +30,11 @@ Pods become isolated by having a NetworkPolicy that selects them. Once there is
Network policies do not conflict; they are additive. If any policy or policies select a pod, the pod is restricted to what is allowed by the union of those policies' ingress/egress rules. Thus, the order of evaluation does not affect the policy result.
## The `NetworkPolicy` Resource
## The NetworkPolicy resource {#networkpolicy-resource}
See the [NetworkPolicy](/docs/reference/generated/kubernetes-api/{{< param "version" >}}/#networkpolicy-v1-networking-k8s-io) for a full definition of the resource.
See the [NetworkPolicy](/docs/reference/generated/kubernetes-api/{{< param "version" >}}/#networkpolicy-v1-networking-k8s-io) reference for a full definition of the resource.
An example `NetworkPolicy` might look like this:
An example NetworkPolicy might look like this:
```yaml
apiVersion: networking.k8s.io/v1
@ -73,23 +73,25 @@ spec:
port: 5978
```
*POSTing this to the API server will have no effect unless your chosen networking solution supports network policy.*
{{< note >}}
POSTing this to the API server for your cluster will have no effect unless your chosen networking solution supports network policy.
{{< /note >}}
__Mandatory Fields__: As with all other Kubernetes config, a `NetworkPolicy`
__Mandatory Fields__: As with all other Kubernetes config, a NetworkPolicy
needs `apiVersion`, `kind`, and `metadata` fields. For general information
about working with config files, see
[Configure Containers Using a ConfigMap](/docs/tasks/configure-pod-container/configure-pod-configmap/),
and [Object Management](/docs/concepts/overview/working-with-objects/object-management).
__spec__: `NetworkPolicy` [spec](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md#spec-and-status) has all the information needed to define a particular network policy in the given namespace.
__spec__: NetworkPolicy [spec](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md#spec-and-status) has all the information needed to define a particular network policy in the given namespace.
__podSelector__: Each `NetworkPolicy` includes a `podSelector` which selects the grouping of pods to which the policy applies. The example policy selects pods with the label "role=db". An empty `podSelector` selects all pods in the namespace.
__podSelector__: Each NetworkPolicy includes a `podSelector` which selects the grouping of pods to which the policy applies. The example policy selects pods with the label "role=db". An empty `podSelector` selects all pods in the namespace.
__policyTypes__: Each `NetworkPolicy` includes a `policyTypes` list which may include either `Ingress`, `Egress`, or both. The `policyTypes` field indicates whether or not the given policy applies to ingress traffic to selected pod, egress traffic from selected pods, or both. If no `policyTypes` are specified on a NetworkPolicy then by default `Ingress` will always be set and `Egress` will be set if the NetworkPolicy has any egress rules.
__policyTypes__: Each NetworkPolicy includes a `policyTypes` list which may include either `Ingress`, `Egress`, or both. The `policyTypes` field indicates whether or not the given policy applies to ingress traffic to selected pod, egress traffic from selected pods, or both. If no `policyTypes` are specified on a NetworkPolicy then by default `Ingress` will always be set and `Egress` will be set if the NetworkPolicy has any egress rules.
__ingress__: Each `NetworkPolicy` may include a list of whitelist `ingress` rules. Each rule allows traffic which matches both the `from` and `ports` sections. The example policy contains a single rule, which matches traffic on a single port, from one of three sources, the first specified via an `ipBlock`, the second via a `namespaceSelector` and the third via a `podSelector`.
__ingress__: Each NetworkPolicy may include a list of whitelist `ingress` rules. Each rule allows traffic which matches both the `from` and `ports` sections. The example policy contains a single rule, which matches traffic on a single port, from one of three sources, the first specified via an `ipBlock`, the second via a `namespaceSelector` and the third via a `podSelector`.
__egress__: Each `NetworkPolicy` may include a list of whitelist `egress` rules. Each rule allows traffic which matches both the `to` and `ports` sections. The example policy contains a single rule, which matches traffic on a single port to any destination in `10.0.0.0/24`.
__egress__: Each NetworkPolicy may include a list of whitelist `egress` rules. Each rule allows traffic which matches both the `to` and `ports` sections. The example policy contains a single rule, which matches traffic on a single port to any destination in `10.0.0.0/24`.
So, the example NetworkPolicy:
@ -107,7 +109,7 @@ See the [Declare Network Policy](/docs/tasks/administer-cluster/declare-network-
There are four kinds of selectors that can be specified in an `ingress` `from` section or `egress` `to` section:
__podSelector__: This selects particular Pods in the same namespace as the `NetworkPolicy` which should be allowed as ingress sources or egress destinations.
__podSelector__: This selects particular Pods in the same namespace as the NetworkPolicy which should be allowed as ingress sources or egress destinations.
__namespaceSelector__: This selects particular namespaces for which all Pods should be allowed as ingress sources or egress destinations.
@ -168,16 +170,7 @@ in that namespace.
You can create a "default" isolation policy for a namespace by creating a NetworkPolicy that selects all pods but does not allow any ingress traffic to those pods.
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: default-deny
spec:
podSelector: {}
policyTypes:
- Ingress
```
{{< codenew file="service/networking/network-policy-default-deny-ingress.yaml" >}}
This ensures that even pods that aren't selected by any other NetworkPolicy will still be isolated. This policy does not change the default egress isolation behavior.
@ -185,33 +178,13 @@ This ensures that even pods that aren't selected by any other NetworkPolicy will
If you want to allow all traffic to all pods in a namespace (even if policies are added that cause some pods to be treated as "isolated"), you can create a policy that explicitly allows all traffic in that namespace.
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: allow-all
spec:
podSelector: {}
ingress:
- {}
policyTypes:
- Ingress
```
{{< codenew file="service/networking/network-policy-allow-all-ingress.yaml" >}}
### Default deny all egress traffic
You can create a "default" egress isolation policy for a namespace by creating a NetworkPolicy that selects all pods but does not allow any egress traffic from those pods.
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: default-deny
spec:
podSelector: {}
policyTypes:
- Egress
```
{{< codenew file="service/networking/network-policy-default-deny-egress.yaml" >}}
This ensures that even pods that aren't selected by any other NetworkPolicy will not be allowed egress traffic. This policy does not
change the default ingress isolation behavior.
@ -220,34 +193,13 @@ change the default ingress isolation behavior.
If you want to allow all traffic from all pods in a namespace (even if policies are added that cause some pods to be treated as "isolated"), you can create a policy that explicitly allows all egress traffic in that namespace.
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: allow-all
spec:
podSelector: {}
egress:
- {}
policyTypes:
- Egress
```
{{< codenew file="service/networking/network-policy-allow-all-egress.yaml" >}}
### Default deny all ingress and all egress traffic
You can create a "default" policy for a namespace which prevents all ingress AND egress traffic by creating the following NetworkPolicy in that namespace.
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: default-deny
spec:
podSelector: {}
policyTypes:
- Ingress
- Egress
```
{{< codenew file="service/networking/network-policy-default-deny-all.yaml" >}}
This ensures that even pods that aren't selected by any other NetworkPolicy will not be allowed ingress or egress traffic.
@ -255,9 +207,12 @@ This ensures that even pods that aren't selected by any other NetworkPolicy will
{{< feature-state for_k8s_version="v1.12" state="alpha" >}}
Kubernetes supports SCTP as a `protocol` value in `NetworkPolicy` definitions as an alpha feature. To enable this feature, the cluster administrator needs to enable the `SCTPSupport` feature gate on the apiserver, for example, `“--feature-gates=SCTPSupport=true,...”`. When the feature gate is enabled, users can set the `protocol` field of a `NetworkPolicy` to `SCTP`. Kubernetes sets up the network accordingly for the SCTP associations, just like it does for TCP connections.
To use this feature, you (or your cluster administrator) will need to enable the `SCTPSupport` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/) for the API server with `--feature-gates=SCTPSupport=true,…`.
When the feature gate is enabled, you can set the `protocol` field of a NetworkPolicy to `SCTP`.
The CNI plugin has to support SCTP as `protocol` value in `NetworkPolicy`.
{{< note >}}
You must be using a {{< glossary_tooltip text="CNI" term_id="cni" >}} plugin that supports SCTP protocol NetworkPolicies.
{{< /note >}}
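As a minimal sketch, assuming the feature gate is enabled and your CNI plugin supports SCTP, a NetworkPolicy rule using SCTP might look like this (the name, labels, and port are hypothetical):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-sctp-ingress
spec:
  podSelector:
    matchLabels:
      role: db
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          role: frontend
    ports:
    # SCTP is accepted here only when the SCTPSupport feature gate is on.
    - protocol: SCTP
      port: 5000
```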
{{% /capture %}}
@ -266,6 +221,6 @@ The CNI plugin has to support SCTP as `protocol` value in `NetworkPolicy`.
- See the [Declare Network Policy](/docs/tasks/administer-cluster/declare-network-policy/)
walkthrough for further examples.
- See more [Recipes](https://github.com/ahmetb/kubernetes-network-policy-recipes) for common scenarios enabled by the NetworkPolicy resource.
- See more [recipes](https://github.com/ahmetb/kubernetes-network-policy-recipes) for common scenarios enabled by the NetworkPolicy resource.
{{% /capture %}}

View File

@ -46,23 +46,6 @@ with it, while intrazonal traffic does not. Other common needs include being abl
to route traffic to a local Pod managed by a DaemonSet, or keeping traffic to
Nodes connected to the same top-of-rack switch for the lowest latency.
## Prerequisites
The following prerequisites are needed in order to enable topology aware service
routing:
* Kubernetes 1.17 or later
* Kube-proxy running in iptables mode or IPVS mode
* Enable [Endpoint Slices](/docs/concepts/services-networking/endpoint-slices/)
## Enable Service Topology
To enable service topology, enable the `ServiceTopology` feature gate for
kube-apiserver and kube-proxy:
```
--feature-gates="ServiceTopology=true"
```
## Using Service Topology
@ -117,6 +100,98 @@ traffic as follows.
it is used.
## Examples
The following are common examples of using the Service Topology feature.
### Only Node Local Endpoints
A Service that only routes to node local endpoints. If no endpoints exist on the node, traffic is dropped:
```yaml
apiVersion: v1
kind: Service
metadata:
name: my-service
spec:
selector:
app: my-app
ports:
- protocol: TCP
port: 80
targetPort: 9376
topologyKeys:
- "kubernetes.io/hostname"
```
### Prefer Node Local Endpoints
A Service that prefers node local Endpoints but falls back to cluster wide endpoints if node local endpoints do not exist:
```yaml
apiVersion: v1
kind: Service
metadata:
name: my-service
spec:
selector:
app: my-app
ports:
- protocol: TCP
port: 80
targetPort: 9376
topologyKeys:
- "kubernetes.io/hostname"
- "*"
```
### Only Zonal or Regional Endpoints
A Service that prefers zonal then regional endpoints. If no endpoints exist in either, traffic is dropped.
```yaml
apiVersion: v1
kind: Service
metadata:
name: my-service
spec:
selector:
app: my-app
ports:
- protocol: TCP
port: 80
targetPort: 9376
topologyKeys:
- "topology.kubernetes.io/zone"
- "topology.kubernetes.io/region"
```
### Prefer Node Local, Zonal, then Regional Endpoints
A Service that prefers node local, zonal, then regional endpoints but falls back to cluster wide endpoints.
```yaml
apiVersion: v1
kind: Service
metadata:
name: my-service
spec:
selector:
app: my-app
ports:
- protocol: TCP
port: 80
targetPort: 9376
topologyKeys:
- "kubernetes.io/hostname"
- "topology.kubernetes.io/zone"
- "topology.kubernetes.io/region"
- "*"
```
{{% /capture %}}
{{% capture whatsnext %}}

View File

@ -73,6 +73,8 @@ balancer in between your application and the backend Pods.
A Service in Kubernetes is a REST object, similar to a Pod. Like all of the
REST objects, you can `POST` a Service definition to the API server to create
a new instance.
The name of a Service object must be a valid
[DNS label name](/docs/concepts/overview/working-with-objects/names#dns-label-names).
For example, suppose you have a set of Pods that each listen on TCP port 9376
and carry a label `app=MyApp`:
@ -167,6 +169,9 @@ subsets:
- port: 9376
```
The name of the Endpoints object must be a valid
[DNS subdomain name](/docs/concepts/overview/working-with-objects/names#dns-subdomain-names).
{{< note >}}
The endpoint IPs _must not_ be: loopback (127.0.0.0/8 for IPv4, ::1/128 for IPv6), or
link-local (169.254.0.0/16 and 224.0.0.0/24 for IPv4, fe80::/64 for IPv6).
@ -197,6 +202,17 @@ endpoints.
EndpointSlices provide additional attributes and functionality which is
described in detail in [EndpointSlices](/docs/concepts/services-networking/endpoint-slices/).
### Application protocol
{{< feature-state for_k8s_version="v1.18" state="alpha" >}}
The AppProtocol field provides a way to specify an application protocol to be
used for each Service port.
As an alpha feature, this field is not enabled by default. To use this field,
enable the `ServiceAppProtocol` [feature
gate](/docs/reference/command-line-tools-reference/feature-gates/).
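Assuming the feature gate is enabled, a sketch of a Service that sets an application protocol on a port might look like this (the Service name, selector, and protocol value are illustrative):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  selector:
    app: MyApp
  ports:
  - name: http
    protocol: TCP
    # appProtocol describes the application-layer protocol for this port.
    appProtocol: http
    port: 80
    targetPort: 9376
```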
## Virtual IPs and service proxies
Every node in a Kubernetes cluster runs a `kube-proxy`. `kube-proxy` is
@ -1173,19 +1189,6 @@ SCTP is not supported on Windows based nodes.
The kube-proxy does not support the management of SCTP associations when it is in userspace mode.
{{< /warning >}}
## Future work
In the future, the proxy policy for Services can become more nuanced than
simple round-robin balancing, for example master-elected or sharded. We also
envision that some Services will have "real" load balancers, in which case the
virtual IP address will simply transport the packets there.
The Kubernetes project intends to improve support for L7 (HTTP) Services.
The Kubernetes project intends to have more flexible ingress modes for Services
that encompass the current ClusterIP, NodePort, and LoadBalancer modes and more.
{{% /capture %}}
{{% capture whatsnext %}}

View File

@ -46,6 +46,9 @@ To enable dynamic provisioning, a cluster administrator needs to pre-create
one or more StorageClass objects for users.
StorageClass objects define which provisioner should be used and what parameters
should be passed to that provisioner when dynamic provisioning is invoked.
The name of a StorageClass object must be a valid
[DNS subdomain name](/docs/concepts/overview/working-with-objects/names#dns-subdomain-names).
The following manifest creates a storage class "slow" which provisions standard
disk-like persistent disks.
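A minimal sketch of such a manifest, assuming the GCE persistent disk provisioner, could look like this:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: slow
provisioner: kubernetes.io/gce-pd
parameters:
  # Standard (non-SSD) persistent disks.
  type: pd-standard
```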

View File

@ -4,6 +4,7 @@ reviewers:
- saad-ali
- thockin
- msau42
- xing-yang
title: Persistent Volumes
feature:
title: Storage orchestration
@ -16,7 +17,7 @@ weight: 20
{{% capture overview %}}
This document describes the current state of `PersistentVolumes` in Kubernetes. Familiarity with [volumes](/docs/concepts/storage/volumes/) is suggested.
This document describes the current state of _persistent volumes_ in Kubernetes. Familiarity with [volumes](/docs/concepts/storage/volumes/) is suggested.
{{% /capture %}}
@ -25,23 +26,16 @@ This document describes the current state of `PersistentVolumes` in Kubernetes.
## Introduction
Managing storage is a distinct problem from managing compute instances. The `PersistentVolume` subsystem provides an API for users and administrators that abstracts details of how storage is provided from how it is consumed. To do this, we introduce two new API resources: `PersistentVolume` and `PersistentVolumeClaim`.
Managing storage is a distinct problem from managing compute instances. The PersistentVolume subsystem provides an API for users and administrators that abstracts details of how storage is provided from how it is consumed. To do this, we introduce two new API resources: PersistentVolume and PersistentVolumeClaim.
A `PersistentVolume` (PV) is a piece of storage in the cluster that has been provisioned by an administrator or dynamically provisioned using [Storage Classes](/docs/concepts/storage/storage-classes/). It is a resource in the cluster just like a node is a cluster resource. PVs are volume plugins like Volumes, but have a lifecycle independent of any individual Pod that uses the PV. This API object captures the details of the implementation of the storage, be that NFS, iSCSI, or a cloud-provider-specific storage system.
A PersistentVolume (PV) is a piece of storage in the cluster that has been provisioned by an administrator or dynamically provisioned using [Storage Classes](/docs/concepts/storage/storage-classes/). It is a resource in the cluster just like a node is a cluster resource. PVs are volume plugins like Volumes, but have a lifecycle independent of any individual Pod that uses the PV. This API object captures the details of the implementation of the storage, be that NFS, iSCSI, or a cloud-provider-specific storage system.
A `PersistentVolumeClaim` (PVC) is a request for storage by a user. It is similar to a Pod. Pods consume node resources and PVCs consume PV resources. Pods can request specific levels of resources (CPU and Memory). Claims can request specific size and access modes (e.g., they can be mounted once read/write or many times read-only).
A PersistentVolumeClaim (PVC) is a request for storage by a user. It is similar to a Pod. Pods consume node resources and PVCs consume PV resources. Pods can request specific levels of resources (CPU and Memory). Claims can request specific size and access modes (e.g., they can be mounted once read/write or many times read-only).
While `PersistentVolumeClaims` allow a user to consume abstract storage
resources, it is common that users need `PersistentVolumes` with varying
properties, such as performance, for different problems. Cluster administrators
need to be able to offer a variety of `PersistentVolumes` that differ in more
ways than just size and access modes, without exposing users to the details of
how those volumes are implemented. For these needs, there is the `StorageClass`
resource.
While PersistentVolumeClaims allow a user to consume abstract storage resources, it is common that users need PersistentVolumes with varying properties, such as performance, for different problems. Cluster administrators need to be able to offer a variety of PersistentVolumes that differ in more ways than just size and access modes, without exposing users to the details of how those volumes are implemented. For these needs, there is the _StorageClass_ resource.
See the [detailed walkthrough with working examples](/docs/tasks/configure-pod-container/configure-persistent-volume-storage/).
## Lifecycle of a volume and claim
PVs are resources in the cluster. PVCs are requests for those resources and also act as claim checks to the resource. The interaction between PVs and PVCs follows this lifecycle:
@ -51,12 +45,14 @@ PVs are resources in the cluster. PVCs are requests for those resources and also
There are two ways PVs may be provisioned: statically or dynamically.
#### Static
A cluster administrator creates a number of PVs. They carry the details of the real storage, which is available for use by cluster users. They exist in the Kubernetes API and are available for consumption.
#### Dynamic
When none of the static PVs the administrator created match a user's `PersistentVolumeClaim`,
When none of the static PVs the administrator created match a user's PersistentVolumeClaim,
the cluster may try to dynamically provision a volume specially for the PVC.
This provisioning is based on `StorageClasses`: the PVC must request a
This provisioning is based on StorageClasses: the PVC must request a
[storage class](/docs/concepts/storage/storage-classes/) and
the administrator must have created and configured that class for dynamic
provisioning to occur. Claims that request the class `""` effectively disable
@ -71,7 +67,7 @@ check [kube-apiserver](/docs/admin/kube-apiserver/) documentation.
### Binding
A user creates, or in the case of dynamic provisioning, has already created, a `PersistentVolumeClaim` with a specific amount of storage requested and with certain access modes. A control loop in the master watches for new PVCs, finds a matching PV (if possible), and binds them together. If a PV was dynamically provisioned for a new PVC, the loop will always bind that PV to the PVC. Otherwise, the user will always get at least what they asked for, but the volume may be in excess of what was requested. Once bound, `PersistentVolumeClaim` binds are exclusive, regardless of how they were bound. A PVC to PV binding is a one-to-one mapping.
A user creates, or in the case of dynamic provisioning, has already created, a PersistentVolumeClaim with a specific amount of storage requested and with certain access modes. A control loop in the master watches for new PVCs, finds a matching PV (if possible), and binds them together. If a PV was dynamically provisioned for a new PVC, the loop will always bind that PV to the PVC. Otherwise, the user will always get at least what they asked for, but the volume may be in excess of what was requested. Once bound, PersistentVolumeClaim binds are exclusive, regardless of how they were bound. A PVC to PV binding is a one-to-one mapping, using a ClaimRef which is a bi-directional binding between the PersistentVolume and the PersistentVolumeClaim.
Claims will remain unbound indefinitely if a matching volume does not exist. Claims will be bound as matching volumes become available. For example, a cluster provisioned with many 50Gi PVs would not match a PVC requesting 100Gi. The PVC can be bound when a 100Gi PV is added to the cluster.
@ -79,10 +75,10 @@ Claims will remain unbound indefinitely if a matching volume does not exist. Cla
Pods use claims as volumes. The cluster inspects the claim to find the bound volume and mounts that volume for a Pod. For volumes that support multiple access modes, the user specifies which mode is desired when using their claim as a volume in a Pod.
Once a user has a claim and that claim is bound, the bound PV belongs to the user for as long as they need it. Users schedule Pods and access their claimed PVs by including a `persistentVolumeClaim` in their Pod's volumes block. [See below for syntax details](#claims-as-volumes).
Once a user has a claim and that claim is bound, the bound PV belongs to the user for as long as they need it. Users schedule Pods and access their claimed PVs by including a `persistentVolumeClaim` section in a Pod's `volumes` block. See [Claims As Volumes](#claims-as-volumes) for more details on this.
### Storage Object in Use Protection
The purpose of the Storage Object in Use Protection feature is to ensure that Persistent Volume Claims (PVCs) in active use by a Pod and Persistent Volume (PVs) that are bound to PVCs are not removed from the system, as this may result in data loss.
The purpose of the Storage Object in Use Protection feature is to ensure that PersistentVolumeClaims (PVCs) in active use by a Pod and PersistentVolume (PVs) that are bound to PVCs are not removed from the system, as this may result in data loss.
{{< note >}}
PVC is in active use by a Pod when a Pod object exists that is using the PVC.
@ -130,19 +126,19 @@ Events: <none>
### Reclaiming
When a user is done with their volume, they can delete the PVC objects from the API that allows reclamation of the resource. The reclaim policy for a `PersistentVolume` tells the cluster what to do with the volume after it has been released of its claim. Currently, volumes can either be Retained, Recycled, or Deleted.
When a user is done with their volume, they can delete the PVC objects from the API, which allows reclamation of the resource. The reclaim policy for a PersistentVolume tells the cluster what to do with the volume after it has been released of its claim. Currently, volumes can either be Retained, Recycled, or Deleted.
#### Retain
The `Retain` reclaim policy allows for manual reclamation of the resource. When the `PersistentVolumeClaim` is deleted, the `PersistentVolume` still exists and the volume is considered "released". But it is not yet available for another claim because the previous claimant's data remains on the volume. An administrator can manually reclaim the volume with the following steps.
The `Retain` reclaim policy allows for manual reclamation of the resource. When the PersistentVolumeClaim is deleted, the PersistentVolume still exists and the volume is considered "released". But it is not yet available for another claim because the previous claimant's data remains on the volume. An administrator can manually reclaim the volume with the following steps.
1. Delete the `PersistentVolume`. The associated storage asset in external infrastructure (such as an AWS EBS, GCE PD, Azure Disk, or Cinder volume) still exists after the PV is deleted.
1. Delete the PersistentVolume. The associated storage asset in external infrastructure (such as an AWS EBS, GCE PD, Azure Disk, or Cinder volume) still exists after the PV is deleted.
1. Manually clean up the data on the associated storage asset accordingly.
1. Manually delete the associated storage asset, or if you want to reuse the same storage asset, create a new `PersistentVolume` with the storage asset definition.
1. Manually delete the associated storage asset, or if you want to reuse the same storage asset, create a new PersistentVolume with the storage asset definition.
#### Delete
For volume plugins that support the `Delete` reclaim policy, deletion removes both the `PersistentVolume` object from Kubernetes, as well as the associated storage asset in the external infrastructure, such as an AWS EBS, GCE PD, Azure Disk, or Cinder volume. Volumes that were dynamically provisioned inherit the [reclaim policy of their `StorageClass`](#reclaim-policy), which defaults to `Delete`. The administrator should configure the `StorageClass` according to users' expectations; otherwise, the PV must be edited or patched after it is created. See [Change the Reclaim Policy of a PersistentVolume](/docs/tasks/administer-cluster/change-pv-reclaim-policy/).
For volume plugins that support the `Delete` reclaim policy, deletion removes both the PersistentVolume object from Kubernetes, as well as the associated storage asset in the external infrastructure, such as an AWS EBS, GCE PD, Azure Disk, or Cinder volume. Volumes that were dynamically provisioned inherit the [reclaim policy of their StorageClass](#reclaim-policy), which defaults to `Delete`. The administrator should configure the StorageClass according to users' expectations; otherwise, the PV must be edited or patched after it is created. See [Change the Reclaim Policy of a PersistentVolume](/docs/tasks/administer-cluster/change-pv-reclaim-policy/).
#### Recycle
@ -212,8 +208,8 @@ allowVolumeExpansion: true
```
To request a larger volume for a PVC, edit the PVC object and specify a larger
size. This triggers expansion of the volume that backs the underlying `PersistentVolume`. A
new `PersistentVolume` is never created to satisfy the claim. Instead, an existing volume is resized.
size. This triggers expansion of the volume that backs the underlying PersistentVolume. A
new PersistentVolume is never created to satisfy the claim. Instead, an existing volume is resized.
#### CSI Volume expansion
@ -227,7 +223,7 @@ Support for expanding CSI volumes is enabled by default but it also requires a s
You can only resize volumes containing a file system if the file system is XFS, Ext3, or Ext4.
When a volume contains a file system, the file system is only resized when a new Pod is using
the `PersistentVolumeClaim` in ReadWrite mode. File system expansion is either done when a Pod is starting up
the PersistentVolumeClaim in `ReadWrite` mode. File system expansion is either done when a Pod is starting up
or when a Pod is running and the underlying file system supports online expansion.
FlexVolumes allow resize if the driver is set with the `RequiresFSResize` capability to `true`.
@ -260,7 +256,7 @@ Expanding EBS volumes is a time-consuming operation. Also, there is a per-volume
## Types of Persistent Volumes
`PersistentVolume` types are implemented as plugins. Kubernetes currently supports the following plugins:
PersistentVolume types are implemented as plugins. Kubernetes currently supports the following plugins:
* GCEPersistentDisk
* AWSElasticBlockStore
@ -286,6 +282,8 @@ Expanding EBS volumes is a time-consuming operation. Also, there is a per-volume
## Persistent Volumes
Each PV contains a spec and status, which is the specification and status of the volume.
The name of a PersistentVolume object must be a valid
[DNS subdomain name](/docs/concepts/overview/working-with-objects/names#dns-subdomain-names).
```yaml
apiVersion: v1
@ -308,6 +306,10 @@ spec:
server: 172.17.0.2
```
{{< note >}}
Helper programs relating to the volume type may be required for consumption of a PersistentVolume within a cluster. In this example, the PersistentVolume is of type NFS and the helper program /sbin/mount.nfs is required to support the mounting of NFS filesystems.
{{< /note >}}
### Capacity
Generally, a PV will have a specific storage capacity. This is set using the PV's `capacity` attribute. See the Kubernetes [Resource Model](https://git.k8s.io/community/contributors/design-proposals/scheduling/resources.md) to understand the units expected by `capacity`.
@ -316,16 +318,28 @@ Currently, storage size is the only resource that can be set or requested. Futu
### Volume Mode
{{< feature-state for_k8s_version="v1.13" state="beta" >}}
{{< feature-state for_k8s_version="v1.18" state="stable" >}}
Prior to Kubernetes 1.9, all volume plugins created a filesystem on the persistent volume.
Now, you can set the value of `volumeMode` to `block` to use a raw block device, or `filesystem`
to use a filesystem. `filesystem` is the default if the value is omitted. This is an optional API
parameter.
Kubernetes supports two `volumeModes` of PersistentVolumes: `Filesystem` and `Block`.
`volumeMode` is an optional API parameter.
`Filesystem` is the default mode used when `volumeMode` parameter is omitted.
A volume with `volumeMode: Filesystem` is *mounted* into Pods as a directory. If the volume
is backed by a block device and the device is empty, Kubernetes creates a filesystem
on the device before mounting it for the first time.
You can set the value of `volumeMode` to `Block` to use a volume as a raw block device.
Such a volume is presented to a Pod as a block device, without any filesystem on it.
This mode gives a Pod the fastest possible way to access a volume, without
any filesystem layer between the Pod and the volume. On the other hand, the application
running in the Pod must know how to handle a raw block device.
See [Raw Block Volume Support](/docs/concepts/storage/persistent-volumes/#raw-block-volume-support)
for an example of how to use a volume with `volumeMode: Block` in a Pod.
### Access Modes
A `PersistentVolume` can be mounted on a host in any way supported by the resource provider. As shown in the table below, providers will have different capabilities and each PV's access modes are set to the specific modes supported by that particular volume. For example, NFS can support multiple read/write clients, but a specific NFS PV might be exported on the server as read-only. Each PV gets its own set of access modes describing that specific PV's capabilities.
A PersistentVolume can be mounted on a host in any way supported by the resource provider. As shown in the table below, providers will have different capabilities and each PV's access modes are set to the specific modes supported by that particular volume. For example, NFS can support multiple read/write clients, but a specific NFS PV might be exported on the server as read-only. Each PV gets its own set of access modes describing that specific PV's capabilities.
The access modes are:
@ -440,6 +454,8 @@ The CLI will show the name of the PVC bound to the PV.
## PersistentVolumeClaims
Each PVC contains a spec and status, which is the specification and status of the claim.
The name of a PersistentVolumeClaim object must be a valid
[DNS subdomain name](/docs/concepts/overview/working-with-objects/names#dns-subdomain-names).
```yaml
apiVersion: v1
@ -499,22 +515,22 @@ by the cluster, depending on whether the
is turned on.
* If the admission plugin is turned on, the administrator may specify a
default `StorageClass`. All PVCs that have no `storageClassName` can be bound only to
PVs of that default. Specifying a default `StorageClass` is done by setting the
default StorageClass. All PVCs that have no `storageClassName` can be bound only to
PVs of that default. Specifying a default StorageClass is done by setting the
annotation `storageclass.kubernetes.io/is-default-class` equal to `true` in
a `StorageClass` object. If the administrator does not specify a default, the
a StorageClass object. If the administrator does not specify a default, the
cluster responds to PVC creation as if the admission plugin were turned off. If
more than one default is specified, the admission plugin forbids the creation of
all PVCs.
* If the admission plugin is turned off, there is no notion of a default
`StorageClass`. All PVCs that have no `storageClassName` can be bound only to PVs that
StorageClass. All PVCs that have no `storageClassName` can be bound only to PVs that
have no class. In this case, the PVCs that have no `storageClassName` are treated the
same way as PVCs that have their `storageClassName` set to `""`.
Depending on installation method, a default StorageClass may be deployed
to a Kubernetes cluster by addon manager during installation.
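A hedged sketch of a StorageClass marked as the default (the name and provisioner are illustrative) could look like this:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: standard
  annotations:
    # PVCs without a storageClassName bind to PVs of this class.
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-standard
```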
When a PVC specifies a `selector` in addition to requesting a `StorageClass`,
When a PVC specifies a `selector` in addition to requesting a StorageClass,
the requirements are ANDed together: only a PV of the requested class and with
the requested labels may be bound to the PVC.
@ -528,7 +544,7 @@ it won't be supported in a future Kubernetes release.
## Claims As Volumes
Pods access storage by using the claim as a volume. Claims must exist in the same namespace as the Pod using the claim. The cluster finds the claim in the Pod's namespace and uses it to get the `PersistentVolume` backing the claim. The volume is then mounted to the host and into the Pod.
Pods access storage by using the claim as a volume. Claims must exist in the same namespace as the Pod using the claim. The cluster finds the claim in the Pod's namespace and uses it to get the PersistentVolume backing the claim. The volume is then mounted to the host and into the Pod.
```yaml
apiVersion: v1
@ -550,30 +566,28 @@ spec:
### A Note on Namespaces
`PersistentVolumes` binds are exclusive, and since `PersistentVolumeClaims` are namespaced objects, mounting claims with "Many" modes (`ROX`, `RWX`) is only possible within one namespace.
PersistentVolume binds are exclusive, and since PersistentVolumeClaims are namespaced objects, mounting claims with "Many" modes (`ROX`, `RWX`) is only possible within one namespace.
## Raw Block Volume Support
{{< feature-state for_k8s_version="v1.13" state="beta" >}}
{{< feature-state for_k8s_version="v1.18" state="stable" >}}
The following volume plugins support raw block volumes, including dynamic provisioning where
applicable:
* AWSElasticBlockStore
* AzureDisk
* CSI
* FC (Fibre Channel)
* GCEPersistentDisk
* iSCSI
* Local volume
* OpenStack Cinder
* RBD (Ceph Block Device)
* VsphereVolume (alpha)
* VsphereVolume
{{< note >}}
Only FC and iSCSI volumes supported raw block volumes in Kubernetes 1.9.
Support for the additional plugins was added in 1.10.
{{< /note >}}
### PersistentVolume using a Raw Block Volume {#persistent-volume-using-a-raw-block-volume}
### Persistent Volumes using a Raw Block Volume
```yaml
apiVersion: v1
kind: PersistentVolume
@ -591,7 +605,8 @@ spec:
lun: 0
readOnly: false
```
### Persistent Volume Claim requesting a Raw Block Volume
### PersistentVolumeClaim requesting a Raw Block Volume {#persistent-volume-claim-requesting-a-raw-block-volume}
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
@ -605,7 +620,9 @@ spec:
requests:
storage: 10Gi
```
### Pod specification adding Raw Block Device path in container
```yaml
apiVersion: v1
kind: Pod
@ -632,7 +649,7 @@ When adding a raw block device for a Pod, you specify the device path in the con
### Binding Block Volumes
If a user requests a raw block volume by indicating this using the `volumeMode` field in the `PersistentVolumeClaim` spec, the binding rules differ slightly from previous releases that didn't consider this mode as part of the spec.
If a user requests a raw block volume by indicating this using the `volumeMode` field in the PersistentVolumeClaim spec, the binding rules differ slightly from previous releases that didn't consider this mode as part of the spec.
The following table lists the possible combinations a user and admin might specify for requesting a raw block device, and indicates whether the volume will be bound given each combination:
Volume binding matrix for statically provisioned volumes:
@ -654,14 +671,15 @@ Only statically provisioned volumes are supported for alpha release. Administrat
## Volume Snapshot and Restore Volume from Snapshot Support
{{< feature-state for_k8s_version="v1.12" state="alpha" >}}
{{< feature-state for_k8s_version="v1.17" state="beta" >}}
The volume snapshot feature was added to support CSI Volume Plugins only. For details, see [volume snapshots](/docs/concepts/storage/volume-snapshots/).
To enable support for restoring a volume from a volume snapshot data source, enable the
`VolumeSnapshotDataSource` feature gate on the apiserver and controller-manager.
### Create Persistent Volume Claim from Volume Snapshot
### Create a PersistentVolumeClaim from a Volume Snapshot {#create-persistent-volume-claim-from-volume-snapshot}
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
@ -682,14 +700,10 @@ spec:
## Volume Cloning
{{< feature-state for_k8s_version="v1.16" state="beta" >}}
The [Volume Cloning](/docs/concepts/storage/volume-pvc-datasource/) feature is only available for CSI volume plugins.
The volume clone feature was added to support CSI Volume Plugins only. For details, see [volume cloning](/docs/concepts/storage/volume-pvc-datasource/).
### Create PersistentVolumeClaim from an existing PVC {#create-persistent-volume-claim-from-an-existing-pvc}
To enable support for cloning a volume from a PVC data source, enable the
`VolumePVCDataSource` feature gate on the apiserver and controller-manager.
### Create Persistent Volume Claim from an existing pvc
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
@ -732,5 +746,17 @@ and need persistent storage, it is recommended that you use the following patter
dynamic storage support (in which case the user should create a matching PV)
or the cluster has no storage system (in which case the user cannot deploy
config requiring PVCs).
{{% /capture %}}
{{% capture whatsnext %}}
* Learn more about [Creating a PersistentVolume](/docs/tasks/configure-pod-container/configure-persistent-volume-storage/#create-a-persistentvolume).
* Learn more about [Creating a PersistentVolumeClaim](/docs/tasks/configure-pod-container/configure-persistent-volume-storage/#create-a-persistentvolumeclaim).
* Read the [Persistent Storage design document](https://git.k8s.io/community/contributors/design-proposals/storage/persistent-storage.md).
### Reference
* [PersistentVolume](/docs/reference/generated/kubernetes-api/{{< param "version" >}}/#persistentvolume-v1-core)
* [PersistentVolumeSpec](/docs/reference/generated/kubernetes-api/{{< param "version" >}}/#persistentvolumespec-v1-core)
* [PersistentVolumeClaim](/docs/reference/generated/kubernetes-api/{{< param "version" >}}/#persistentvolumeclaim-v1-core)
* [PersistentVolumeClaimSpec](/docs/reference/generated/kubernetes-api/{{< param "version" >}}/#persistentvolumeclaimspec-v1-core)
{{% /capture %}}

View File

@ -21,7 +21,7 @@ with [volumes](/docs/concepts/storage/volumes/) and
## Introduction
A `StorageClass` provides a way for administrators to describe the "classes" of
A StorageClass provides a way for administrators to describe the "classes" of
storage they offer. Different classes might map to quality-of-service levels,
or to backup policies, or to arbitrary policies determined by the cluster
administrators. Kubernetes itself is unopinionated about what classes
@ -30,18 +30,18 @@ systems.
## The StorageClass Resource
Each `StorageClass` contains the fields `provisioner`, `parameters`, and
`reclaimPolicy`, which are used when a `PersistentVolume` belonging to the
Each StorageClass contains the fields `provisioner`, `parameters`, and
`reclaimPolicy`, which are used when a PersistentVolume belonging to the
class needs to be dynamically provisioned.
The name of a `StorageClass` object is significant, and is how users can
The name of a StorageClass object is significant, and is how users can
request a particular class. Administrators set the name and other parameters
of a class when first creating `StorageClass` objects, and the objects cannot
of a class when first creating StorageClass objects, and the objects cannot
be updated once they are created.
Administrators can specify a default `StorageClass` just for PVCs that don't
Administrators can specify a default StorageClass just for PVCs that don't
request any particular class to bind to: see the
[`PersistentVolumeClaim` section](/docs/concepts/storage/persistent-volumes/#class-1)
[PersistentVolumeClaim section](/docs/concepts/storage/persistent-volumes/#class-1)
for details.
```yaml
@ -61,7 +61,7 @@ volumeBindingMode: Immediate
### Provisioner
Storage classes have a provisioner that determines what volume plugin is used
Each StorageClass has a provisioner that determines what volume plugin is used
for provisioning PVs. This field must be specified.
| Volume Plugin | Internal Provisioner| Config Example |
@ -104,23 +104,23 @@ vendors provide their own external provisioner.
### Reclaim Policy
Persistent Volumes that are dynamically created by a storage class will have the
PersistentVolumes that are dynamically created by a StorageClass will have the
reclaim policy specified in the `reclaimPolicy` field of the class, which can be
either `Delete` or `Retain`. If no `reclaimPolicy` is specified when a
`StorageClass` object is created, it will default to `Delete`.
StorageClass object is created, it will default to `Delete`.
Persistent Volumes that are created manually and managed via a storage class will have
PersistentVolumes that are created manually and managed via a StorageClass will have
whatever reclaim policy they were assigned at creation.
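As a hedged sketch (the class name and provisioner are illustrative), a StorageClass that sets an explicit reclaim policy might look like this; it also sets the `allowVolumeExpansion` and `mountOptions` fields covered in the next sections:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: retained-standard
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-standard
# Dynamically provisioned PVs of this class keep their data after the PVC is deleted.
reclaimPolicy: Retain
# Allows PVCs of this class to be resized by editing the PVC.
allowVolumeExpansion: true
mountOptions:
  - debug
```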
### Allow Volume Expansion
{{< feature-state for_k8s_version="v1.11" state="beta" >}}
Persistent Volumes can be configured to be expandable. This feature when set to `true`,
PersistentVolumes can be configured to be expandable. This feature, when set to `true`,
allows users to resize the volume by editing the corresponding PVC object.
The following types of volumes support volume expansion, when the underlying
Storage Class has the field `allowVolumeExpansion` set to true.
StorageClass has the field `allowVolumeExpansion` set to true.
{{< table caption = "Table of Volume types and the version of Kubernetes they require" >}}
@ -146,7 +146,7 @@ You can only use the volume expansion feature to grow a Volume, not to shrink it
### Mount Options
Persistent Volumes that are dynamically created by a storage class will have the
PersistentVolumes that are dynamically created by a StorageClass will have the
mount options specified in the `mountOptions` field of the class.
If the volume plugin does not support mount options but mount options are
@ -219,7 +219,7 @@ allowedTopologies:
## Parameters
Storage classes have parameters that describe volumes belonging to the storage
Storage Classes have parameters that describe volumes belonging to the storage
class. Different parameters may be accepted depending on the `provisioner`. For
example, the value `io1`, for the parameter `type`, and the parameter
`iopsPerGB` are specific to EBS. When a parameter is omitted, some default is
@ -350,7 +350,7 @@ parameters:
contains user password to use when talking to Gluster REST service. These
parameters are optional, empty password will be used when both
`secretNamespace` and `secretName` are omitted. The provided secret must have
type `"kubernetes.io/glusterfs"`, e.g. created in this way:
type `"kubernetes.io/glusterfs"`, for example created in this way:
```
kubectl create secret generic heketi-secret \
@ -367,7 +367,7 @@ parameters:
`"8452344e2becec931ece4e33c4674e4e,42982310de6c63381718ccfa6d8cf397"`. This
is an optional parameter.
* `gidMin`, `gidMax` : The minimum and maximum value of GID range for the
storage class. A unique value (GID) in this range ( gidMin-gidMax ) will be
StorageClass. A unique value (GID) in this range ( gidMin-gidMax ) will be
used for dynamically provisioned volumes. These are optional values. If not
specified, the volume will be provisioned with a value between 2000-2147483647
which are defaults for gidMin and gidMax respectively.
@ -441,7 +441,7 @@ This internal provisioner of OpenStack is deprecated. Please use [the external c
```
`datastore`: The user can also specify the datastore in the StorageClass.
The volume will be created on the datastore specified in the storage class,
The volume will be created on the datastore specified in the StorageClass,
which in this case is `VSANDatastore`. This field is optional. If the
datastore is not specified, then the volume will be created on the datastore
specified in the vSphere config file used to initialize the vSphere Cloud
@ -514,7 +514,7 @@ parameters:
same as `adminId`.
* `userSecretName`: The name of Ceph Secret for `userId` to map RBD image. It
must exist in the same namespace as PVCs. This parameter is required.
The provided secret must have type "kubernetes.io/rbd", e.g. created in this
The provided secret must have type "kubernetes.io/rbd", for example created in this
way:
```shell
@ -561,7 +561,7 @@ parameters:
* `adminSecretName`: secret that holds information about the Quobyte user and
the password to authenticate against the API server. The provided secret
must have type "kubernetes.io/quobyte" and the keys `user` and `password`,
e.g. created in this way:
for example:
```shell
kubectl create secret generic quobyte-admin-secret \
@ -580,7 +580,7 @@ parameters:
### Azure Disk
#### Azure Unmanaged Disk Storage Class
#### Azure Unmanaged Disk storage class {#azure-unmanaged-disk-storage-class}
```yaml
apiVersion: storage.k8s.io/v1
@ -601,7 +601,7 @@ parameters:
ignored. If a storage account is not provided, a new storage account will be
created in the same resource group as the cluster.
#### New Azure Disk Storage Class (starting from v1.7.2)
#### Azure Disk storage class (starting from v1.7.2) {#azure-disk-storage-class}
```yaml
apiVersion: storage.k8s.io/v1

View File

@ -11,7 +11,6 @@ weight: 30
{{% capture overview %}}
{{< feature-state for_k8s_version="v1.16" state="beta" >}}
This document describes the concept of cloning existing CSI Volumes in Kubernetes. Familiarity with [Volumes](/docs/concepts/storage/volumes) is suggested.
{{% /capture %}}
@ -36,6 +35,7 @@ Users need to be aware of the following when using this feature:
* Cloning is only supported within the same Storage Class.
- Destination volume must be the same storage class as the source
- Default storage class can be used and storageClassName omitted in the spec
* Cloning can only be performed between two volumes that use the same VolumeMode setting (if you request a block mode volume, the source MUST also be block mode)
## Provisioning
@ -60,6 +60,10 @@ spec:
name: pvc-1
```
{{< note >}}
You must specify a capacity value for `spec.resources.requests.storage`, and the value you specify must be the same or larger than the capacity of the source volume.
{{< /note >}}
The result is a new PVC with the name `clone-of-pvc-1` that has the exact same content as the specified source `pvc-1`.
## Usage

View File

@ -29,7 +29,7 @@ A `VolumeSnapshotContent` is a snapshot taken from a volume in the cluster that
A `VolumeSnapshot` is a request for snapshot of a volume by a user. It is similar to a PersistentVolumeClaim.
`VolumeSnapshotClass` allows you to specify different attributes belonging to a `VolumeSnapshot`. These attibutes may differ among snapshots taken from the same volume on the storage system and therefore cannot be expressed by using the same `StorageClass` of a `PersistentVolumeClaim`.
`VolumeSnapshotClass` allows you to specify different attributes belonging to a `VolumeSnapshot`. These attributes may differ among snapshots taken from the same volume on the storage system and therefore cannot be expressed by using the same `StorageClass` of a `PersistentVolumeClaim`.
Users need to be aware of the following when using this feature:

View File

@ -605,6 +605,38 @@ spec:
type: Directory
```
{{< caution >}}
The `FileOrCreate` mode does not create the parent directory of the file. If the parent directory of the mounted file does not exist, the Pod fails to start. To ensure that this mode works, you can mount directories and files separately, as shown below.
{{< /caution >}}
#### Example Pod FileOrCreate
```yaml
apiVersion: v1
kind: Pod
metadata:
name: test-webserver
spec:
containers:
- name: test-webserver
image: k8s.gcr.io/test-webserver:latest
volumeMounts:
- mountPath: /var/local/aaa
name: mydir
- mountPath: /var/local/aaa/1.txt
name: myfile
volumes:
- name: mydir
hostPath:
# Ensure the file directory is created.
path: /var/local/aaa
type: DirectoryOrCreate
- name: myfile
hostPath:
path: /var/local/aaa/1.txt
type: FileOrCreate
```
### iscsi {#iscsi}
An `iscsi` volume allows an existing iSCSI (SCSI over IP) volume to be mounted
@ -1302,19 +1334,13 @@ persistent volume:
#### CSI raw block volume support
{{< feature-state for_k8s_version="v1.14" state="beta" >}}
{{< feature-state for_k8s_version="v1.18" state="stable" >}}
Starting with version 1.11, CSI introduced support for raw block volumes, which
relies on the raw block volume feature that was introduced in a previous version of
Kubernetes. This feature will make it possible for vendors with external CSI drivers to
implement raw block volumes support in Kubernetes workloads.
Vendors with external CSI drivers can implement raw block volumes support
in Kubernetes workloads.
CSI block volume support is feature-gated, but enabled by default. The two
feature gates which must be enabled for this feature are `BlockVolume` and
`CSIBlockVolume`.
Learn how to
[setup your PV/PVC with raw block volume support](/docs/concepts/storage/persistent-volumes/#raw-block-volume-support).
You can [setup your PV/PVC with raw block volume support](/docs/concepts/storage/persistent-volumes/#raw-block-volume-support)
as usual, without any CSI specific changes.
#### CSI ephemeral volumes

View File

@ -18,11 +18,12 @@ One CronJob object is like one line of a _crontab_ (cron table) file. It runs a
on a given schedule, written in [Cron](https://en.wikipedia.org/wiki/Cron) format.
{{< note >}}
All **CronJob** `schedule:` times are based on the timezone of the master where the job is initiated.
All **CronJob** `schedule:` times are denoted in UTC.
{{< /note >}}
When creating the manifest for a CronJob resource, make sure the name you provide
is no longer than 52 characters. This is because the CronJob controller will automatically
is a valid [DNS subdomain name](/docs/concepts/overview/working-with-objects/names#dns-subdomain-names).
The name must be no longer than 52 characters. This is because the CronJob controller will automatically
append 11 characters to the job name provided and there is a constraint that the
maximum length of a Job name is no more than 63 characters.
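For instance, a minimal CronJob sketch that prints a message every minute might look like this (the name, image, and command are illustrative):

```yaml
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: hello
spec:
  # Cron schedule, interpreted in UTC: run once per minute.
  schedule: "*/1 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: hello
            image: busybox
            args:
            - /bin/sh
            - -c
            - date; echo Hello from Kubernetes
          restartPolicy: OnFailure
```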

View File

@ -19,8 +19,8 @@ collected. Deleting a DaemonSet will clean up the Pods it created.
Some typical uses of a DaemonSet are:
- running a cluster storage daemon, such as `glusterd`, `ceph`, on each node.
- running a logs collection daemon on every node, such as `fluentd` or `logstash`.
- running a node monitoring daemon on every node, such as [Prometheus Node Exporter](https://github.com/prometheus/node_exporter), [Flowmill](https://github.com/Flowmill/flowmill-k8s/), [Sysdig Agent](https://docs.sysdig.com), `collectd`, [Dynatrace OneAgent](https://www.dynatrace.com/technologies/kubernetes-monitoring/), [AppDynamics Agent](https://docs.appdynamics.com/display/CLOUD/Container+Visibility+with+Kubernetes), [Datadog agent](https://docs.datadoghq.com/agent/kubernetes/daemonset_setup/), [New Relic agent](https://docs.newrelic.com/docs/integrations/kubernetes-integration/installation/kubernetes-installation-configuration), Ganglia `gmond` or [Instana Agent](https://www.instana.com/supported-integrations/kubernetes-monitoring/).
- running a logs collection daemon on every node, such as `fluentd` or `filebeat`.
- running a node monitoring daemon on every node, such as [Prometheus Node Exporter](https://github.com/prometheus/node_exporter), [Flowmill](https://github.com/Flowmill/flowmill-k8s/), [Sysdig Agent](https://docs.sysdig.com), `collectd`, [Dynatrace OneAgent](https://www.dynatrace.com/technologies/kubernetes-monitoring/), [AppDynamics Agent](https://docs.appdynamics.com/display/CLOUD/Container+Visibility+with+Kubernetes), [Datadog agent](https://docs.datadoghq.com/agent/kubernetes/daemonset_setup/), [New Relic agent](https://docs.newrelic.com/docs/integrations/kubernetes-integration/installation/kubernetes-installation-configuration), Ganglia `gmond`, [Instana Agent](https://www.instana.com/supported-integrations/kubernetes-monitoring/) or [Elastic Metricbeat](https://www.elastic.co/guide/en/beats/metricbeat/current/running-on-kubernetes.html).
In a simple case, one DaemonSet, covering all nodes, would be used for each type of daemon.
A more complex setup might use multiple DaemonSets for a single type of daemon, but with
@ -39,7 +39,8 @@ You can describe a DaemonSet in a YAML file. For example, the `daemonset.yaml` file below describes a DaemonSet that runs the fluentd-elasticsearch Docker image:
{{< codenew file="controllers/daemonset.yaml" >}}
* Create a DaemonSet based on the YAML file:
Create a DaemonSet based on the YAML file:
```
kubectl apply -f https://k8s.io/examples/controllers/daemonset.yaml
```
@ -50,6 +51,9 @@ As with all other Kubernetes config, a DaemonSet needs `apiVersion`, `kind`, and `metadata` fields. For
general information about working with config files, see [deploying applications](/docs/user-guide/deploying-applications/),
[configuring containers](/docs/tasks/), and [object management using kubectl](/docs/concepts/overview/working-with-objects/object-management/) documents.
The name of a DaemonSet object must be a valid
[DNS subdomain name](/docs/concepts/overview/working-with-objects/names#dns-subdomain-names).
A DaemonSet also needs a [`.spec`](https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#spec-and-status) section.
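For orientation, a DaemonSet manifest might be structured roughly as follows. This is a hedged sketch of a log-collection DaemonSet, not the exact contents of the `daemonset.yaml` file referenced above, and the image tag is a placeholder.

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd-logging        # must be a valid DNS subdomain name
  namespace: kube-system
  labels:
    k8s-app: fluentd-logging
spec:
  selector:
    matchLabels:
      name: fluentd-logging
  template:
    metadata:
      labels:
        name: fluentd-logging  # must match .spec.selector
    spec:
      containers:
      - name: fluentd
        image: fluent/fluentd:v1.9   # placeholder tag; pin to the version you actually run
        resources:
          limits:
            memory: 200Mi
          requests:
            cpu: 100m
            memory: 200Mi
```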
### Pod Template


@ -64,7 +64,7 @@ In this example:
* The Pods are labeled `app: nginx` using the `labels` field.
* The Pod template's specification, or `.template.spec` field, indicates that
the Pods run one container, `nginx`, which runs the `nginx`
[Docker Hub](https://hub.docker.com/) image at version 1.7.9.
[Docker Hub](https://hub.docker.com/) image at version 1.14.2.
* Create one container and name it `nginx` using the `name` field.
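Taken together, the description above corresponds to a manifest along these lines; this is a sketch consistent with that description (three replicas, `app: nginx` labels, the `nginx:1.14.2` image), not necessarily identical to the referenced example file.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 3                  # three replicated Pods, as in the full example
  selector:
    matchLabels:
      app: nginx               # the Deployment manages Pods carrying this label
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx            # container name set via the `name` field
        image: nginx:1.14.2    # Docker Hub nginx image at version 1.14.2
        ports:
        - containerPort: 80
```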
Follow the steps given below to create the above Deployment:
@ -153,15 +153,15 @@ is changed, for example if the labels or container images of the template are updated.
Follow the steps given below to update your Deployment:
1. Let's update the nginx Pods to use the `nginx:1.9.1` image instead of the `nginx:1.7.9` image.
1. Let's update the nginx Pods to use the `nginx:1.16.1` image instead of the `nginx:1.14.2` image.
```shell
kubectl --record deployment.apps/nginx-deployment set image deployment.v1.apps/nginx-deployment nginx=nginx:1.9.1
kubectl --record deployment.apps/nginx-deployment set image deployment.v1.apps/nginx-deployment nginx=nginx:1.16.1
```
or simply use the following command:
```shell
kubectl set image deployment/nginx-deployment nginx=nginx:1.9.1 --record
kubectl set image deployment/nginx-deployment nginx=nginx:1.16.1 --record
```
The output is similar to this:
@ -169,7 +169,7 @@ Follow the steps given below to update your Deployment:
deployment.apps/nginx-deployment image updated
```
Alternatively, you can `edit` the Deployment and change `.spec.template.spec.containers[0].image` from `nginx:1.7.9` to `nginx:1.9.1`:
Alternatively, you can `edit` the Deployment and change `.spec.template.spec.containers[0].image` from `nginx:1.14.2` to `nginx:1.16.1`:
```shell
kubectl edit deployment.v1.apps/nginx-deployment
@ -265,7 +265,7 @@ up to 3 replicas, as well as scaling down the old ReplicaSet to 0 replicas.
Labels: app=nginx
Containers:
nginx:
Image: nginx:1.9.1
Image: nginx:1.16.1
Port: 80/TCP
Environment: <none>
Mounts: <none>
@ -306,11 +306,11 @@ If you update a Deployment while an existing rollout is in progress, the Deployment creates a new ReplicaSet
as per the update and starts scaling that up, and rolls over the ReplicaSet that it was scaling up previously
-- it will add it to its list of old ReplicaSets and start scaling it down.
For example, suppose you create a Deployment to create 5 replicas of `nginx:1.7.9`,
but then update the Deployment to create 5 replicas of `nginx:1.9.1`, when only 3
replicas of `nginx:1.7.9` had been created. In that case, the Deployment immediately starts
killing the 3 `nginx:1.7.9` Pods that it had created, and starts creating
`nginx:1.9.1` Pods. It does not wait for the 5 replicas of `nginx:1.7.9` to be created
For example, suppose you create a Deployment to create 5 replicas of `nginx:1.14.2`,
but then update the Deployment to create 5 replicas of `nginx:1.16.1`, when only 3
replicas of `nginx:1.14.2` had been created. In that case, the Deployment immediately starts
killing the 3 `nginx:1.14.2` Pods that it had created, and starts creating
`nginx:1.16.1` Pods. It does not wait for the 5 replicas of `nginx:1.14.2` to be created
before changing course.
### Label selector updates
@ -347,10 +347,10 @@ This means that when you roll back to an earlier revision, only the Deployment's Pod template part is
rolled back.
{{< /note >}}
* Suppose that you made a typo while updating the Deployment, by putting the image name as `nginx:1.91` instead of `nginx:1.9.1`:
* Suppose that you made a typo while updating the Deployment, by putting the image name as `nginx:1.161` instead of `nginx:1.16.1`:
```shell
kubectl set image deployment.v1.apps/nginx-deployment nginx=nginx:1.91 --record=true
kubectl set image deployment.v1.apps/nginx-deployment nginx=nginx:1.161 --record=true
```
The output is similar to this:
@ -427,7 +427,7 @@ rolled back.
Labels: app=nginx
Containers:
nginx:
Image: nginx:1.91
Image: nginx:1.161
Port: 80/TCP
Host Port: 0/TCP
Environment: <none>
@ -468,13 +468,13 @@ Follow the steps given below to check the rollout history:
deployments "nginx-deployment"
REVISION CHANGE-CAUSE
1 kubectl apply --filename=https://k8s.io/examples/controllers/nginx-deployment.yaml --record=true
2 kubectl set image deployment.v1.apps/nginx-deployment nginx=nginx:1.9.1 --record=true
3 kubectl set image deployment.v1.apps/nginx-deployment nginx=nginx:1.91 --record=true
2 kubectl set image deployment.v1.apps/nginx-deployment nginx=nginx:1.16.1 --record=true
3 kubectl set image deployment.v1.apps/nginx-deployment nginx=nginx:1.161 --record=true
```
`CHANGE-CAUSE` is copied from the Deployment annotation `kubernetes.io/change-cause` to its revisions upon creation. You can specify the `CHANGE-CAUSE` message by:
* Annotating the Deployment with `kubectl annotate deployment.v1.apps/nginx-deployment kubernetes.io/change-cause="image updated to 1.9.1"`
* Annotating the Deployment with `kubectl annotate deployment.v1.apps/nginx-deployment kubernetes.io/change-cause="image updated to 1.16.1"`
* Appending the `--record` flag to save the `kubectl` command that is making changes to the resource.
* Manually editing the manifest of the resource.
@ -488,10 +488,10 @@ Follow the steps given below to check the rollout history:
deployments "nginx-deployment" revision 2
Labels: app=nginx
pod-template-hash=1159050644
Annotations: kubernetes.io/change-cause=kubectl set image deployment.v1.apps/nginx-deployment nginx=nginx:1.9.1 --record=true
Annotations: kubernetes.io/change-cause=kubectl set image deployment.v1.apps/nginx-deployment nginx=nginx:1.16.1 --record=true
Containers:
nginx:
Image: nginx:1.9.1
Image: nginx:1.16.1
Port: 80/TCP
QoS Tier:
cpu: BestEffort
@ -549,7 +549,7 @@ Follow the steps given below to rollback the Deployment from the current version
CreationTimestamp: Sun, 02 Sep 2018 18:17:55 -0500
Labels: app=nginx
Annotations: deployment.kubernetes.io/revision=4
kubernetes.io/change-cause=kubectl set image deployment.v1.apps/nginx-deployment nginx=nginx:1.9.1 --record=true
kubernetes.io/change-cause=kubectl set image deployment.v1.apps/nginx-deployment nginx=nginx:1.16.1 --record=true
Selector: app=nginx
Replicas: 3 desired | 3 updated | 3 total | 3 available | 0 unavailable
StrategyType: RollingUpdate
@ -559,7 +559,7 @@ Follow the steps given below to rollback the Deployment from the current version
Labels: app=nginx
Containers:
nginx:
Image: nginx:1.9.1
Image: nginx:1.16.1
Port: 80/TCP
Host Port: 0/TCP
Environment: <none>
@ -722,7 +722,7 @@ apply multiple fixes in between pausing and resuming without triggering unnecessary rollouts.
* Then update the image of the Deployment:
```shell
kubectl set image deployment.v1.apps/nginx-deployment nginx=nginx:1.9.1
kubectl set image deployment.v1.apps/nginx-deployment nginx=nginx:1.16.1
```
The output is similar to this:
@ -1020,6 +1020,8 @@ can create multiple Deployments, one for each release, following the canary pattern.
As with all other Kubernetes configs, a Deployment needs `apiVersion`, `kind`, and `metadata` fields.
For general information about working with config files, see [deploying applications](/docs/tutorials/stateless-application/run-stateless-application-deployment/),
configuring containers, and [using kubectl to manage resources](/docs/concepts/overview/working-with-objects/object-management/) documents.
The name of a Deployment object must be a valid
[DNS subdomain name](/docs/concepts/overview/working-with-objects/names#dns-subdomain-names).
A Deployment also needs a [`.spec` section](https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#spec-and-status).
@ -1074,7 +1076,7 @@ All existing Pods are killed before new ones are created when `.spec.strategy.type==Recreate`.
#### Rolling Update Deployment
The Deployment updates Pods in a [rolling update](/docs/tasks/run-application/rolling-update-replication-controller/)
The Deployment updates Pods in a rolling update
fashion when `.spec.strategy.type==RollingUpdate`. You can specify `maxUnavailable` and `maxSurge` to control
the rolling update process.
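As a sketch, the strategy stanza inside a Deployment's `.spec` might look like this; the 25% values shown here are the documented defaults and are included only for illustration.

```yaml
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 25%   # at most a quarter of the desired Pods may be unavailable during the update
      maxSurge: 25%         # at most a quarter more than the desired number of Pods may be created
```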
@ -1141,12 +1143,4 @@ a paused Deployment and one that is not paused, is that any changes into the PodTemplateSpec of the paused
Deployment will not trigger new rollouts as long as it is paused. A Deployment is not paused by default when
it is created.
## Alternative to Deployments
### kubectl rolling-update
[`kubectl rolling-update`](/docs/reference/generated/kubectl/kubectl-commands#rolling-update) updates Pods and ReplicationControllers
in a similar fashion. But Deployments are recommended, since they are declarative, server side, and have
additional features, such as rolling back to any previous revision even after the rolling update is done.
{{% /capture %}}
