diff --git a/_data/concepts.yml b/_data/concepts.yml
index f165490883..bf6f8b53e8 100644
--- a/_data/concepts.yml
+++ b/_data/concepts.yml
@@ -43,7 +43,6 @@ toc:
  section:
  - docs/concepts/cluster-administration/network-plugins.md
  - docs/concepts/cluster-administration/device-plugins.md
- - docs/concepts/cluster-administration/sysctl-cluster.md
  - docs/concepts/service-catalog/index.md
 
 - title: Containers
diff --git a/_data/tasks.yml b/_data/tasks.yml
index e9210213bc..d7c03e36ee 100644
--- a/_data/tasks.yml
+++ b/_data/tasks.yml
@@ -143,6 +143,7 @@ toc:
  - docs/tasks/administer-cluster/access-cluster-api.md
  - docs/tasks/administer-cluster/access-cluster-services.md
  - docs/tasks/administer-cluster/securing-a-cluster.md
+ - docs/tasks/administer-cluster/sysctl-cluster.md
  - docs/tasks/administer-cluster/encrypt-data.md
  - docs/tasks/administer-cluster/configure-upgrade-etcd.md
  - docs/tasks/administer-cluster/static-pod.md
diff --git a/_redirects b/_redirects
index 3d5c08fe03..48d26e3892 100644
--- a/_redirects
+++ b/_redirects
@@ -50,7 +50,7 @@
 /docs/admin/resourcequota/limitstorageconsumption/ /docs/tasks/administer-cluster/limit-storage-consumption/ 301
 /docs/admin/resourcequota/walkthrough/ /docs/tasks/administer-cluster/quota-api-object/ 301
 /docs/admin/static-pods/ /docs/tasks/administer-cluster/static-pod/ 301
-/docs/admin/sysctls/ /docs/concepts/cluster-administration/sysctl-cluster/ 301
+/docs/admin/sysctls/ /docs/tasks/administer-cluster/sysctl-cluster/ 301
 /docs/admin/upgrade-1-6/ /docs/tasks/administer-cluster/upgrade-1-6/ 301
 /docs/admin/resource-quota/ /docs/concepts/policy/resource-quotas/ 301
 
@@ -97,6 +97,7 @@
 /docs/concepts/cluster-administration/multiple-clusters/ /docs/concepts/cluster-administration/federation/ 301
 /docs/concepts/cluster-administration/out-of-resource/ /docs/tasks/administer-cluster/out-of-resource/ 301
 /docs/concepts/cluster-administration/resource-usage-monitoring /docs/tasks/debug-application-cluster/resource-usage-monitoring/ 301
+/docs/concepts/cluster-administration/sysctl-cluster/ /docs/tasks/administer-cluster/sysctl-cluster/ 301
 /docs/concepts/cluster-administration/static-pod/ /docs/tasks/administer-cluster/static-pod/ 301
 /docs/concepts/clusters/logging/ /docs/concepts/cluster-administration/logging/ 301
 /docs/concepts/configuration/container-command-arg/ /docs/tasks/inject-data-application/define-command-argument-container/ 301
diff --git a/docs/concepts/cluster-administration/sysctl-cluster.md b/docs/tasks/administer-cluster/sysctl-cluster.md
similarity index 81%
rename from docs/concepts/cluster-administration/sysctl-cluster.md
rename to docs/tasks/administer-cluster/sysctl-cluster.md
index 796c735bb2..9c8a8fcc49 100644
--- a/docs/concepts/cluster-administration/sysctl-cluster.md
+++ b/docs/tasks/administer-cluster/sysctl-cluster.md
@@ -1,15 +1,24 @@
 ---
+title: Using Sysctls in a Kubernetes Cluster
 reviewers:
 - sttts
-title: Using Sysctls in a Kubernetes Cluster
 ---
 
-* TOC
-{:toc}
+{% capture overview %}
 
 This document describes how sysctls are used within a Kubernetes cluster.
 
-## What is a Sysctl?
+{% endcapture %}
+
+{% capture prerequisites %}
+
+{% include task-tutorial-prereqs.md %}
+
+{% endcapture %}
+
+{% capture steps %}
+
+## Listing all Sysctl Parameters
 
 In Linux, the sysctl interface allows an administrator to modify kernel
 parameters at runtime. Parameters are available via the `/proc/sys/` virtual
@@ -23,11 +32,59 @@ process file system. The parameters cover various subsystems such as:
 
 To get a list of all parameters, you can run
 
-```
+```shell
 $ sudo sysctl -a
 ```
 
-## Namespaced vs. Node-Level Sysctls
+## Enabling Unsafe Sysctls
+
+Sysctls are grouped into _safe_ and _unsafe_ sysctls. In addition to being properly
+namespaced, a _safe_ sysctl must be properly _isolated_ between pods on the same
+node. This means that setting a _safe_ sysctl for one pod
+
+- must not have any influence on any other pod on the node
+- must not be able to harm the node's health
+- must not allow the pod to gain CPU or memory resources beyond its resource
+  limits.
+
+Most _namespaced_ sysctls are not necessarily considered _safe_.
+The following sysctls are supported in the _safe_ set:
+
+- `kernel.shm_rmid_forced`,
+- `net.ipv4.ip_local_port_range`,
+- `net.ipv4.tcp_syncookies`.
+
+**Note**: The example `net.ipv4.tcp_syncookies` is not namespaced on Linux kernel version 4.4 or lower.
+{: .note}
+
+This list will be extended in future Kubernetes versions when the kubelet
+supports better isolation mechanisms.
+
+All _safe_ sysctls are enabled by default.
+
+All _unsafe_ sysctls are disabled by default and must be allowed manually by the
+cluster admin on a per-node basis. Pods that request disabled _unsafe_ sysctls
+will be scheduled, but will fail to launch.
+
+Keeping in mind the risks described in the warning below, the cluster admin can
+allow certain _unsafe_ sysctls for very special situations, such as
+high-performance or real-time application tuning. _Unsafe_ sysctls are enabled
+on a node-by-node basis with a kubelet flag, for example:
+
+```shell
+$ kubelet --experimental-allowed-unsafe-sysctls \
+  'kernel.msg*,net.ipv4.route.min_pmtu' ...
+```
+
+For minikube, this can be done via the `extra-config` flag:
+
+```shell
+$ minikube start --extra-config="kubelet.AllowedUnsafeSysctls=kernel.msg*,net.ipv4.route.min_pmtu" ...
+```
+
+Only _namespaced_ sysctls can be enabled this way.
+
+## Setting Sysctls for a Pod
 
 A number of sysctls are _namespaced_ in today's Linux kernels. This means that
 they can be set independently for each pod on a node. Being namespaced is a
@@ -46,67 +103,8 @@ manually by the cluster admin, either by means of the underlying Linux
 distribution of the nodes (e.g. via `/etc/sysctls.conf`) or using a DaemonSet
 with privileged containers.
 
--**Note**: it is good practice to consider nodes with special sysctl settings as
--_tainted_ within a cluster, and only schedule pods onto them which need those
--sysctl settings. It is suggested to use the Kubernetes [_taints and toleration_
--feature](/docs/user-guide/kubectl/{{page.version}}/#taint) to implement this.
--
--## Safe vs. Unsafe Sysctls
--
--Sysctls are grouped into _safe_ and _unsafe_ sysctls. In addition to proper
--namespacing a _safe_ sysctl must be properly _isolated_ between pods on the same
--node. This means that setting a _safe_ sysctl for one pod
--
--- must not have any influence on any other pod on the node
--- must not allow to harm the node's health
--- must not allow to gain CPU or memory resources outside of the resource limits
--  of a pod.
--
--By far, most of the _namespaced_ sysctls are not necessarily considered _safe_.
--
--For Kubernetes 1.4, the following sysctls are supported in the _safe_ set:
--
--- `kernel.shm_rmid_forced`,
--- `net.ipv4.ip_local_port_range`,
--- `net.ipv4.tcp_syncookies`.
--
--**Note**: The example `net.ipv4.tcp_syncookies` is not namespaced on Linux kernel version 4.4 or lower.
--{: .note}
--
--This list will be extended in future Kubernetes versions when the kubelet
--supports better isolation mechanisms.
--
--All _safe_ sysctls are enabled by default.
--
--All _unsafe_ sysctls are disabled by default and must be allowed manually by the
--cluster admin on a per-node basis. Pods with disabled unsafe sysctls will be
--scheduled, but will fail to launch.
--
--**Warning**: Due to their nature of being _unsafe_, the use of _unsafe_ sysctls
--is at-your-own-risk and can lead to severe problems like wrong behavior of
--containers, resource shortage or complete breakage of a node.
--
--## Enabling Unsafe Sysctls
--
--With the warning above in mind, the cluster admin can allow certain _unsafe_
--sysctls for very special situations like e.g. high-performance or real-time
--application tuning. _Unsafe_ sysctls are enabled on a node-by-node basis with a
--flag of the kubelet, e.g.:
--
--```shell
--$ kubelet --experimental-allowed-unsafe-sysctls 'kernel.msg*,net.ipv4.route.min_pmtu' ...
--```
--For minikube, this can be done via the `extra-config` flag:
--
--```shell
--$ minikube start --extra-config="kubelet.AllowedUnsafeSysctls=kernel.msg*,net.ipv4.route.min_pmtu"...
--```
--Only _namespaced_ sysctls can be enabled this way.
--
--## Setting Sysctls for a Pod
--
--The sysctl feature is an alpha API in Kubernetes 1.4. Therefore, sysctls are set
--using annotations on pods. They apply to all containers in the same pod.
+The sysctl feature is an alpha API. Therefore, sysctls are set using annotations
+on pods. They apply to all containers in the same pod.
 
 Here is an example, with different annotations for _safe_ and _unsafe_ sysctls:
 
@@ -121,11 +119,25 @@ metadata:
 spec:
   ...
 ```
+{% endcapture %}
 
--**Note**: a pod with the _unsafe_ sysctls specified above will fail to launch on
--any node which has not enabled those two _unsafe_ sysctls explicitly. As with
--_node-level_ sysctls it is recommended to use [_taints and toleration_
--feature](/docs/user-guide/kubectl/{{page.version}}/#taint) or [taints on nodes](/docs/concepts/configuration/taint-and-toleration/)
+{% capture discussion %}
+
+**Warning**: By their nature, _unsafe_ sysctls are used at your own risk and can
+lead to severe problems such as misbehaving containers, resource shortages, or
+complete breakage of a node.
+{: .warning}
+
+It is good practice to consider nodes with special sysctl settings as
+_tainted_ within a cluster, and to schedule onto them only pods that need those
+sysctl settings. It is suggested to use the Kubernetes [_taints and toleration_
+feature](/docs/user-guide/kubectl/{{page.version}}/#taint) to implement this.
+
+A pod that requests _unsafe_ sysctls will fail to launch on any node that has not
+explicitly enabled those _unsafe_ sysctls. As with _node-level_ sysctls, it
+is recommended to use the
+[_taints and toleration_ feature](/docs/user-guide/kubectl/{{page.version}}/#taint) or
+[taints on nodes](/docs/concepts/configuration/taint-and-toleration/)
 to schedule those pods onto the right nodes.
 
 ## PodSecurityPolicy Annotations
@@ -148,3 +160,7 @@ metadata:
 spec:
   ...
 ```
+
+{% endcapture %}
+
+{% include templates/task.md %}
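
The new task page passes the kubelet a comma-separated allowlist (`'kernel.msg*,net.ipv4.route.min_pmtu'`) and notes that parameters live under `/proc/sys/`. The minimal Bash sketch below makes both ideas concrete. The function names `sysctl_path` and `sysctl_allowed` are hypothetical and are not part of Kubernetes, the kubelet, or minikube. The matching rule is an assumption about how the `--experimental-allowed-unsafe-sysctls` value is interpreted: a trailing `*` acts as a prefix wildcard, and any other entry must match exactly. It is not taken from the kubelet's source.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for reasoning about sysctls; NOT part of Kubernetes,
# the kubelet, or minikube.

# Print the /proc/sys file that backs a dotted sysctl name,
# e.g. kernel.shm_rmid_forced -> /proc/sys/kernel/shm_rmid_forced.
sysctl_path() {
  printf '/proc/sys/%s\n' "$(printf '%s' "$1" | tr '.' '/')"
}

# Succeed (exit status 0) if sysctl name $1 is matched by the comma-separated
# allowlist $2, written in the style of the kubelet flag value
# 'kernel.msg*,net.ipv4.route.min_pmtu'.
# ASSUMPTION: an entry ending in '*' matches any name with that prefix;
# any other entry must equal the name exactly.
sysctl_allowed() {
  local name=$1 entry prefix
  local -a entries
  # Split on commas via read, so 'kernel.msg*' is never glob-expanded.
  IFS=',' read -r -a entries <<< "$2"
  for entry in "${entries[@]}"; do
    if [[ $entry == *'*' ]]; then
      prefix=${entry%?}              # drop the trailing '*'
      [[ $name == "$prefix"* ]] && return 0
    elif [[ $name == "$entry" ]]; then
      return 0
    fi
  done
  return 1
}
```

After sourcing these functions on a Linux node, `cat "$(sysctl_path net.ipv4.ip_local_port_range)"` prints the same value that `sysctl -n net.ipv4.ip_local_port_range` reports. `sysctl_allowed kernel.msgmax 'kernel.msg*,net.ipv4.route.min_pmtu'` succeeds, and `sysctl_allowed kernel.shmmax 'kernel.msg*,net.ipv4.route.min_pmtu'` fails. The list is split with `read -a` rather than an unquoted `for` loop. This prevents `kernel.msg*` from being expanded against file names in the current directory.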