Make using sysctls a task instead of a concept (#6808)

Closes: #4505
pull/7812/head
Qiming 2018-03-22 09:28:04 +08:00 committed by k8s-ci-robot
parent e9c79c6ff8
commit 60b5600157
4 changed files with 90 additions and 73 deletions

View File

@@ -43,7 +43,6 @@ toc:
 section:
 - docs/concepts/cluster-administration/network-plugins.md
 - docs/concepts/cluster-administration/device-plugins.md
-- docs/concepts/cluster-administration/sysctl-cluster.md
 - docs/concepts/service-catalog/index.md
 - title: Containers

View File

@@ -143,6 +143,7 @@ toc:
 - docs/tasks/administer-cluster/access-cluster-api.md
 - docs/tasks/administer-cluster/access-cluster-services.md
 - docs/tasks/administer-cluster/securing-a-cluster.md
+- docs/tasks/administer-cluster/sysctl-cluster.md
 - docs/tasks/administer-cluster/encrypt-data.md
 - docs/tasks/administer-cluster/configure-upgrade-etcd.md
 - docs/tasks/administer-cluster/static-pod.md

View File

@@ -50,7 +50,7 @@
 /docs/admin/resourcequota/limitstorageconsumption/ /docs/tasks/administer-cluster/limit-storage-consumption/ 301
 /docs/admin/resourcequota/walkthrough/ /docs/tasks/administer-cluster/quota-api-object/ 301
 /docs/admin/static-pods/ /docs/tasks/administer-cluster/static-pod/ 301
-/docs/admin/sysctls/ /docs/concepts/cluster-administration/sysctl-cluster/ 301
+/docs/admin/sysctls/ /docs/tasks/administer-cluster/sysctl-cluster/ 301
 /docs/admin/upgrade-1-6/ /docs/tasks/administer-cluster/upgrade-1-6/ 301
 /docs/admin/resource-quota/ /docs/concepts/policy/resource-quotas/ 301
@@ -97,6 +97,7 @@
 /docs/concepts/cluster-administration/multiple-clusters/ /docs/concepts/cluster-administration/federation/ 301
 /docs/concepts/cluster-administration/out-of-resource/ /docs/tasks/administer-cluster/out-of-resource/ 301
 /docs/concepts/cluster-administration/resource-usage-monitoring /docs/tasks/debug-application-cluster/resource-usage-monitoring/ 301
+/docs/concepts/cluster-administration/sysctl-cluster/ /docs/tasks/administer-cluster/sysctl-cluster/ 301
 /docs/concepts/cluster-administration/static-pod/ /docs/tasks/administer-cluster/static-pod/ 301
 /docs/concepts/clusters/logging/ /docs/concepts/cluster-administration/logging/ 301
 /docs/concepts/configuration/container-command-arg/ /docs/tasks/inject-data-application/define-command-argument-container/ 301

View File

@@ -1,15 +1,24 @@
 ---
-title: Using Sysctls in a Kubernetes Cluster
 reviewers:
 - sttts
+title: Using Sysctls in a Kubernetes Cluster
 ---
-* TOC
-{:toc}
+{% capture overview %}
 This document describes how sysctls are used within a Kubernetes cluster.
-## What is a Sysctl?
+{% endcapture %}
+{% capture prerequisites %}
+{% include task-tutorial-prereqs.md %}
+{% endcapture %}
+{% capture steps %}
+## Listing all Sysctl Parameters
 In Linux, the sysctl interface allows an administrator to modify kernel
 parameters at runtime. Parameters are available via the `/proc/sys/` virtual
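Each dotted sysctl name corresponds to a file under `/proc/sys/`, with the dots replaced by slashes. As a minimal sketch (the helper name `sysctl_path` is hypothetical, and it assumes no single component of the name contains a dot), the mapping can be written in POSIX shell:

```shell
#!/bin/sh
# Hypothetical helper: map a dotted sysctl name to its /proc/sys file.
# Assumes no single component of the name itself contains a dot.
sysctl_path() {
  printf '/proc/sys/%s\n' "$(printf '%s' "$1" | tr '.' '/')"
}

# Prints /proc/sys/net/ipv4/ip_local_port_range
sysctl_path net.ipv4.ip_local_port_range
```

On a Linux host, `cat "$(sysctl_path kernel.shm_rmid_forced)"` and `sysctl -n kernel.shm_rmid_forced` should print the same value.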
@@ -23,11 +32,59 @@ process file system. The parameters cover various subsystems such as:
 To get a list of all parameters, you can run
-```
+```shell
 $ sudo sysctl -a
 ```
-## Namespaced vs. Node-Level Sysctls
+## Enabling Unsafe Sysctls
+Sysctls are grouped into _safe_ and _unsafe_ sysctls. In addition to proper
+namespacing a _safe_ sysctl must be properly _isolated_ between pods on the same
+node. This means that setting a _safe_ sysctl for one pod
+- must not have any influence on any other pod on the node
+- must not allow to harm the node's health
+- must not allow to gain CPU or memory resources outside of the resource limits
+of a pod.
+By far, most of the _namespaced_ sysctls are not necessarily considered _safe_.
+The following sysctls are supported in the _safe_ set:
+- `kernel.shm_rmid_forced`,
+- `net.ipv4.ip_local_port_range`,
+- `net.ipv4.tcp_syncookies`.
+**Note**: The example `net.ipv4.tcp_syncookies` is not namespaced on Linux kernel version 4.4 or lower.
+{: .note}
+This list will be extended in future Kubernetes versions when the kubelet
+supports better isolation mechanisms.
+All _safe_ sysctls are enabled by default.
+All _unsafe_ sysctls are disabled by default and must be allowed manually by the
+cluster admin on a per-node basis. Pods with disabled unsafe sysctls will be
+scheduled, but will fail to launch.
+With the warning above in mind, the cluster admin can allow certain _unsafe_
+sysctls for very special situations like e.g. high-performance or real-time
+application tuning. _Unsafe_ sysctls are enabled on a node-by-node basis with a
+flag of the kubelet, e.g.:
+```shell
+$ kubelet --experimental-allowed-unsafe-sysctls \
+'kernel.msg*,net.ipv4.route.min_pmtu' ...
+```
+For minikube, this can be done via the `extra-config` flag:
+```shell
+$ minikube start --extra-config="kubelet.AllowedUnsafeSysctls=kernel.msg*,net.ipv4.route.min_pmtu"...
+```
+Only _namespaced_ sysctls can be enabled this way.
+## Setting Sysctls for a Pod
 A number of sysctls are _namespaced_ in today's Linux kernels. This means that
 they can be set independently for each pod on a node. Being namespaced is a
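The safe/unsafe rules in this change reduce to a per-node check: a sysctl is usable if it is in the safe set, or if it matches an entry of the kubelet's `--experimental-allowed-unsafe-sysctls` list, where an entry such as `kernel.msg*` is treated here as a prefix pattern. The function below is a hypothetical sketch of that rule, not kubelet code: it uses shell glob matching as a stand-in for the kubelet's own pattern handling and does not model the restriction that only namespaced sysctls can be allowed.

```shell
#!/bin/sh
# Hypothetical sketch of the per-node rule described above; not kubelet code.
# Safe set as listed in this change.
SAFE_SYSCTLS="kernel.shm_rmid_forced net.ipv4.ip_local_port_range net.ipv4.tcp_syncookies"

# sysctl_allowed NAME ALLOWLIST
#   NAME       sysctl requested by a pod, e.g. kernel.msgmax
#   ALLOWLIST  comma-separated value of --experimental-allowed-unsafe-sysctls
#              on this node (may be empty)
# Returns 0 if the pod could launch on this node, 1 if it would fail to launch.
# POSIX sh has no "local", so the helper variables below are globals.
sysctl_allowed() {
  name=$1
  for safe in $SAFE_SYSCTLS; do
    [ "$name" = "$safe" ] && return 0   # safe sysctls are enabled by default
  done
  rest=$2
  while [ -n "$rest" ]; do
    pattern=${rest%%,*}                 # first comma-separated entry
    case $rest in
      *,*) rest=${rest#*,} ;;           # drop that entry and its comma
      *)   rest= ;;                     # that was the last entry
    esac
    # Unquoted $pattern is a glob, so kernel.msg* matches kernel.msgmax.
    case $name in
      $pattern) return 0 ;;
    esac
  done
  return 1
}

sysctl_allowed kernel.shm_rmid_forced "" && echo "safe: launches"
sysctl_allowed kernel.msgmax 'kernel.msg*,net.ipv4.route.min_pmtu' && echo "allowed unsafe: launches"
sysctl_allowed kernel.msgmax "" || echo "unsafe, not allowed: fails to launch"
```

The last call mirrors the behavior stated above: such a pod is still scheduled, but fails to launch on a node whose kubelet does not allow the sysctl.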
@@ -46,67 +103,8 @@ manually by the cluster admin, either by means of the underlying Linux
 distribution of the nodes (e.g. via `/etc/sysctls.conf`) or using a DaemonSet
 with privileged containers.
-**Note**: it is good practice to consider nodes with special sysctl settings as
-_tainted_ within a cluster, and only schedule pods onto them which need those
-sysctl settings. It is suggested to use the Kubernetes [_taints and toleration_
-feature](/docs/user-guide/kubectl/{{page.version}}/#taint) to implement this.
-## Safe vs. Unsafe Sysctls
-Sysctls are grouped into _safe_ and _unsafe_ sysctls. In addition to proper
-namespacing a _safe_ sysctl must be properly _isolated_ between pods on the same
-node. This means that setting a _safe_ sysctl for one pod
-- must not have any influence on any other pod on the node
-- must not allow to harm the node's health
-- must not allow to gain CPU or memory resources outside of the resource limits
-of a pod.
-By far, most of the _namespaced_ sysctls are not necessarily considered _safe_.
-For Kubernetes 1.4, the following sysctls are supported in the _safe_ set:
-- `kernel.shm_rmid_forced`,
-- `net.ipv4.ip_local_port_range`,
-- `net.ipv4.tcp_syncookies`.
-**Note**: The example `net.ipv4.tcp_syncookies` is not namespaced on Linux kernel version 4.4 or lower.
-{: .note}
-This list will be extended in future Kubernetes versions when the kubelet
-supports better isolation mechanisms.
-All _safe_ sysctls are enabled by default.
-All _unsafe_ sysctls are disabled by default and must be allowed manually by the
-cluster admin on a per-node basis. Pods with disabled unsafe sysctls will be
-scheduled, but will fail to launch.
-**Warning**: Due to their nature of being _unsafe_, the use of _unsafe_ sysctls
-is at-your-own-risk and can lead to severe problems like wrong behavior of
-containers, resource shortage or complete breakage of a node.
-## Enabling Unsafe Sysctls
-With the warning above in mind, the cluster admin can allow certain _unsafe_
-sysctls for very special situations like e.g. high-performance or real-time
-application tuning. _Unsafe_ sysctls are enabled on a node-by-node basis with a
-flag of the kubelet, e.g.:
-```shell
-$ kubelet --experimental-allowed-unsafe-sysctls 'kernel.msg*,net.ipv4.route.min_pmtu' ...
-```
-For minikube, this can be done via the `extra-config` flag:
-```shell
-$ minikube start --extra-config="kubelet.AllowedUnsafeSysctls=kernel.msg*,net.ipv4.route.min_pmtu"...
-```
-Only _namespaced_ sysctls can be enabled this way.
-## Setting Sysctls for a Pod
-The sysctl feature is an alpha API in Kubernetes 1.4. Therefore, sysctls are set
-using annotations on pods. They apply to all containers in the same pod.
+The sysctl feature is an alpha API. Therefore, sysctls are set using annotations
+on pods. They apply to all containers in the same pod.
 Here is an example, with different annotations for _safe_ and _unsafe_ sysctls:
@@ -121,11 +119,25 @@ metadata:
 spec:
 ...
 ```
-**Note**: a pod with the _unsafe_ sysctls specified above will fail to launch on
-any node which has not enabled those two _unsafe_ sysctls explicitly. As with
-_node-level_ sysctls it is recommended to use [_taints and toleration_
-feature](/docs/user-guide/kubectl/{{page.version}}/#taint) or [taints on nodes](/docs/concepts/configuration/taint-and-toleration/)
+{% endcapture %}
+{% capture discussion %}
+**Warning**: Due to their nature of being _unsafe_, the use of _unsafe_ sysctls
+is at-your-own-risk and can lead to severe problems like wrong behavior of
+containers, resource shortage or complete breakage of a node.
+{: .warning}
+It is good practice to consider nodes with special sysctl settings as
+_tainted_ within a cluster, and only schedule pods onto them which need those
+sysctl settings. It is suggested to use the Kubernetes [_taints and toleration_
+feature](/docs/user-guide/kubectl/{{page.version}}/#taint) to implement this.
+A pod with the _unsafe_ sysctls will fail to launch on any node which has not
+enabled those two _unsafe_ sysctls explicitly. As with _node-level_ sysctls it
+is recommended to use
+[_taints and toleration_ feature](/docs/user-guide/kubectl/{{page.version}}/#taint) or
+[taints on nodes](/docs/concepts/configuration/taint-and-toleration/)
 to schedule those pods onto the right nodes.
 ## PodSecurityPolicy Annotations
@@ -148,3 +160,7 @@ metadata:
 spec:
 ...
 ```
+{% endcapture %}
+{% include templates/task.md %}
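Because the alpha API carries sysctls in pod annotations rather than typed fields, each annotation value is a single string. Assuming the comma-separated `name=value` format used by the `security.alpha.kubernetes.io/sysctls` and `security.alpha.kubernetes.io/unsafe-sysctls` annotations (the annotation names do not appear in this diff), the hypothetical helper below splits such a value into one name and value per line, which can help when checking a manifest by hand.

```shell
#!/bin/sh
# Hypothetical helper: split a sysctl annotation value such as
#   kernel.shm_rmid_forced=1,net.ipv4.route.min_pmtu=1000
# into "name<TAB>value" lines. Assumes values never contain commas.
# Returns 1 (with a message on stderr) for an entry that has no "=".
split_sysctl_annotation() {
  rest=$1
  while [ -n "$rest" ]; do
    entry=${rest%%,*}              # first comma-separated entry
    case $rest in
      *,*) rest=${rest#*,} ;;      # drop that entry and its comma
      *)   rest= ;;                # that was the last entry
    esac
    [ -z "$entry" ] && continue    # tolerate empty entries such as ",,"
    case $entry in
      # Name is everything before the first "=", value everything after it.
      *=*) printf '%s\t%s\n' "${entry%%=*}" "${entry#*=}" ;;
      *)   printf 'malformed entry: %s\n' "$entry" >&2; return 1 ;;
    esac
  done
}

split_sysctl_annotation 'kernel.shm_rmid_forced=1,net.ipv4.route.min_pmtu=1000'
```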