Merge pull request #2892 from derekwaynecarr/clarify-docs

Clarify how memory.available is computed in eviction
pull/2947/head
devin-donnelly 2017-03-21 17:05:02 -07:00 committed by GitHub
commit 7fdb8ce53c
2 changed files with 66 additions and 15 deletions


@ -9,23 +9,27 @@ title: Configuring Out Of Resource Handling
* TOC
{:toc}
The `kubelet` needs to preserve node stability when available compute resources
are low.
This is especially important when dealing with incompressible resources such as
memory or disk.
If either resource is exhausted, the node would become unstable.
## Eviction Policy
The `kubelet` can proactively monitor for and prevent total starvation of a
compute resource. In those cases, the `kubelet` can proactively fail one or
more pods in order to reclaim the starved resource. When the `kubelet` fails
a pod, it terminates all containers in the pod, and the `PodPhase`
transitions to `Failed`.
### Eviction Signals
The `kubelet` can trigger eviction decisions on the signals described in the
table below. The description column defines the value of each signal in terms
of the `kubelet` summary API.
| Eviction Signal | Description |
|----------------------------|-----------------------------------------------------------------------|
@ -35,20 +39,36 @@ summary API.
| `imagefs.available` | `imagefs.available` := `node.stats.runtime.imagefs.available` |
| `imagefs.inodesFree` | `imagefs.inodesFree` := `node.stats.runtime.imagefs.inodesFree` |
Each of the above signals supports either a literal or percentage-based value.
The percentage-based value is calculated relative to the total capacity
associated with each signal.
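
As a rough illustration of how a percentage-based value relates to capacity, the sketch below converts a threshold into bytes. The helper name `threshold_in_bytes` is hypothetical and not part of the `kubelet`. It assumes the threshold is either an integer percentage such as `10%` or a plain byte count, and it does not parse quantity suffixes.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the kubelet): resolve a threshold to bytes.
#   $1 = threshold, either an integer percentage ("10%") or a byte count
#   $2 = total capacity for the signal, in bytes
threshold_in_bytes() {
  local value=$1 capacity=$2
  if [[ $value == *% ]]; then
    # Strip the trailing "%" and scale against capacity (integer arithmetic).
    local percent=${value%\%}
    echo $(( capacity * percent / 100 ))
  else
    # A literal value is used as-is.
    echo "$value"
  fi
}
```

For example, `threshold_in_bytes 10% "$memory_capacity_in_bytes"` would give the byte count that corresponds to 10% of the node's memory capacity.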
The value for `memory.available` is derived from the cgroupfs instead of tools
like `free -m`. This is important because `free -m` does not work in a
container, and if users use the [node
allocatable](/docs/admin/node-allocatable.md) feature, out of resource decisions
are made locally at the end-user pod part of the cgroup hierarchy as well as at
the root node. This
[script](/docs/concepts/cluster-administration/out-of-resource/memory-available.sh)
reproduces the same set of steps that the `kubelet` performs to calculate
`memory.available`. The `kubelet` excludes `inactive_file` (that is, the number
of bytes of file-backed memory on the inactive LRU list) from its calculation
because it assumes that memory is reclaimable under pressure.
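
The arithmetic described above can be sketched as a small function that takes its inputs as arguments rather than reading the cgroupfs. The function name `memory_available_bytes` and the numbers in the usage note are illustrative assumptions, not `kubelet` code.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compute memory.available from three byte counts.
#   $1 = memory capacity
#   $2 = memory usage (memory.usage_in_bytes)
#   $3 = inactive file-backed memory (total_inactive_file)
memory_available_bytes() {
  local capacity=$1 usage=$2 inactive_file=$3
  local working_set=0
  # The working set is usage minus inactive_file, clamped at zero.
  if [ "$usage" -ge "$inactive_file" ]; then
    working_set=$(( usage - inactive_file ))
  fi
  echo $(( capacity - working_set ))
}
```

With a capacity of 8000 bytes, usage of 5000 bytes, and 2000 bytes of inactive file-backed memory, the working set is 3000 bytes, so `memory_available_bytes 8000 5000 2000` prints `5000`.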
`kubelet` supports only two filesystem partitions.
1. The `nodefs` filesystem that kubelet uses for volumes, daemon logs, etc.
1. The `imagefs` filesystem that container runtimes use for storing images and
   container writable layers.
`imagefs` is optional. `kubelet` auto-discovers these filesystems using
cAdvisor. `kubelet` does not care about any other filesystems. Any other types
of configurations are not currently supported by the kubelet. For example, it is
*not OK* to store volumes and logs in a dedicated `filesystem`.
In future releases, the `kubelet` will deprecate the existing [garbage
collection](/docs/admin/garbage-collection/) support in favor of eviction in
response to disk pressure.
### Eviction Thresholds


@ -0,0 +1,31 @@
#!/bin/bash
# This script reproduces what the kubelet does
# to calculate memory.available relative to root cgroup.
# current memory usage
memory_capacity_in_kb=$(cat /proc/meminfo | grep MemTotal | awk '{print $2}')
memory_capacity_in_bytes=$((memory_capacity_in_kb * 1024))
memory_usage_in_bytes=$(cat /sys/fs/cgroup/memory/memory.usage_in_bytes)
memory_total_inactive_file=$(cat /sys/fs/cgroup/memory/memory.stat | grep total_inactive_file | awk '{print $2}')
memory_working_set=$memory_usage_in_bytes
if [ "$memory_working_set" -lt "$memory_total_inactive_file" ];
then
    memory_working_set=0
else
    memory_working_set=$((memory_usage_in_bytes - memory_total_inactive_file))
fi
memory_available_in_bytes=$((memory_capacity_in_bytes - memory_working_set))
memory_available_in_kb=$((memory_available_in_bytes / 1024))
memory_available_in_mb=$((memory_available_in_kb / 1024))
echo "memory.capacity_in_bytes $memory_capacity_in_bytes"
echo "memory.usage_in_bytes $memory_usage_in_bytes"
echo "memory.total_inactive_file $memory_total_inactive_file"
echo "memory.working_set $memory_working_set"
echo "memory.available_in_bytes $memory_available_in_bytes"
echo "memory.available_in_kb $memory_available_in_kb"
echo "memory.available_in_mb $memory_available_in_mb"