clean up node problem detector task page

2020-10-28 21:48:51 -04:00 · d3f374c0d8
parent 34e8b55faf
commit d3f374c0d8
1 changed file with 81 additions and 100 deletions
--- a/content/en/docs/tasks/debug-application-cluster/monitor-node-health.md
+++ b/content/en/docs/tasks/debug-application-cluster/monitor-node-health.md
@@ -1,133 +1,117 @@
 ---
+title: Monitor Node Health
+content_type: task
 reviewers:
 - Random-Liu
 - dchen1107
-content_type: task
-title: Monitor Node Health
 ---

 <!-- overview -->

-*Node problem detector* is a [DaemonSet](/docs/concepts/workloads/controllers/daemonset/) monitoring the
-node health. It collects node problems from various daemons and reports them
-to the apiserver as [NodeCondition](/docs/concepts/architecture/nodes/#condition)
+*Node problem detector* is a daemon for monitoring and reporting about a node's health.
+You can run node problem detector as a `DaemonSet`
+or as a standalone daemon. Node problem detector collects information about node problems from various daemons
+and reports these conditions to the API server as [NodeCondition](/docs/concepts/architecture/nodes/#condition)
 and [Event](/docs/reference/generated/kubernetes-api/{{< param "version" >}}/#event-v1-core).

-It supports some known kernel issue detection now, and will detect more and
-more node problems over time.
-
-Currently Kubernetes won't take any action on the node conditions and events
-generated by node problem detector. In the future, a remedy system could be
-introduced to deal with node problems.
-
-See more information
-[here](https://github.com/kubernetes/node-problem-detector).
-
-
+To learn how to install and use the node problem detector, see the
+[Node problem detector project documentation](https://github.com/kubernetes/node-problem-detector).

 ## {{% heading "prerequisites" %}}

-
-{{< include "task-tutorial-prereqs.md" >}} {{< version-check >}}
-
-
+{{< include "task-tutorial-prereqs.md" >}}

 <!-- steps -->

 ## Limitations

-* The kernel issue detection of node problem detector only supports file based
-kernel log now. It doesn't support log tools like journald.
+* Node problem detector supports only file-based kernel logs.
+  Log tools such as `journald` are not supported.

-* The kernel issue detection of node problem detector has assumption on kernel
-log format, and now it only works on Ubuntu and Debian. However, it is easy to extend
-it to [support other log format](/docs/tasks/debug-application-cluster/monitor-node-health/#support-other-log-format).
+* Node problem detector assumes a particular kernel log format when it detects kernel issues.
+  To learn how to support a different log format, see [Add support for another log format](#support-other-log-format).

-## Enable/Disable in GCE cluster
+## Enabling node problem detector

-Node problem detector is [running as a cluster addon](/docs/setup/best-practices/cluster-large/#addon-resources) enabled by default in the
-gce cluster.
+Some cloud providers enable node problem detector as an {{< glossary_tooltip text="Addon" term_id="addons" >}}.
+You can also enable node problem detector with `kubectl` or by creating an Addon pod.

-You can enable/disable it by setting the environment variable
-`KUBE_ENABLE_NODE_PROBLEM_DETECTOR` before `kube-up.sh`.
+### Using kubectl to enable node problem detector {#using-kubectl}

-## Use in Other Environment
+`kubectl` provides the most flexible management of node problem detector.
+You can overwrite the default configuration to fit it into your environment or
+to detect customized node problems. For example:

-To enable node problem detector in other environment outside of GCE, you can use
-either `kubectl` or addon pod.
+1. Create a node problem detector configuration similar to `node-problem-detector.yaml`:

-### Kubectl
+   {{< codenew file="debug/node-problem-detector.yaml" >}}

-This is the recommended way to start node problem detector outside of GCE. It
-provides more flexible management, such as overwriting the default
-configuration to fit it into your environment or detect
-customized node problems.
+   {{< note >}}
+   You should verify that the system log directory is right for your operating system distribution.
+   {{< /note >}}

-* **Step 1:** `node-problem-detector.yaml`:
+1. Start node problem detector with `kubectl`:

-{{< codenew file="debug/node-problem-detector.yaml" >}}
+   ```shell
+   kubectl apply -f https://k8s.io/examples/debug/node-problem-detector.yaml
+   ```

+### Using an Addon pod to enable node problem detector {#using-addon-pod}

-***Notice that you should make sure the system log directory is right for your
-OS distro.***
-
-* **Step 2:** Start node problem detector with `kubectl`:
-
-```shell
- kubectl apply -f https://k8s.io/examples/debug/node-problem-detector.yaml
-```
-
-### Addon Pod
-
-This is for those who have their own cluster bootstrap solution, and don't need
-to overwrite the default configuration. They could leverage the addon pod to
+If you are using a custom cluster bootstrap solution and don't need
+to overwrite the default configuration, you can leverage the Addon pod to
 further automate the deployment.

-Just create `node-problem-detector.yaml`, and put it under the addon pods directory
-`/etc/kubernetes/addons/node-problem-detector` on master node.
+Create `node-problem-detector.yaml`, and save the configuration in the Addon pod's
+directory `/etc/kubernetes/addons/node-problem-detector` on a control plane node.

-## Overwrite the Configuration
+## Overwrite the configuration

 The [default configuration](https://github.com/kubernetes/node-problem-detector/tree/v0.1/config)
 is embedded when building the Docker image of node problem detector.

-However, you can use [ConfigMap](/docs/tasks/configure-pod-container/configure-pod-configmap/) to overwrite it
-following the steps:
+However, you can use a [`ConfigMap`](/docs/tasks/configure-pod-container/configure-pod-configmap/)
+to overwrite the configuration:

-* **Step 1:** Change the config files in `config/`.
-* **Step 2:** Create the ConfigMap `node-problem-detector-config` with `kubectl create configmap
-node-problem-detector-config --from-file=config/`.
-* **Step 3:** Change the `node-problem-detector.yaml` to use the ConfigMap:
+1. Change the configuration files in `config/`
+1. Create the `ConfigMap` `node-problem-detector-config`:

-{{< codenew file="debug/node-problem-detector-configmap.yaml" >}}
+   ```shell
+   kubectl create configmap node-problem-detector-config --from-file=config/
+   ```

+1. Change the `node-problem-detector.yaml` to use the `ConfigMap`:

-* **Step 4:** Re-create the node problem detector with the new yaml file:
+   {{< codenew file="debug/node-problem-detector-configmap.yaml" >}}

-```shell
- kubectl delete -f https://k8s.io/examples/debug/node-problem-detector.yaml # If you have a node-problem-detector running
- kubectl apply -f https://k8s.io/examples/debug/node-problem-detector-configmap.yaml
-```
+1. Recreate the node problem detector with the new configuration file:

-***Notice that this approach only applies to node problem detector started with `kubectl`.***
+   ```shell
+   # If you have a node-problem-detector running, delete before recreating
+   kubectl delete -f https://k8s.io/examples/debug/node-problem-detector.yaml
+   kubectl apply -f https://k8s.io/examples/debug/node-problem-detector-configmap.yaml
+   ```

-For node problem detector running as cluster addon, because addon manager doesn't support
-ConfigMap, configuration overwriting is not supported now.
+{{< note >}}
+This approach only applies to a node problem detector started with `kubectl`.
+{{< /note >}}
+
+Overwriting a configuration is not supported if a node problem detector runs as a cluster Addon.
+The Addon manager does not support `ConfigMap`.

 ## Kernel Monitor

-*Kernel Monitor* is a problem daemon in node problem detector. It monitors kernel log
-and detects known kernel issues following predefined rules.
+*Kernel Monitor* is a system log monitor daemon supported in the node problem detector.
+Kernel monitor watches the kernel log and detects known kernel issues following predefined rules.

 The Kernel Monitor matches kernel issues according to a predefined rule list in
-[`config/kernel-monitor.json`](https://github.com/kubernetes/node-problem-detector/blob/v0.1/config/kernel-monitor.json).
-The rule list is extensible, and you can always extend it by overwriting the
+[`config/kernel-monitor.json`](https://github.com/kubernetes/node-problem-detector/blob/v0.1/config/kernel-monitor.json). The rule list is extensible. You can extend the rule list by overwriting the
 configuration.

-### Add New NodeConditions
+### Add new NodeConditions

-To support new node conditions, you can extend the `conditions` field in
-`config/kernel-monitor.json` with new condition definition:
+To support a new `NodeCondition`, you can extend the `conditions` field in
+`config/kernel-monitor.json` with a new condition definition such as:

 ```json
 {
@@ -137,10 +121,10 @@ To support new node conditions, you can extend the `conditions` field in
 }
 ```

-### Detect New Problems
+### Detect new problems

 To detect new problems, you can extend the `rules` field in `config/kernel-monitor.json`
-with new rule definition:
+with a new rule definition:

 ```json
 {
@@ -151,31 +135,28 @@ with new rule definition:
 }
 ```

-### Change Log Path
+### Configure path for the kernel log device {#kernel-log-device-path}

-Kernel log in different OS distros may locate in different path. The `log`
-field in `config/kernel-monitor.json` is the log path inside the container.
-You can always configure it to match your OS distro.
-
-### Support Other Log Format
-
-Kernel monitor uses [`Translator`](https://github.com/kubernetes/node-problem-detector/blob/v0.1/pkg/kernelmonitor/translator/translator.go)
-plugin to translate kernel log the internal data structure. It is easy to
-implement a new translator for a new log format.
+Check your kernel log path location in your operating system (OS) distribution.
+The Linux kernel [log device](https://www.kernel.org/doc/Documentation/ABI/testing/dev-kmsg) is usually presented as `/dev/kmsg`. However, the log path location varies by OS distribution.
+The `log` field in `config/kernel-monitor.json` represents the log path inside the container.
+You can configure the `log` field to match the device path as seen by the node problem detector.

+### Add support for another log format {#support-other-log-format}

+Kernel monitor uses the
+[`Translator`](https://github.com/kubernetes/node-problem-detector/blob/v0.1/pkg/kernelmonitor/translator/translator.go) plugin to translate the kernel log into its internal data structure.
+You can implement a new translator for a new log format.

 <!-- discussion -->

-## Caveats
-
-It is recommended to run the node problem detector in your cluster to monitor
-the node health. However, you should be aware that this will introduce extra
-resource overhead on each node. Usually this is fine, because:
-
-* The kernel log is generated relatively slowly.
-* Resource limit is set for node problem detector.
-* Even under high load, the resource usage is acceptable.
-(see [benchmark result](https://github.com/kubernetes/node-problem-detector/issues/2#issuecomment-220255629))
+## Recommendations and restrictions

+It is recommended to run the node problem detector in your cluster to monitor node health.
+When running the node problem detector, you can expect extra resource overhead on each node.
+Usually this is fine, because:

+* The kernel log grows relatively slowly.
+* A resource limit is set for the node problem detector.
+* Even under high load, the resource usage is acceptable. For more information, see the node problem detector
+[benchmark result](https://github.com/kubernetes/node-problem-detector/issues/2#issuecomment-220255629).
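
The Kernel Monitor section of the page above describes a daemon that watches the kernel log and matches lines against a predefined rule list. As a rough illustration of that matching step only, here is a minimal Python sketch. It is not the node problem detector implementation (which is written in Go), and the `Rule` fields, the sample patterns, and the log lines are all hypothetical; they do not reproduce the schema of `config/kernel-monitor.json`.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    # All three fields are hypothetical names chosen for this sketch.
    kind: str     # e.g. "temporary" or "permanent"
    reason: str   # short label reported when the rule matches
    pattern: str  # regular expression tested against each log line


def match_rules(rules, log_lines):
    """Return a (reason, kind, line) tuple for every log line that matches a rule."""
    compiled = [(rule, re.compile(rule.pattern)) for rule in rules]
    hits = []
    for line in log_lines:
        for rule, regex in compiled:
            if regex.search(line):
                hits.append((rule.reason, rule.kind, line))
    return hits


# Made-up rules and log lines, for illustration only.
rules = [
    Rule("temporary", "OOMKilling", r"Killed process \d+ \(.+\)"),
    Rule("permanent", "KernelDeadlock", r"task .+ blocked for more than \d+ seconds"),
]
log_lines = [
    "[ 100.1] usb 1-1: new high-speed USB device",
    "[ 200.2] Killed process 1234 (stress) total-vm:1024kB",
    "[ 300.3] INFO: task docker:20744 blocked for more than 120 seconds.",
]

hits = match_rules(rules, log_lines)
for reason, kind, line in hits:
    print(f"{kind}: {reason}: {line}")
```

In this sketch, the first log line matches no rule, and the other two each match one rule. Extending the rule list, as the page describes, corresponds here to appending another `Rule` with its own pattern.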