clean up node problem detector task page

pull/24772/head
Karen Bradshaw 2020-10-28 21:48:51 -04:00
parent 34e8b55faf
commit d3f374c0d8
1 changed file with 81 additions and 100 deletions


@@ -1,133 +1,117 @@
---
reviewers:
- Random-Liu
- dchen1107
content_type: task
title: Monitor Node Health
---

<!-- overview -->

*Node problem detector* is a daemon for monitoring and reporting about a node's health.
You can run node problem detector as a `DaemonSet` or as a standalone daemon.
Node problem detector collects information about node problems from various daemons
and reports these conditions to the API server as
[NodeCondition](/docs/concepts/architecture/nodes/#condition) and
[Event](/docs/reference/generated/kubernetes-api/{{< param "version" >}}/#event-v1-core).

To learn how to install and use the node problem detector, see the
[Node problem detector project documentation](https://github.com/kubernetes/node-problem-detector).

## {{% heading "prerequisites" %}}

{{< include "task-tutorial-prereqs.md" >}} {{< version-check >}}

<!-- steps -->

## Limitations

* Node problem detector only supports file-based kernel logs.
  Log tools such as `journald` are not supported.

* Node problem detector uses the kernel log format for reporting kernel issues.
  To learn how to extend the kernel log format, see
  [Add support for another log format](#support-other-log-format).

## Enabling node problem detector

Some cloud providers enable node problem detector as an {{< glossary_tooltip text="Addon" term_id="addons" >}}.
You can also enable node problem detector with `kubectl` or by creating an Addon pod.

### Using kubectl to enable node problem detector {#using-kubectl}

`kubectl` provides the most flexible management of node problem detector.
You can overwrite the default configuration to fit it into your environment or
to detect customized node problems. For example:

1. Create a node problem detector configuration similar to `node-problem-detector.yaml`:

   {{< codenew file="debug/node-problem-detector.yaml" >}}

   {{< note >}}
   You should verify that the system log directory is right for your operating system distribution.
   {{< /note >}}

1. Start node problem detector with `kubectl`:

   ```shell
   kubectl apply -f https://k8s.io/examples/debug/node-problem-detector.yaml
   ```

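After the DaemonSet is applied, you can check that the node problem detector Pods are running.
The following is a sketch; it assumes the example manifest deploys the DaemonSet into the
`kube-system` namespace, so adjust the namespace to match the manifest you applied:

```shell
# Assumes the DaemonSet runs in kube-system; adjust if your manifest differs
kubectl get pods --namespace=kube-system | grep node-problem-detector
```
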
### Using an Addon pod to enable node problem detector {#using-addon-pod}

If you are using a custom cluster bootstrap solution and don't need
to overwrite the default configuration, you can leverage the Addon pod to
further automate the deployment.

Create `node-problem-detector.yaml`, and save the configuration in the Addon pod's
directory `/etc/kubernetes/addons/node-problem-detector` on a control plane node.

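For example, copying the manifest into place on a control plane node could look like this
(a sketch using the directory mentioned above):

```shell
# Run on a control plane node; create the Addon directory if it does not exist
sudo mkdir -p /etc/kubernetes/addons/node-problem-detector
sudo cp node-problem-detector.yaml /etc/kubernetes/addons/node-problem-detector/
```
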
## Overwrite the configuration

The [default configuration](https://github.com/kubernetes/node-problem-detector/tree/v0.1/config)
is embedded when building the Docker image of node problem detector.

However, you can use a [`ConfigMap`](/docs/tasks/configure-pod-container/configure-pod-configmap/)
to overwrite the configuration:

1. Change the configuration files in `config/`.

1. Create the `ConfigMap` `node-problem-detector-config`:

   ```shell
   kubectl create configmap node-problem-detector-config --from-file=config/
   ```

1. Change the `node-problem-detector.yaml` to use the `ConfigMap`:

   {{< codenew file="debug/node-problem-detector-configmap.yaml" >}}

1. Recreate the node problem detector with the new configuration file:

   ```shell
   # If you have a node-problem-detector running, delete it before recreating
   kubectl delete -f https://k8s.io/examples/debug/node-problem-detector.yaml
   kubectl apply -f https://k8s.io/examples/debug/node-problem-detector-configmap.yaml
   ```

{{< note >}}
This approach only applies to a node problem detector started with `kubectl`.
{{< /note >}}

Overwriting a configuration is not supported if a node problem detector runs as a cluster Addon.
The Addon manager does not support `ConfigMap`.

## Kernel Monitor

*Kernel Monitor* is a system log monitor daemon supported in the node problem detector.
Kernel monitor watches the kernel log and detects known kernel issues following predefined rules.

The Kernel Monitor matches kernel issues according to a predefined rule list in
[`config/kernel-monitor.json`](https://github.com/kubernetes/node-problem-detector/blob/v0.1/config/kernel-monitor.json).
The rule list is extensible. You can extend the rule list by overwriting the configuration.

### Add new NodeConditions

To support a new `NodeCondition`, you can extend the `conditions` field in
`config/kernel-monitor.json` with a new condition definition such as:

```json
{
@@ -137,10 +121,10 @@
}
```

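For illustration, a complete condition entry could look like the following sketch.
The `type`, `reason`, and `message` fields follow the shape of the built-in conditions in
`config/kernel-monitor.json`; the values here are hypothetical and describe the default,
healthy state of the condition:

```json
{
  "type": "FrequentKernelOops",
  "reason": "NoFrequentKernelOops",
  "message": "node has no frequent kernel oops"
}
```
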
### Detect new problems

To detect new problems, you can extend the `rules` field in `config/kernel-monitor.json`
with a new rule definition:

```json
{
@@ -151,31 +135,28 @@
}
```

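As an illustrative sketch, a permanent rule that sets the hypothetical `FrequentKernelOops`
condition above might look like this. The field names follow the built-in rules in
`config/kernel-monitor.json` (a `permanent` rule sets a `NodeCondition`, while a `temporary`
rule only emits an Event); the reason and pattern values are made up for illustration:

```json
{
  "type": "permanent",
  "condition": "FrequentKernelOops",
  "reason": "KernelOopsDetected",
  "pattern": "BUG: unable to handle kernel NULL pointer dereference at .*"
}
```
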
### Configure path for the kernel log device {#kernel-log-device-path}

Check your kernel log path location in your operating system (OS) distribution.
The Linux kernel [log device](https://www.kernel.org/doc/Documentation/ABI/testing/dev-kmsg) is usually
presented as `/dev/kmsg`. However, the log path location varies by OS distribution.
The `log` field in `config/kernel-monitor.json` represents the log path inside the container.
You can configure the `log` field to match the device path as seen by the node problem detector.

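For example, if the kernel log is exposed at `/dev/kmsg` inside the container, the relevant
fragment of `config/kernel-monitor.json` would be (other fields omitted):

```json
{
  "log": "/dev/kmsg"
}
```
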
### Add support for another log format {#support-other-log-format}

Kernel monitor uses the
[`Translator`](https://github.com/kubernetes/node-problem-detector/blob/v0.1/pkg/kernelmonitor/translator/translator.go)
plugin to translate the kernel log into the internal data structure.
You can implement a new translator for a new log format.

<!-- discussion -->

## Recommendations and restrictions

It is recommended to run the node problem detector in your cluster to monitor node health.
When running the node problem detector, you can expect extra resource overhead on each node.
Usually this is fine, because:

* The kernel log grows relatively slowly.
* A resource limit is set for the node problem detector.
* Even under high load, the resource usage is acceptable. For more information, see the node problem detector
  [benchmark result](https://github.com/kubernetes/node-problem-detector/issues/2#issuecomment-220255629).