website/content/en/docs/tasks/debug-application-cluster/monitor-node-health.md

181 lines
6.1 KiB
Markdown
Raw Normal View History

---
reviewers:
- Random-Liu
- dchen1107
content_template: templates/task
title: Monitor Node Health
---
{{% capture overview %}}
*Node problem detector* is a [DaemonSet](/docs/concepts/workloads/controllers/daemonset/) monitoring the
node health. It collects node problems from various daemons and reports them
to the apiserver as [NodeCondition](/docs/concepts/architecture/nodes/#condition)
and [Event](/docs/reference/generated/kubernetes-api/{{< param "version" >}}/#event-v1-core).
It supports some known kernel issue detection now, and will detect more and
more node problems over time.
Currently Kubernetes won't take any action on the node conditions and events
generated by node problem detector. In the future, a remedy system could be
introduced to deal with node problems.
See more information
[here](https://github.com/kubernetes/node-problem-detector).
{{% /capture %}}
{{% capture prerequisites %}}
{{< include "task-tutorial-prereqs.md" >}} {{< version-check >}}
{{% /capture %}}
{{% capture steps %}}
## Limitations
* The kernel issue detection of node problem detector only supports file based
kernel log now. It doesn't support log tools like journald.
* The kernel issue detection of node problem detector has assumption on kernel
log format, and now it only works on Ubuntu and Debian. However, it is easy to extend
it to [support other log format](/docs/tasks/debug-application-cluster/monitor-node-health/#support-other-log-format).
## Enable/Disable in GCE cluster
Node problem detector is [running as a cluster addon](/docs/setup/best-practices/cluster-large/#addon-resources) enabled by default in the
gce cluster.
You can enable/disable it by setting the environment variable
`KUBE_ENABLE_NODE_PROBLEM_DETECTOR` before `kube-up.sh`.
## Use in Other Environment
To enable node problem detector in other environment outside of GCE, you can use
either `kubectl` or addon pod.
### Kubectl
This is the recommended way to start node problem detector outside of GCE. It
provides more flexible management, such as overwriting the default
configuration to fit it into your environment or detect
customized node problems.
* **Step 1:** `node-problem-detector.yaml`:
{{< codenew file="debug/node-problem-detector.yaml" >}}
***Notice that you should make sure the system log directory is right for your
OS distro.***
* **Step 2:** Start node problem detector with `kubectl`:
```shell
Official 1.14 Release Docs (#13174) * Official documentation on Poseidon/Firmament, a new multi-scheduler support for K8S. (#11752) * Added documentation about Poseidon-Firmament scheduler * Fixed some style issues. * Udpated the document as per the review comments. * Fixed some typos and updated the document * Updated the document as per the review comments. * Document timeout attribute for kms-plugin. (#12158) See 72540. * Official documentation on Poseidon/Firmament, a new multi-scheduler (#12343) * Removed the old version of the Poseidon documentation. Incorrect location. * Official documentation on Poseidon/Firmament, a new multi-scheduler support for K8S (#12069) * Official documentation on Poseidon/Firmament, a new multi-scheduler support for K8S. (#11752) * Added documentation about Poseidon-Firmament scheduler * Fixed some style issues. * Udpated the document as per the review comments. * Fixed some typos and updated the document * Updated the document as per the review comments. * Updated the document as per review comments. Added config details. * Updated the document as per the latest review comments. Fixed nits * Made changes as per latest suggestions. * Some more changes added. * Updated as per suggestions. * Changed the release process section. * SIG Docs edits Small edits to match style guidelines. * add plus to feature state * capitalization * revert feature state shortcode since this is a Kubernetes extension, not a direct feature, it shouldn't use the regular feature state tagging. (cherry picked from commit 7730c1540b637be74b9b21d4128a145994eb19cc) * Remove initializers from doc. It will be removed in 1.14 (#12331) * kubeadm: Document CRI auto detection functionality (#12462) Signed-off-by: Rostislav M. Georgiev <rostislavg@vmware.com> * Minor doc change for GAing Pod DNS Config (#12514) * Graduate ExpandInUsePersistentVolumes feature to beta (#10574) * Rename 2018-11-07-grpc-load-balancing-with-linkerd.md.md file (#12594) * Add dynamic percentage of node scoring to user docs (#12235) * Add dynamic percentage of node scoring to user docs * addressed review comments * delete special symbol (#12445) * Update documentation for VolumeSubpathEnvExpansion (#11843) * Update documentation for VolumeSubpathEnvExpansion * Address comments - improve descriptions * Graduate Pod Priority and Preemption to GA (#12428) * Added Instana links to the documentation (#12977) * Added link to the Instana Kubernetes integration * Added Instana link for services section Added Instana and a link to the Kubernetes integration to the analytics services section and broadened the scope to APM, monitoring and analytics. * Oxford comma /flex * More Oxford commas, because they matter * Update kubectl plugins to stable (#12847) * documentation for CSI topology beta (#12889) * Document changes to default RBAC discovery ClusterRole(Binding)s (#12888) * Document changes to default RBAC discovery ClusterRole(Binding)s Documentation for https://github.com/kubernetes/enhancements/issues/789 and https://github.com/kubernetes/kubernetes/pull/73807 * documentation review feedback * CSI raw block to beta (#12931) * Change incorrect string raw to block (#12926) Fixes #12925 * Update documentation on node OS/arch labels (#12976) These labels have been promoted to GA: https://github.com/kubernetes/enhancements/issues/793 * local pv GA doc updates (#12915) * Publish CRD OpenAPI Documentation (#12910) * add documentation for CustomResourcePublishOpenAPI * address comments fix links, ordered lists, style and typo * kubeadm: add document for upgrading from 1.13 to 1.14 (single CP and HA) (#13189) * kubeadm: add document for upgrading from 1.13 to 1.14 - remove doc for upgrading 1.10 -> 1.11 * kubeadm: apply amends to upgrade-1.14 doc * kubeadm: apply amends to upgrade-1.14 doc (part2) * kubeadm: apply amends to upgrade-1.14 doc (part3) * kubeadm: add note about "upgrade node experimental-control-plane" + add comment about `upgrade plan` * kubeadm: add missing "You should see output similar to this" * fix bullet indentation (#13214) * mark PodReadinessGate GA (#12800) * Update RuntimeClass documentation for beta (#13043) * Update RuntimeClass documentation for beta * Update feature gate & add upgrade section * formatting fixes * Highlight upgrade action required * Address feedback * CSI ephemeral volume alpha documentation (#10934) * update kubectl documentation (#12867) * update kubectl documentation * add document for Secret/ConfigMap generators * replace `kubectl create -f` by `kubectl apply -f` * Add page for kustomization support in kubectl * fix spelling errors and address comments * Documentation for Windows GMSA feature (#12936) * Documentation for Windows GMSA feature Signed-off-by: Deep Debroy <ddebroy@docker.com> * Enhancements to GMSA docs Signed-off-by: Deep Debroy <ddebroy@docker.com> * Fix links Signed-off-by: Deep Debroy <ddebroy@docker.com> * Fix GMSA link Signed-off-by: Deep Debroy <ddebroy@docker.com> * Add GMSA feature flag in feature flag list Signed-off-by: Deep Debroy <ddebroy@docker.com> * Relocate GMSA to container configuration Signed-off-by: Deep Debroy <ddebroy@docker.com> * Add example for container spec Signed-off-by: Deep Debroy <ddebroy@docker.com> * Remove changes in Windows index Signed-off-by: Deep Debroy <ddebroy@docker.com> * Update configure-gmsa.md * Update configure-gmsa.md * Update configure-gmsa.md * Update configure-gmsa.md * Rearrange the steps into two sections and other edits Signed-off-by: Deep Debroy <ddebroy@docker.com> * Fix links Signed-off-by: Deep Debroy <ddebroy@docker.com> * Add reference to script to generate GMSA YAMLs Signed-off-by: Deep Debroy <ddebroy@docker.com> * Some more clarifications for GMSA Signed-off-by: Deep Debroy <ddebroy@docker.com> * HugePages graduated to GA (#13004) * HugePages graduated to GA * fixing nit for build * Docs for node PID limiting (https://github.com/kubernetes/kubernetes/pull/73651) (#12932) * kubeadm: update the reference documentation for 1.14 (#12911) * kubeadm: update list of generated files for 1.14 NOTE: PLACEHOLDERS! these files are generated by SIG Docs each release, but we need them to pass the k/website PR CI. - add join_phase* (new sub phases of join) - add init_phase_upload-certs.md (new upload certs phase for init) - remove alpha-preflight (now both init and join have this) * kubeadm: update reference docs includes for 1.14 - remove includes from alpha.md - add upload-certs to init-phase.md - add join-phase.md and it's phases * kubeadm: update the editorial content of join and init - cleanup master->control-plane node - add some notes about phases and join - remove table about pre-pulling images - remove outdated info about self-hosting * kubeadm: update target release for v1alpha3 removal 1.14 -> 1.15 * kubeadm: copy edits for 1.14 reference docs (part1) * kubeadm: use "shell" for code blocks * kubeadm: update the 1.14 HA guide (#13191) * kubeadm: update the 1.14 HA guide * kubeadm: try to fix note/caution indent in HA page * kubeadm: fix missing sudo and minor amends in HA doc * kubeadm: apply latest amends to the HA doc for 1.14 * fixed a few missed merge conflicts * Admission Webhook new features doc (#12938) - kubernetes/kubernetes#74998 - kubernetes/kubernetes#74477 - kubernetes/kubernetes#74562 * Clarifications and fixes in GMSA doc (#13226) * Clarifications and fixes in GMSA doc Signed-off-by: Deep Debroy <ddebroy@docker.com> * Update configure-gmsa.md * Reformat to align headings and pre-reqs better Signed-off-by: Deep Debroy <ddebroy@docker.com> * Reformat to align headings and pre-reqs better Signed-off-by: Deep Debroy <ddebroy@docker.com> * Reformat to fix bullets Signed-off-by: Deep Debroy <ddebroy@docker.com> * Reword application of sample gmsa Signed-off-by: Deep Debroy <ddebroy@docker.com> * Update configure-gmsa.md * Address feedback to use active voice Signed-off-by: Deep Debroy <ddebroy@docker.com> * Address feedback to use active voice Signed-off-by: Deep Debroy <ddebroy@docker.com> * RunAsGroup documentation for Progressing this to Beta (#12297) * start serverside-apply documentation (#13077) * start serverside-apply documentation * add more concept info on server side apply * Update api concepts * Update api-concepts.md * fix style issues * Document CSI update (#12928) * Document CSI update * Finish CSI documentation Also fix mistake with ExpandInUsePersistentVolumes documented as beta * Overall docs for CSI Migration feature (#12935) * Placeholder docs for CSI Migration feature Signed-off-by: Deep Debroy <ddebroy@docker.com> * Address CR comments and update feature gates Signed-off-by: Deep Debroy <ddebroy@docker.com> * Add mappings for CSI plugins Signed-off-by: Deep Debroy <ddebroy@docker.com> * Add sections for AWS and GCE PD migration Signed-off-by: Deep Debroy <ddebroy@docker.com> * Add docs for Cinder and CSI Migration info Signed-off-by: Deep Debroy <ddebroy@docker.com> * Clarify scope to volumes with file system Signed-off-by: Deep Debroy <ddebroy@docker.com> * Change the format of EBS and Cinder CSI Migration sections to follow the GCE template Signed-off-by: Deep Debroy <ddebroy@docker.com> * Windows documentation updates for 1.14 (#12929) * Updated the note to indicate doc work for 1.14 * first attempt at md export from gdoc * simplifyig * big attempt * moving DRAFT windows content to PR for review * moving content to PR in markdown for review * updated note tags * Delete windows-contributing.md deleting this file as it is already ported to the github contributor guide * fixed formatting in intro and cluster setup guide * updating formatting for running containers guide * rejiggered end of troubleshooting * fixed minor typos * Clarified the windows binary download step * Update _index.md making updates based on feedback * Update _index.md updating ovn-kubernetes docs * Update _index.md * Update _index.md * updating relative docs links updating all the links to be relative links to /docs * Update _index.md * Update _index.md updates for windows services and ovn-kubernetes * formatted for correct step numbering * fix typos * Update _index.md updates for flannel PR in troubleshooting * Update _index.md * Update _index.md updating a few sections like roadmap, services, troubleshooting/filing tickets * Update _index.md * Update _index.md * Update _index.md * Fixed a few whitespace issues * Update _index.md * Update _index.md * Update _index.md * add section on upgrading CoreDNS (#12909) * documentation for kubelet resource metrics endpoint (#12934) * windows docs updates for 1.14 (#13279) * Delete sample-l2bridge-wincni-config.json this file is not used anywhere * Update _index.md * Update _index.md * Update _index.md * Update _index.md * Update _index.md * Rename content/en/docs/getting-started-guides/windows/_index.md to content/en/docs/setup/windows/_index.md moving to new location * Delete flannel-master-kubectl-get-ds.png * Delete flannel-master-kubeclt-get-pods.png * Delete windows-docker-error.png * Add files via upload * Rename _index.md to add-windows-nodes.md * Create _index.md * Update _index.md * Update add-windows-nodes.md * Update add-windows-nodes.md * Create user-guide-windows-nodes.md * Create user-guide-windows-containers.md * Update and rename add-windows-nodes.md to intro-windows-nodes.md * Update user-guide-windows-containers.md * Rename intro-windows-nodes.md to intro-windows-in-kubernetes.md * Update user-guide-windows-nodes.md * Update user-guide-windows-containers.md * Update user-guide-windows-containers.md * Update user-guide-windows-nodes.md * Update user-guide-windows-containers.md * Update _index.md * Update intro-windows-in-kubernetes.md * Update intro-windows-in-kubernetes.md fixing the pause image * Update intro-windows-in-kubernetes.md changing tables from html to MD * Update user-guide-windows-nodes.md converting tables from HTML to MD * Update intro-windows-in-kubernetes.md * Update user-guide-windows-nodes.md * Update user-guide-windows-nodes.md * Update user-guide-windows-nodes.md updating the numbering , even though it messes up the notes a little bit. Jim will file a ticket to follow up * Update user-guide-windows-nodes.md * update to windows docs for 1.14 (#13322) * Update intro-windows-in-kubernetes.md * Update intro-windows-in-kubernetes.md * Update intro-windows-in-kubernetes.md * Update intro-windows-in-kubernetes.md * Update intro-windows-in-kubernetes.md * Update user-guide-windows-containers.md * Update user-guide-windows-nodes.md * Update intro-windows-in-kubernetes.md (#13344) * server side apply followup (#13321) * change some parts of serverside apply docs in response to comments * fix typos and wording * Update config.toml (#13365)
2019-03-25 22:06:16 +00:00
kubectl apply -f https://k8s.io/examples/debug/node-problem-detector.yaml
```
### Addon Pod
This is for those who have their own cluster bootstrap solution, and don't need
to overwrite the default configuration. They could leverage the addon pod to
further automate the deployment.
Just create `node-problem-detector.yaml`, and put it under the addon pods directory
`/etc/kubernetes/addons/node-problem-detector` on master node.
## Overwrite the Configuration
The [default configuration](https://github.com/kubernetes/node-problem-detector/tree/v0.1/config)
is embedded when building the Docker image of node problem detector.
However, you can use [ConfigMap](/docs/tasks/configure-pod-container/configure-pod-configmap/) to overwrite it
following the steps:
* **Step 1:** Change the config files in `config/`.
* **Step 2:** Create the ConfigMap `node-problem-detector-config` with `kubectl create configmap
node-problem-detector-config --from-file=config/`.
* **Step 3:** Change the `node-problem-detector.yaml` to use the ConfigMap:
{{< codenew file="debug/node-problem-detector-configmap.yaml" >}}
* **Step 4:** Re-create the node problem detector with the new yaml file:
```shell
kubectl delete -f https://k8s.io/examples/debug/node-problem-detector.yaml # If you have a node-problem-detector running
Official 1.14 Release Docs (#13174) * Official documentation on Poseidon/Firmament, a new multi-scheduler support for K8S. (#11752) * Added documentation about Poseidon-Firmament scheduler * Fixed some style issues. * Udpated the document as per the review comments. * Fixed some typos and updated the document * Updated the document as per the review comments. * Document timeout attribute for kms-plugin. (#12158) See 72540. * Official documentation on Poseidon/Firmament, a new multi-scheduler (#12343) * Removed the old version of the Poseidon documentation. Incorrect location. * Official documentation on Poseidon/Firmament, a new multi-scheduler support for K8S (#12069) * Official documentation on Poseidon/Firmament, a new multi-scheduler support for K8S. (#11752) * Added documentation about Poseidon-Firmament scheduler * Fixed some style issues. * Udpated the document as per the review comments. * Fixed some typos and updated the document * Updated the document as per the review comments. * Updated the document as per review comments. Added config details. * Updated the document as per the latest review comments. Fixed nits * Made changes as per latest suggestions. * Some more changes added. * Updated as per suggestions. * Changed the release process section. * SIG Docs edits Small edits to match style guidelines. * add plus to feature state * capitalization * revert feature state shortcode since this is a Kubernetes extension, not a direct feature, it shouldn't use the regular feature state tagging. (cherry picked from commit 7730c1540b637be74b9b21d4128a145994eb19cc) * Remove initializers from doc. It will be removed in 1.14 (#12331) * kubeadm: Document CRI auto detection functionality (#12462) Signed-off-by: Rostislav M. Georgiev <rostislavg@vmware.com> * Minor doc change for GAing Pod DNS Config (#12514) * Graduate ExpandInUsePersistentVolumes feature to beta (#10574) * Rename 2018-11-07-grpc-load-balancing-with-linkerd.md.md file (#12594) * Add dynamic percentage of node scoring to user docs (#12235) * Add dynamic percentage of node scoring to user docs * addressed review comments * delete special symbol (#12445) * Update documentation for VolumeSubpathEnvExpansion (#11843) * Update documentation for VolumeSubpathEnvExpansion * Address comments - improve descriptions * Graduate Pod Priority and Preemption to GA (#12428) * Added Instana links to the documentation (#12977) * Added link to the Instana Kubernetes integration * Added Instana link for services section Added Instana and a link to the Kubernetes integration to the analytics services section and broadened the scope to APM, monitoring and analytics. * Oxford comma /flex * More Oxford commas, because they matter * Update kubectl plugins to stable (#12847) * documentation for CSI topology beta (#12889) * Document changes to default RBAC discovery ClusterRole(Binding)s (#12888) * Document changes to default RBAC discovery ClusterRole(Binding)s Documentation for https://github.com/kubernetes/enhancements/issues/789 and https://github.com/kubernetes/kubernetes/pull/73807 * documentation review feedback * CSI raw block to beta (#12931) * Change incorrect string raw to block (#12926) Fixes #12925 * Update documentation on node OS/arch labels (#12976) These labels have been promoted to GA: https://github.com/kubernetes/enhancements/issues/793 * local pv GA doc updates (#12915) * Publish CRD OpenAPI Documentation (#12910) * add documentation for CustomResourcePublishOpenAPI * address comments fix links, ordered lists, style and typo * kubeadm: add document for upgrading from 1.13 to 1.14 (single CP and HA) (#13189) * kubeadm: add document for upgrading from 1.13 to 1.14 - remove doc for upgrading 1.10 -> 1.11 * kubeadm: apply amends to upgrade-1.14 doc * kubeadm: apply amends to upgrade-1.14 doc (part2) * kubeadm: apply amends to upgrade-1.14 doc (part3) * kubeadm: add note about "upgrade node experimental-control-plane" + add comment about `upgrade plan` * kubeadm: add missing "You should see output similar to this" * fix bullet indentation (#13214) * mark PodReadinessGate GA (#12800) * Update RuntimeClass documentation for beta (#13043) * Update RuntimeClass documentation for beta * Update feature gate & add upgrade section * formatting fixes * Highlight upgrade action required * Address feedback * CSI ephemeral volume alpha documentation (#10934) * update kubectl documentation (#12867) * update kubectl documentation * add document for Secret/ConfigMap generators * replace `kubectl create -f` by `kubectl apply -f` * Add page for kustomization support in kubectl * fix spelling errors and address comments * Documentation for Windows GMSA feature (#12936) * Documentation for Windows GMSA feature Signed-off-by: Deep Debroy <ddebroy@docker.com> * Enhancements to GMSA docs Signed-off-by: Deep Debroy <ddebroy@docker.com> * Fix links Signed-off-by: Deep Debroy <ddebroy@docker.com> * Fix GMSA link Signed-off-by: Deep Debroy <ddebroy@docker.com> * Add GMSA feature flag in feature flag list Signed-off-by: Deep Debroy <ddebroy@docker.com> * Relocate GMSA to container configuration Signed-off-by: Deep Debroy <ddebroy@docker.com> * Add example for container spec Signed-off-by: Deep Debroy <ddebroy@docker.com> * Remove changes in Windows index Signed-off-by: Deep Debroy <ddebroy@docker.com> * Update configure-gmsa.md * Update configure-gmsa.md * Update configure-gmsa.md * Update configure-gmsa.md * Rearrange the steps into two sections and other edits Signed-off-by: Deep Debroy <ddebroy@docker.com> * Fix links Signed-off-by: Deep Debroy <ddebroy@docker.com> * Add reference to script to generate GMSA YAMLs Signed-off-by: Deep Debroy <ddebroy@docker.com> * Some more clarifications for GMSA Signed-off-by: Deep Debroy <ddebroy@docker.com> * HugePages graduated to GA (#13004) * HugePages graduated to GA * fixing nit for build * Docs for node PID limiting (https://github.com/kubernetes/kubernetes/pull/73651) (#12932) * kubeadm: update the reference documentation for 1.14 (#12911) * kubeadm: update list of generated files for 1.14 NOTE: PLACEHOLDERS! these files are generated by SIG Docs each release, but we need them to pass the k/website PR CI. - add join_phase* (new sub phases of join) - add init_phase_upload-certs.md (new upload certs phase for init) - remove alpha-preflight (now both init and join have this) * kubeadm: update reference docs includes for 1.14 - remove includes from alpha.md - add upload-certs to init-phase.md - add join-phase.md and it's phases * kubeadm: update the editorial content of join and init - cleanup master->control-plane node - add some notes about phases and join - remove table about pre-pulling images - remove outdated info about self-hosting * kubeadm: update target release for v1alpha3 removal 1.14 -> 1.15 * kubeadm: copy edits for 1.14 reference docs (part1) * kubeadm: use "shell" for code blocks * kubeadm: update the 1.14 HA guide (#13191) * kubeadm: update the 1.14 HA guide * kubeadm: try to fix note/caution indent in HA page * kubeadm: fix missing sudo and minor amends in HA doc * kubeadm: apply latest amends to the HA doc for 1.14 * fixed a few missed merge conflicts * Admission Webhook new features doc (#12938) - kubernetes/kubernetes#74998 - kubernetes/kubernetes#74477 - kubernetes/kubernetes#74562 * Clarifications and fixes in GMSA doc (#13226) * Clarifications and fixes in GMSA doc Signed-off-by: Deep Debroy <ddebroy@docker.com> * Update configure-gmsa.md * Reformat to align headings and pre-reqs better Signed-off-by: Deep Debroy <ddebroy@docker.com> * Reformat to align headings and pre-reqs better Signed-off-by: Deep Debroy <ddebroy@docker.com> * Reformat to fix bullets Signed-off-by: Deep Debroy <ddebroy@docker.com> * Reword application of sample gmsa Signed-off-by: Deep Debroy <ddebroy@docker.com> * Update configure-gmsa.md * Address feedback to use active voice Signed-off-by: Deep Debroy <ddebroy@docker.com> * Address feedback to use active voice Signed-off-by: Deep Debroy <ddebroy@docker.com> * RunAsGroup documentation for Progressing this to Beta (#12297) * start serverside-apply documentation (#13077) * start serverside-apply documentation * add more concept info on server side apply * Update api concepts * Update api-concepts.md * fix style issues * Document CSI update (#12928) * Document CSI update * Finish CSI documentation Also fix mistake with ExpandInUsePersistentVolumes documented as beta * Overall docs for CSI Migration feature (#12935) * Placeholder docs for CSI Migration feature Signed-off-by: Deep Debroy <ddebroy@docker.com> * Address CR comments and update feature gates Signed-off-by: Deep Debroy <ddebroy@docker.com> * Add mappings for CSI plugins Signed-off-by: Deep Debroy <ddebroy@docker.com> * Add sections for AWS and GCE PD migration Signed-off-by: Deep Debroy <ddebroy@docker.com> * Add docs for Cinder and CSI Migration info Signed-off-by: Deep Debroy <ddebroy@docker.com> * Clarify scope to volumes with file system Signed-off-by: Deep Debroy <ddebroy@docker.com> * Change the format of EBS and Cinder CSI Migration sections to follow the GCE template Signed-off-by: Deep Debroy <ddebroy@docker.com> * Windows documentation updates for 1.14 (#12929) * Updated the note to indicate doc work for 1.14 * first attempt at md export from gdoc * simplifyig * big attempt * moving DRAFT windows content to PR for review * moving content to PR in markdown for review * updated note tags * Delete windows-contributing.md deleting this file as it is already ported to the github contributor guide * fixed formatting in intro and cluster setup guide * updating formatting for running containers guide * rejiggered end of troubleshooting * fixed minor typos * Clarified the windows binary download step * Update _index.md making updates based on feedback * Update _index.md updating ovn-kubernetes docs * Update _index.md * Update _index.md * updating relative docs links updating all the links to be relative links to /docs * Update _index.md * Update _index.md updates for windows services and ovn-kubernetes * formatted for correct step numbering * fix typos * Update _index.md updates for flannel PR in troubleshooting * Update _index.md * Update _index.md updating a few sections like roadmap, services, troubleshooting/filing tickets * Update _index.md * Update _index.md * Update _index.md * Fixed a few whitespace issues * Update _index.md * Update _index.md * Update _index.md * add section on upgrading CoreDNS (#12909) * documentation for kubelet resource metrics endpoint (#12934) * windows docs updates for 1.14 (#13279) * Delete sample-l2bridge-wincni-config.json this file is not used anywhere * Update _index.md * Update _index.md * Update _index.md * Update _index.md * Update _index.md * Rename content/en/docs/getting-started-guides/windows/_index.md to content/en/docs/setup/windows/_index.md moving to new location * Delete flannel-master-kubectl-get-ds.png * Delete flannel-master-kubeclt-get-pods.png * Delete windows-docker-error.png * Add files via upload * Rename _index.md to add-windows-nodes.md * Create _index.md * Update _index.md * Update add-windows-nodes.md * Update add-windows-nodes.md * Create user-guide-windows-nodes.md * Create user-guide-windows-containers.md * Update and rename add-windows-nodes.md to intro-windows-nodes.md * Update user-guide-windows-containers.md * Rename intro-windows-nodes.md to intro-windows-in-kubernetes.md * Update user-guide-windows-nodes.md * Update user-guide-windows-containers.md * Update user-guide-windows-containers.md * Update user-guide-windows-nodes.md * Update user-guide-windows-containers.md * Update _index.md * Update intro-windows-in-kubernetes.md * Update intro-windows-in-kubernetes.md fixing the pause image * Update intro-windows-in-kubernetes.md changing tables from html to MD * Update user-guide-windows-nodes.md converting tables from HTML to MD * Update intro-windows-in-kubernetes.md * Update user-guide-windows-nodes.md * Update user-guide-windows-nodes.md * Update user-guide-windows-nodes.md updating the numbering , even though it messes up the notes a little bit. Jim will file a ticket to follow up * Update user-guide-windows-nodes.md * update to windows docs for 1.14 (#13322) * Update intro-windows-in-kubernetes.md * Update intro-windows-in-kubernetes.md * Update intro-windows-in-kubernetes.md * Update intro-windows-in-kubernetes.md * Update intro-windows-in-kubernetes.md * Update user-guide-windows-containers.md * Update user-guide-windows-nodes.md * Update intro-windows-in-kubernetes.md (#13344) * server side apply followup (#13321) * change some parts of serverside apply docs in response to comments * fix typos and wording * Update config.toml (#13365)
2019-03-25 22:06:16 +00:00
kubectl apply -f https://k8s.io/examples/debug/node-problem-detector-configmap.yaml
```
***Notice that this approach only applies to node problem detector started with `kubectl`.***
For node problem detector running as cluster addon, because addon manager doesn't support
ConfigMap, configuration overwriting is not supported now.
## Kernel Monitor
*Kernel Monitor* is a problem daemon in node problem detector. It monitors kernel log
and detects known kernel issues following predefined rules.
The Kernel Monitor matches kernel issues according to a set of predefined rule list in
[`config/kernel-monitor.json`](https://github.com/kubernetes/node-problem-detector/blob/v0.1/config/kernel-monitor.json).
The rule list is extensible, and you can always extend it by overwriting the
configuration.
### Add New NodeConditions
To support new node conditions, you can extend the `conditions` field in
`config/kernel-monitor.json` with new condition definition:
```json
{
"type": "NodeConditionType",
"reason": "CamelCaseDefaultNodeConditionReason",
"message": "arbitrary default node condition message"
}
```
### Detect New Problems
To detect new problems, you can extend the `rules` field in `config/kernel-monitor.json`
with new rule definition:
```json
{
"type": "temporary/permanent",
"condition": "NodeConditionOfPermanentIssue",
"reason": "CamelCaseShortReason",
"message": "regexp matching the issue in the kernel log"
}
```
### Change Log Path
Kernel log in different OS distros may locate in different path. The `log`
field in `config/kernel-monitor.json` is the log path inside the container.
You can always configure it to match your OS distro.
### Support Other Log Format
Kernel monitor uses [`Translator`](https://github.com/kubernetes/node-problem-detector/blob/v0.1/pkg/kernelmonitor/translator/translator.go)
plugin to translate kernel log the internal data structure. It is easy to
implement a new translator for a new log format.
{{% /capture %}}
{{% capture discussion %}}
## Caveats
It is recommended to run the node problem detector in your cluster to monitor
the node health. However, you should be aware that this will introduce extra
resource overhead on each node. Usually this is fine, because:
* The kernel log is generated relatively slowly.
* Resource limit is set for node problem detector.
* Even under high load, the resource usage is acceptable.
(see [benchmark result](https://github.com/kubernetes/node-problem-detector/issues/2#issuecomment-220255629))
{{% /capture %}}