2017-03-15 00:25:29 +00:00
|
|
|
---
|
2018-02-18 19:29:37 +00:00
|
|
|
reviewers:
|
2017-03-15 00:25:29 +00:00
|
|
|
- Random-Liu
|
|
|
|
- dchen1107
|
2018-06-22 18:20:04 +00:00
|
|
|
content_template: templates/task
|
2017-06-08 20:41:57 +00:00
|
|
|
title: Monitor Node Health
|
2017-03-15 00:25:29 +00:00
|
|
|
---
|
|
|
|
|
2018-06-22 18:20:04 +00:00
|
|
|
{{% capture overview %}}
|
2017-03-15 00:25:29 +00:00
|
|
|
|
2017-04-19 17:56:47 +00:00
|
|
|
*Node problem detector* is a [DaemonSet](/docs/concepts/workloads/controllers/daemonset/) monitoring the
|
2017-03-15 00:25:29 +00:00
|
|
|
node health. It collects node problems from various daemons and reports them
|
2017-06-29 23:15:31 +00:00
|
|
|
to the apiserver as [NodeCondition](/docs/concepts/architecture/nodes/#condition)
|
2018-05-05 16:00:51 +00:00
|
|
|
and [Event](/docs/reference/generated/kubernetes-api/{{< param "version" >}}/#event-v1-core).
|
2017-03-15 00:25:29 +00:00
|
|
|
|
|
|
|
It supports some known kernel issue detection now, and will detect more and
|
|
|
|
more node problems over time.
|
|
|
|
|
|
|
|
Currently Kubernetes won't take any action on the node conditions and events
|
|
|
|
generated by node problem detector. In the future, a remedy system could be
|
|
|
|
introduced to deal with node problems.
|
|
|
|
|
|
|
|
See more information
|
|
|
|
[here](https://github.com/kubernetes/node-problem-detector).
|
|
|
|
|
2018-06-22 18:20:04 +00:00
|
|
|
{{% /capture %}}
|
|
|
|
|
|
|
|
{{% capture prerequisites %}}
|
|
|
|
|
|
|
|
{{< include "task-tutorial-prereqs.md" >}} {{< version-check >}}
|
|
|
|
|
|
|
|
{{% /capture %}}
|
|
|
|
|
|
|
|
{{% capture steps %}}
|
|
|
|
|
2017-03-15 00:25:29 +00:00
|
|
|
## Limitations
|
|
|
|
|
|
|
|
* The kernel issue detection of node problem detector only supports file based
|
|
|
|
kernel log now. It doesn't support log tools like journald.
|
|
|
|
|
|
|
|
* The kernel issue detection of node problem detector has assumption on kernel
|
|
|
|
log format, and now it only works on Ubuntu and Debian. However, it is easy to extend
|
2017-10-05 01:25:57 +00:00
|
|
|
it to [support other log format](/docs/tasks/debug-application-cluster/monitor-node-health/#support-other-log-format).
|
2017-03-15 00:25:29 +00:00
|
|
|
|
|
|
|
## Enable/Disable in GCE cluster
|
|
|
|
|
2019-06-12 11:57:29 +00:00
|
|
|
Node problem detector is [running as a cluster addon](/docs/setup/best-practices/cluster-large/#addon-resources) enabled by default in the
|
2017-03-15 00:25:29 +00:00
|
|
|
gce cluster.
|
|
|
|
|
|
|
|
You can enable/disable it by setting the environment variable
|
|
|
|
`KUBE_ENABLE_NODE_PROBLEM_DETECTOR` before `kube-up.sh`.
|
|
|
|
|
|
|
|
## Use in Other Environment
|
|
|
|
|
|
|
|
To enable node problem detector in other environment outside of GCE, you can use
|
|
|
|
either `kubectl` or addon pod.
|
|
|
|
|
|
|
|
### Kubectl
|
|
|
|
|
|
|
|
This is the recommended way to start node problem detector outside of GCE. It
|
|
|
|
provides more flexible management, such as overwriting the default
|
|
|
|
configuration to fit it into your environment or detect
|
|
|
|
customized node problems.
|
|
|
|
|
2018-04-07 14:07:08 +00:00
|
|
|
* **Step 1:** `node-problem-detector.yaml`:
|
|
|
|
|
2018-07-02 18:17:19 +00:00
|
|
|
{{< codenew file="debug/node-problem-detector.yaml" >}}
|
2018-04-07 14:07:08 +00:00
|
|
|
|
2017-03-15 00:25:29 +00:00
|
|
|
|
|
|
|
***Notice that you should make sure the system log directory is right for your
|
|
|
|
OS distro.***
|
|
|
|
|
|
|
|
* **Step 2:** Start node problem detector with `kubectl`:
|
|
|
|
|
|
|
|
```shell
|
Official 1.14 Release Docs (#13174)
* Official documentation on Poseidon/Firmament, a new multi-scheduler support for K8S. (#11752)
* Added documentation about Poseidon-Firmament scheduler
* Fixed some style issues.
* Udpated the document as per the review comments.
* Fixed some typos and updated the document
* Updated the document as per the review comments.
* Document timeout attribute for kms-plugin. (#12158)
See 72540.
* Official documentation on Poseidon/Firmament, a new multi-scheduler (#12343)
* Removed the old version of the Poseidon documentation. Incorrect location.
* Official documentation on Poseidon/Firmament, a new multi-scheduler support for K8S (#12069)
* Official documentation on Poseidon/Firmament, a new multi-scheduler support for K8S. (#11752)
* Added documentation about Poseidon-Firmament scheduler
* Fixed some style issues.
* Udpated the document as per the review comments.
* Fixed some typos and updated the document
* Updated the document as per the review comments.
* Updated the document as per review comments. Added config details.
* Updated the document as per the latest review comments. Fixed nits
* Made changes as per latest suggestions.
* Some more changes added.
* Updated as per suggestions.
* Changed the release process section.
* SIG Docs edits
Small edits to match style guidelines.
* add plus to feature state
* capitalization
* revert feature state shortcode
since this is a Kubernetes extension, not a direct feature, it shouldn't use the regular feature state tagging.
(cherry picked from commit 7730c1540b637be74b9b21d4128a145994eb19cc)
* Remove initializers from doc. It will be removed in 1.14 (#12331)
* kubeadm: Document CRI auto detection functionality (#12462)
Signed-off-by: Rostislav M. Georgiev <rostislavg@vmware.com>
* Minor doc change for GAing Pod DNS Config (#12514)
* Graduate ExpandInUsePersistentVolumes feature to beta (#10574)
* Rename 2018-11-07-grpc-load-balancing-with-linkerd.md.md file (#12594)
* Add dynamic percentage of node scoring to user docs (#12235)
* Add dynamic percentage of node scoring to user docs
* addressed review comments
* delete special symbol (#12445)
* Update documentation for VolumeSubpathEnvExpansion (#11843)
* Update documentation for VolumeSubpathEnvExpansion
* Address comments - improve descriptions
* Graduate Pod Priority and Preemption to GA (#12428)
* Added Instana links to the documentation (#12977)
* Added link to the Instana Kubernetes integration
* Added Instana link for services section
Added Instana and a link to the Kubernetes integration to the analytics services section and broadened the scope to APM, monitoring and analytics.
* Oxford comma /flex
* More Oxford commas, because they matter
* Update kubectl plugins to stable (#12847)
* documentation for CSI topology beta (#12889)
* Document changes to default RBAC discovery ClusterRole(Binding)s (#12888)
* Document changes to default RBAC discovery ClusterRole(Binding)s
Documentation for https://github.com/kubernetes/enhancements/issues/789 and https://github.com/kubernetes/kubernetes/pull/73807
* documentation review feedback
* CSI raw block to beta (#12931)
* Change incorrect string raw to block (#12926)
Fixes #12925
* Update documentation on node OS/arch labels (#12976)
These labels have been promoted to GA:
https://github.com/kubernetes/enhancements/issues/793
* local pv GA doc updates (#12915)
* Publish CRD OpenAPI Documentation (#12910)
* add documentation for CustomResourcePublishOpenAPI
* address comments
fix links, ordered lists, style and typo
* kubeadm: add document for upgrading from 1.13 to 1.14 (single CP and HA) (#13189)
* kubeadm: add document for upgrading from 1.13 to 1.14
- remove doc for upgrading 1.10 -> 1.11
* kubeadm: apply amends to upgrade-1.14 doc
* kubeadm: apply amends to upgrade-1.14 doc (part2)
* kubeadm: apply amends to upgrade-1.14 doc (part3)
* kubeadm: add note about "upgrade node experimental-control-plane"
+ add comment about `upgrade plan`
* kubeadm: add missing "You should see output similar to this"
* fix bullet indentation (#13214)
* mark PodReadinessGate GA (#12800)
* Update RuntimeClass documentation for beta (#13043)
* Update RuntimeClass documentation for beta
* Update feature gate & add upgrade section
* formatting fixes
* Highlight upgrade action required
* Address feedback
* CSI ephemeral volume alpha documentation (#10934)
* update kubectl documentation (#12867)
* update kubectl documentation
* add document for Secret/ConfigMap generators
* replace `kubectl create -f` by `kubectl apply -f`
* Add page for kustomization support in kubectl
* fix spelling errors and address comments
* Documentation for Windows GMSA feature (#12936)
* Documentation for Windows GMSA feature
Signed-off-by: Deep Debroy <ddebroy@docker.com>
* Enhancements to GMSA docs
Signed-off-by: Deep Debroy <ddebroy@docker.com>
* Fix links
Signed-off-by: Deep Debroy <ddebroy@docker.com>
* Fix GMSA link
Signed-off-by: Deep Debroy <ddebroy@docker.com>
* Add GMSA feature flag in feature flag list
Signed-off-by: Deep Debroy <ddebroy@docker.com>
* Relocate GMSA to container configuration
Signed-off-by: Deep Debroy <ddebroy@docker.com>
* Add example for container spec
Signed-off-by: Deep Debroy <ddebroy@docker.com>
* Remove changes in Windows index
Signed-off-by: Deep Debroy <ddebroy@docker.com>
* Update configure-gmsa.md
* Update configure-gmsa.md
* Update configure-gmsa.md
* Update configure-gmsa.md
* Rearrange the steps into two sections and other edits
Signed-off-by: Deep Debroy <ddebroy@docker.com>
* Fix links
Signed-off-by: Deep Debroy <ddebroy@docker.com>
* Add reference to script to generate GMSA YAMLs
Signed-off-by: Deep Debroy <ddebroy@docker.com>
* Some more clarifications for GMSA
Signed-off-by: Deep Debroy <ddebroy@docker.com>
* HugePages graduated to GA (#13004)
* HugePages graduated to GA
* fixing nit for build
* Docs for node PID limiting (https://github.com/kubernetes/kubernetes/pull/73651) (#12932)
* kubeadm: update the reference documentation for 1.14 (#12911)
* kubeadm: update list of generated files for 1.14
NOTE: PLACEHOLDERS! these files are generated by SIG Docs each
release, but we need them to pass the k/website PR CI.
- add join_phase* (new sub phases of join)
- add init_phase_upload-certs.md (new upload certs phase for init)
- remove alpha-preflight (now both init and join have this)
* kubeadm: update reference docs includes for 1.14
- remove includes from alpha.md
- add upload-certs to init-phase.md
- add join-phase.md and it's phases
* kubeadm: update the editorial content of join and init
- cleanup master->control-plane node
- add some notes about phases and join
- remove table about pre-pulling images
- remove outdated info about self-hosting
* kubeadm: update target release for v1alpha3 removal
1.14 -> 1.15
* kubeadm: copy edits for 1.14 reference docs (part1)
* kubeadm: use "shell" for code blocks
* kubeadm: update the 1.14 HA guide (#13191)
* kubeadm: update the 1.14 HA guide
* kubeadm: try to fix note/caution indent in HA page
* kubeadm: fix missing sudo and minor amends in HA doc
* kubeadm: apply latest amends to the HA doc for 1.14
* fixed a few missed merge conflicts
* Admission Webhook new features doc (#12938)
- kubernetes/kubernetes#74998
- kubernetes/kubernetes#74477
- kubernetes/kubernetes#74562
* Clarifications and fixes in GMSA doc (#13226)
* Clarifications and fixes in GMSA doc
Signed-off-by: Deep Debroy <ddebroy@docker.com>
* Update configure-gmsa.md
* Reformat to align headings and pre-reqs better
Signed-off-by: Deep Debroy <ddebroy@docker.com>
* Reformat to align headings and pre-reqs better
Signed-off-by: Deep Debroy <ddebroy@docker.com>
* Reformat to fix bullets
Signed-off-by: Deep Debroy <ddebroy@docker.com>
* Reword application of sample gmsa
Signed-off-by: Deep Debroy <ddebroy@docker.com>
* Update configure-gmsa.md
* Address feedback to use active voice
Signed-off-by: Deep Debroy <ddebroy@docker.com>
* Address feedback to use active voice
Signed-off-by: Deep Debroy <ddebroy@docker.com>
* RunAsGroup documentation for Progressing this to Beta (#12297)
* start serverside-apply documentation (#13077)
* start serverside-apply documentation
* add more concept info on server side apply
* Update api concepts
* Update api-concepts.md
* fix style issues
* Document CSI update (#12928)
* Document CSI update
* Finish CSI documentation
Also fix mistake with ExpandInUsePersistentVolumes documented as beta
* Overall docs for CSI Migration feature (#12935)
* Placeholder docs for CSI Migration feature
Signed-off-by: Deep Debroy <ddebroy@docker.com>
* Address CR comments and update feature gates
Signed-off-by: Deep Debroy <ddebroy@docker.com>
* Add mappings for CSI plugins
Signed-off-by: Deep Debroy <ddebroy@docker.com>
* Add sections for AWS and GCE PD migration
Signed-off-by: Deep Debroy <ddebroy@docker.com>
* Add docs for Cinder and CSI Migration info
Signed-off-by: Deep Debroy <ddebroy@docker.com>
* Clarify scope to volumes with file system
Signed-off-by: Deep Debroy <ddebroy@docker.com>
* Change the format of EBS and Cinder CSI Migration sections to follow the GCE template
Signed-off-by: Deep Debroy <ddebroy@docker.com>
* Windows documentation updates for 1.14 (#12929)
* Updated the note to indicate doc work for 1.14
* first attempt at md export from gdoc
* simplifyig
* big attempt
* moving DRAFT windows content to PR for review
* moving content to PR in markdown for review
* updated note tags
* Delete windows-contributing.md
deleting this file as it is already ported to the github contributor guide
* fixed formatting in intro and cluster setup guide
* updating formatting for running containers guide
* rejiggered end of troubleshooting
* fixed minor typos
* Clarified the windows binary download step
* Update _index.md
making updates based on feedback
* Update _index.md
updating ovn-kubernetes docs
* Update _index.md
* Update _index.md
* updating relative docs links
updating all the links to be relative links to /docs
* Update _index.md
* Update _index.md
updates for windows services and ovn-kubernetes
* formatted for correct step numbering
* fix typos
* Update _index.md
updates for flannel PR in troubleshooting
* Update _index.md
* Update _index.md
updating a few sections like roadmap, services, troubleshooting/filing tickets
* Update _index.md
* Update _index.md
* Update _index.md
* Fixed a few whitespace issues
* Update _index.md
* Update _index.md
* Update _index.md
* add section on upgrading CoreDNS (#12909)
* documentation for kubelet resource metrics endpoint (#12934)
* windows docs updates for 1.14 (#13279)
* Delete sample-l2bridge-wincni-config.json
this file is not used anywhere
* Update _index.md
* Update _index.md
* Update _index.md
* Update _index.md
* Update _index.md
* Rename content/en/docs/getting-started-guides/windows/_index.md to content/en/docs/setup/windows/_index.md
moving to new location
* Delete flannel-master-kubectl-get-ds.png
* Delete flannel-master-kubeclt-get-pods.png
* Delete windows-docker-error.png
* Add files via upload
* Rename _index.md to add-windows-nodes.md
* Create _index.md
* Update _index.md
* Update add-windows-nodes.md
* Update add-windows-nodes.md
* Create user-guide-windows-nodes.md
* Create user-guide-windows-containers.md
* Update and rename add-windows-nodes.md to intro-windows-nodes.md
* Update user-guide-windows-containers.md
* Rename intro-windows-nodes.md to intro-windows-in-kubernetes.md
* Update user-guide-windows-nodes.md
* Update user-guide-windows-containers.md
* Update user-guide-windows-containers.md
* Update user-guide-windows-nodes.md
* Update user-guide-windows-containers.md
* Update _index.md
* Update intro-windows-in-kubernetes.md
* Update intro-windows-in-kubernetes.md
fixing the pause image
* Update intro-windows-in-kubernetes.md
changing tables from html to MD
* Update user-guide-windows-nodes.md
converting tables from HTML to MD
* Update intro-windows-in-kubernetes.md
* Update user-guide-windows-nodes.md
* Update user-guide-windows-nodes.md
* Update user-guide-windows-nodes.md
updating the numbering , even though it messes up the notes a little bit. Jim will file a ticket to follow up
* Update user-guide-windows-nodes.md
* update to windows docs for 1.14 (#13322)
* Update intro-windows-in-kubernetes.md
* Update intro-windows-in-kubernetes.md
* Update intro-windows-in-kubernetes.md
* Update intro-windows-in-kubernetes.md
* Update intro-windows-in-kubernetes.md
* Update user-guide-windows-containers.md
* Update user-guide-windows-nodes.md
* Update intro-windows-in-kubernetes.md (#13344)
* server side apply followup (#13321)
* change some parts of serverside apply docs in response to comments
* fix typos and wording
* Update config.toml (#13365)
2019-03-25 22:06:16 +00:00
|
|
|
kubectl apply -f https://k8s.io/examples/debug/node-problem-detector.yaml
|
2017-03-15 00:25:29 +00:00
|
|
|
```
|
|
|
|
|
|
|
|
### Addon Pod
|
|
|
|
|
|
|
|
This is for those who have their own cluster bootstrap solution, and don't need
|
|
|
|
to overwrite the default configuration. They could leverage the addon pod to
|
|
|
|
further automate the deployment.
|
|
|
|
|
|
|
|
Just create `node-problem-detector.yaml`, and put it under the addon pods directory
|
|
|
|
`/etc/kubernetes/addons/node-problem-detector` on master node.
|
|
|
|
|
|
|
|
## Overwrite the Configuration
|
|
|
|
|
|
|
|
The [default configuration](https://github.com/kubernetes/node-problem-detector/tree/v0.1/config)
|
2019-01-14 02:26:49 +00:00
|
|
|
is embedded when building the Docker image of node problem detector.
|
2017-03-15 00:25:29 +00:00
|
|
|
|
2018-01-05 02:05:27 +00:00
|
|
|
However, you can use [ConfigMap](/docs/tasks/configure-pod-container/configure-pod-configmap/) to overwrite it
|
2017-03-15 00:25:29 +00:00
|
|
|
following the steps:
|
|
|
|
|
|
|
|
* **Step 1:** Change the config files in `config/`.
|
2018-10-30 16:45:56 +00:00
|
|
|
* **Step 2:** Create the ConfigMap `node-problem-detector-config` with `kubectl create configmap
|
2017-03-15 00:25:29 +00:00
|
|
|
node-problem-detector-config --from-file=config/`.
|
|
|
|
* **Step 3:** Change the `node-problem-detector.yaml` to use the ConfigMap:
|
|
|
|
|
2018-07-02 18:17:19 +00:00
|
|
|
{{< codenew file="debug/node-problem-detector-configmap.yaml" >}}
|
2018-04-07 14:07:08 +00:00
|
|
|
|
2017-03-15 00:25:29 +00:00
|
|
|
|
|
|
|
* **Step 4:** Re-create the node problem detector with the new yaml file:
|
|
|
|
|
|
|
|
```shell
|
2018-07-02 18:17:19 +00:00
|
|
|
kubectl delete -f https://k8s.io/examples/debug/node-problem-detector.yaml # If you have a node-problem-detector running
|
Official 1.14 Release Docs (#13174)
* Official documentation on Poseidon/Firmament, a new multi-scheduler support for K8S. (#11752)
* Added documentation about Poseidon-Firmament scheduler
* Fixed some style issues.
* Udpated the document as per the review comments.
* Fixed some typos and updated the document
* Updated the document as per the review comments.
* Document timeout attribute for kms-plugin. (#12158)
See 72540.
* Official documentation on Poseidon/Firmament, a new multi-scheduler (#12343)
* Removed the old version of the Poseidon documentation. Incorrect location.
* Official documentation on Poseidon/Firmament, a new multi-scheduler support for K8S (#12069)
* Official documentation on Poseidon/Firmament, a new multi-scheduler support for K8S. (#11752)
* Added documentation about Poseidon-Firmament scheduler
* Fixed some style issues.
* Udpated the document as per the review comments.
* Fixed some typos and updated the document
* Updated the document as per the review comments.
* Updated the document as per review comments. Added config details.
* Updated the document as per the latest review comments. Fixed nits
* Made changes as per latest suggestions.
* Some more changes added.
* Updated as per suggestions.
* Changed the release process section.
* SIG Docs edits
Small edits to match style guidelines.
* add plus to feature state
* capitalization
* revert feature state shortcode
since this is a Kubernetes extension, not a direct feature, it shouldn't use the regular feature state tagging.
(cherry picked from commit 7730c1540b637be74b9b21d4128a145994eb19cc)
* Remove initializers from doc. It will be removed in 1.14 (#12331)
* kubeadm: Document CRI auto detection functionality (#12462)
Signed-off-by: Rostislav M. Georgiev <rostislavg@vmware.com>
* Minor doc change for GAing Pod DNS Config (#12514)
* Graduate ExpandInUsePersistentVolumes feature to beta (#10574)
* Rename 2018-11-07-grpc-load-balancing-with-linkerd.md.md file (#12594)
* Add dynamic percentage of node scoring to user docs (#12235)
* Add dynamic percentage of node scoring to user docs
* addressed review comments
* delete special symbol (#12445)
* Update documentation for VolumeSubpathEnvExpansion (#11843)
* Update documentation for VolumeSubpathEnvExpansion
* Address comments - improve descriptions
* Graduate Pod Priority and Preemption to GA (#12428)
* Added Instana links to the documentation (#12977)
* Added link to the Instana Kubernetes integration
* Added Instana link for services section
Added Instana and a link to the Kubernetes integration to the analytics services section and broadened the scope to APM, monitoring and analytics.
* Oxford comma /flex
* More Oxford commas, because they matter
* Update kubectl plugins to stable (#12847)
* documentation for CSI topology beta (#12889)
* Document changes to default RBAC discovery ClusterRole(Binding)s (#12888)
* Document changes to default RBAC discovery ClusterRole(Binding)s
Documentation for https://github.com/kubernetes/enhancements/issues/789 and https://github.com/kubernetes/kubernetes/pull/73807
* documentation review feedback
* CSI raw block to beta (#12931)
* Change incorrect string raw to block (#12926)
Fixes #12925
* Update documentation on node OS/arch labels (#12976)
These labels have been promoted to GA:
https://github.com/kubernetes/enhancements/issues/793
* local pv GA doc updates (#12915)
* Publish CRD OpenAPI Documentation (#12910)
* add documentation for CustomResourcePublishOpenAPI
* address comments
fix links, ordered lists, style and typo
* kubeadm: add document for upgrading from 1.13 to 1.14 (single CP and HA) (#13189)
* kubeadm: add document for upgrading from 1.13 to 1.14
- remove doc for upgrading 1.10 -> 1.11
* kubeadm: apply amends to upgrade-1.14 doc
* kubeadm: apply amends to upgrade-1.14 doc (part2)
* kubeadm: apply amends to upgrade-1.14 doc (part3)
* kubeadm: add note about "upgrade node experimental-control-plane"
+ add comment about `upgrade plan`
* kubeadm: add missing "You should see output similar to this"
* fix bullet indentation (#13214)
* mark PodReadinessGate GA (#12800)
* Update RuntimeClass documentation for beta (#13043)
* Update RuntimeClass documentation for beta
* Update feature gate & add upgrade section
* formatting fixes
* Highlight upgrade action required
* Address feedback
* CSI ephemeral volume alpha documentation (#10934)
* update kubectl documentation (#12867)
* update kubectl documentation
* add document for Secret/ConfigMap generators
* replace `kubectl create -f` by `kubectl apply -f`
* Add page for kustomization support in kubectl
* fix spelling errors and address comments
* Documentation for Windows GMSA feature (#12936)
* Documentation for Windows GMSA feature
Signed-off-by: Deep Debroy <ddebroy@docker.com>
* Enhancements to GMSA docs
Signed-off-by: Deep Debroy <ddebroy@docker.com>
* Fix links
Signed-off-by: Deep Debroy <ddebroy@docker.com>
* Fix GMSA link
Signed-off-by: Deep Debroy <ddebroy@docker.com>
* Add GMSA feature flag in feature flag list
Signed-off-by: Deep Debroy <ddebroy@docker.com>
* Relocate GMSA to container configuration
Signed-off-by: Deep Debroy <ddebroy@docker.com>
* Add example for container spec
Signed-off-by: Deep Debroy <ddebroy@docker.com>
* Remove changes in Windows index
Signed-off-by: Deep Debroy <ddebroy@docker.com>
* Update configure-gmsa.md
* Update configure-gmsa.md
* Update configure-gmsa.md
* Update configure-gmsa.md
* Rearrange the steps into two sections and other edits
Signed-off-by: Deep Debroy <ddebroy@docker.com>
* Fix links
Signed-off-by: Deep Debroy <ddebroy@docker.com>
* Add reference to script to generate GMSA YAMLs
Signed-off-by: Deep Debroy <ddebroy@docker.com>
* Some more clarifications for GMSA
Signed-off-by: Deep Debroy <ddebroy@docker.com>
* HugePages graduated to GA (#13004)
* HugePages graduated to GA
* fixing nit for build
* Docs for node PID limiting (https://github.com/kubernetes/kubernetes/pull/73651) (#12932)
* kubeadm: update the reference documentation for 1.14 (#12911)
* kubeadm: update list of generated files for 1.14
NOTE: PLACEHOLDERS! these files are generated by SIG Docs each
release, but we need them to pass the k/website PR CI.
- add join_phase* (new sub phases of join)
- add init_phase_upload-certs.md (new upload certs phase for init)
- remove alpha-preflight (now both init and join have this)
* kubeadm: update reference docs includes for 1.14
- remove includes from alpha.md
- add upload-certs to init-phase.md
- add join-phase.md and it's phases
* kubeadm: update the editorial content of join and init
- cleanup master->control-plane node
- add some notes about phases and join
- remove table about pre-pulling images
- remove outdated info about self-hosting
* kubeadm: update target release for v1alpha3 removal
1.14 -> 1.15
* kubeadm: copy edits for 1.14 reference docs (part1)
* kubeadm: use "shell" for code blocks
* kubeadm: update the 1.14 HA guide (#13191)
* kubeadm: update the 1.14 HA guide
* kubeadm: try to fix note/caution indent in HA page
* kubeadm: fix missing sudo and minor amends in HA doc
* kubeadm: apply latest amends to the HA doc for 1.14
* fixed a few missed merge conflicts
* Admission Webhook new features doc (#12938)
- kubernetes/kubernetes#74998
- kubernetes/kubernetes#74477
- kubernetes/kubernetes#74562
* Clarifications and fixes in GMSA doc (#13226)
* Clarifications and fixes in GMSA doc
Signed-off-by: Deep Debroy <ddebroy@docker.com>
* Update configure-gmsa.md
* Reformat to align headings and pre-reqs better
Signed-off-by: Deep Debroy <ddebroy@docker.com>
* Reformat to align headings and pre-reqs better
Signed-off-by: Deep Debroy <ddebroy@docker.com>
* Reformat to fix bullets
Signed-off-by: Deep Debroy <ddebroy@docker.com>
* Reword application of sample gmsa
Signed-off-by: Deep Debroy <ddebroy@docker.com>
* Update configure-gmsa.md
* Address feedback to use active voice
Signed-off-by: Deep Debroy <ddebroy@docker.com>
* Address feedback to use active voice
Signed-off-by: Deep Debroy <ddebroy@docker.com>
* RunAsGroup documentation for Progressing this to Beta (#12297)
* start serverside-apply documentation (#13077)
* start serverside-apply documentation
* add more concept info on server side apply
* Update api concepts
* Update api-concepts.md
* fix style issues
* Document CSI update (#12928)
* Document CSI update
* Finish CSI documentation
Also fix mistake with ExpandInUsePersistentVolumes documented as beta
* Overall docs for CSI Migration feature (#12935)
* Placeholder docs for CSI Migration feature
Signed-off-by: Deep Debroy <ddebroy@docker.com>
* Address CR comments and update feature gates
Signed-off-by: Deep Debroy <ddebroy@docker.com>
* Add mappings for CSI plugins
Signed-off-by: Deep Debroy <ddebroy@docker.com>
* Add sections for AWS and GCE PD migration
Signed-off-by: Deep Debroy <ddebroy@docker.com>
* Add docs for Cinder and CSI Migration info
Signed-off-by: Deep Debroy <ddebroy@docker.com>
* Clarify scope to volumes with file system
Signed-off-by: Deep Debroy <ddebroy@docker.com>
* Change the format of EBS and Cinder CSI Migration sections to follow the GCE template
Signed-off-by: Deep Debroy <ddebroy@docker.com>
* Windows documentation updates for 1.14 (#12929)
* Updated the note to indicate doc work for 1.14
* first attempt at md export from gdoc
* simplifyig
* big attempt
* moving DRAFT windows content to PR for review
* moving content to PR in markdown for review
* updated note tags
* Delete windows-contributing.md
deleting this file as it is already ported to the github contributor guide
* fixed formatting in intro and cluster setup guide
* updating formatting for running containers guide
* rejiggered end of troubleshooting
* fixed minor typos
* Clarified the windows binary download step
* Update _index.md
making updates based on feedback
* Update _index.md
updating ovn-kubernetes docs
* Update _index.md
* Update _index.md
* updating relative docs links
updating all the links to be relative links to /docs
* Update _index.md
* Update _index.md
updates for windows services and ovn-kubernetes
* formatted for correct step numbering
* fix typos
* Update _index.md
updates for flannel PR in troubleshooting
* Update _index.md
* Update _index.md
updating a few sections like roadmap, services, troubleshooting/filing tickets
* Update _index.md
* Update _index.md
* Update _index.md
* Fixed a few whitespace issues
* Update _index.md
* Update _index.md
* Update _index.md
* add section on upgrading CoreDNS (#12909)
* documentation for kubelet resource metrics endpoint (#12934)
* windows docs updates for 1.14 (#13279)
* Delete sample-l2bridge-wincni-config.json
this file is not used anywhere
* Update _index.md
* Update _index.md
* Update _index.md
* Update _index.md
* Update _index.md
* Rename content/en/docs/getting-started-guides/windows/_index.md to content/en/docs/setup/windows/_index.md
moving to new location
* Delete flannel-master-kubectl-get-ds.png
* Delete flannel-master-kubeclt-get-pods.png
* Delete windows-docker-error.png
* Add files via upload
* Rename _index.md to add-windows-nodes.md
* Create _index.md
* Update _index.md
* Update add-windows-nodes.md
* Update add-windows-nodes.md
* Create user-guide-windows-nodes.md
* Create user-guide-windows-containers.md
* Update and rename add-windows-nodes.md to intro-windows-nodes.md
* Update user-guide-windows-containers.md
* Rename intro-windows-nodes.md to intro-windows-in-kubernetes.md
* Update user-guide-windows-nodes.md
* Update user-guide-windows-containers.md
* Update user-guide-windows-containers.md
* Update user-guide-windows-nodes.md
* Update user-guide-windows-containers.md
* Update _index.md
* Update intro-windows-in-kubernetes.md
* Update intro-windows-in-kubernetes.md
fixing the pause image
* Update intro-windows-in-kubernetes.md
changing tables from html to MD
* Update user-guide-windows-nodes.md
converting tables from HTML to MD
* Update intro-windows-in-kubernetes.md
* Update user-guide-windows-nodes.md
* Update user-guide-windows-nodes.md
* Update user-guide-windows-nodes.md
updating the numbering , even though it messes up the notes a little bit. Jim will file a ticket to follow up
* Update user-guide-windows-nodes.md
* update to windows docs for 1.14 (#13322)
* Update intro-windows-in-kubernetes.md
* Update intro-windows-in-kubernetes.md
* Update intro-windows-in-kubernetes.md
* Update intro-windows-in-kubernetes.md
* Update intro-windows-in-kubernetes.md
* Update user-guide-windows-containers.md
* Update user-guide-windows-nodes.md
* Update intro-windows-in-kubernetes.md (#13344)
* server side apply followup (#13321)
* change some parts of serverside apply docs in response to comments
* fix typos and wording
* Update config.toml (#13365)
2019-03-25 22:06:16 +00:00
|
|
|
kubectl apply -f https://k8s.io/examples/debug/node-problem-detector-configmap.yaml
|
2017-03-15 00:25:29 +00:00
|
|
|
```
|
|
|
|
|
|
|
|
***Notice that this approach only applies to node problem detector started with `kubectl`.***
|
|
|
|
|
|
|
|
For node problem detector running as cluster addon, because addon manager doesn't support
|
|
|
|
ConfigMap, configuration overwriting is not supported now.
|
|
|
|
|
|
|
|
## Kernel Monitor
|
|
|
|
|
|
|
|
*Kernel Monitor* is a problem daemon in node problem detector. It monitors kernel log
|
|
|
|
and detects known kernel issues following predefined rules.
|
|
|
|
|
|
|
|
The Kernel Monitor matches kernel issues according to a set of predefined rule list in
|
|
|
|
[`config/kernel-monitor.json`](https://github.com/kubernetes/node-problem-detector/blob/v0.1/config/kernel-monitor.json).
|
2017-03-20 06:23:19 +00:00
|
|
|
The rule list is extensible, and you can always extend it by overwriting the
|
|
|
|
configuration.
|
2017-03-15 00:25:29 +00:00
|
|
|
|
|
|
|
### Add New NodeConditions
|
|
|
|
|
|
|
|
To support new node conditions, you can extend the `conditions` field in
|
|
|
|
`config/kernel-monitor.json` with new condition definition:
|
|
|
|
|
|
|
|
```json
|
|
|
|
{
|
|
|
|
"type": "NodeConditionType",
|
|
|
|
"reason": "CamelCaseDefaultNodeConditionReason",
|
|
|
|
"message": "arbitrary default node condition message"
|
|
|
|
}
|
|
|
|
```
|
|
|
|
|
|
|
|
### Detect New Problems
|
|
|
|
|
|
|
|
To detect new problems, you can extend the `rules` field in `config/kernel-monitor.json`
|
|
|
|
with new rule definition:
|
|
|
|
|
|
|
|
```json
|
|
|
|
{
|
|
|
|
"type": "temporary/permanent",
|
|
|
|
"condition": "NodeConditionOfPermanentIssue",
|
|
|
|
"reason": "CamelCaseShortReason",
|
|
|
|
"message": "regexp matching the issue in the kernel log"
|
|
|
|
}
|
|
|
|
```
|
|
|
|
|
|
|
|
### Change Log Path
|
|
|
|
|
|
|
|
Kernel log in different OS distros may locate in different path. The `log`
|
|
|
|
field in `config/kernel-monitor.json` is the log path inside the container.
|
|
|
|
You can always configure it to match your OS distro.
|
|
|
|
|
|
|
|
### Support Other Log Format
|
|
|
|
|
|
|
|
Kernel monitor uses [`Translator`](https://github.com/kubernetes/node-problem-detector/blob/v0.1/pkg/kernelmonitor/translator/translator.go)
|
|
|
|
plugin to translate kernel log the internal data structure. It is easy to
|
|
|
|
implement a new translator for a new log format.
|
|
|
|
|
2018-06-22 18:20:04 +00:00
|
|
|
{{% /capture %}}
|
|
|
|
|
|
|
|
{{% capture discussion %}}
|
|
|
|
|
2017-03-15 00:25:29 +00:00
|
|
|
## Caveats
|
|
|
|
|
|
|
|
It is recommended to run the node problem detector in your cluster to monitor
|
|
|
|
the node health. However, you should be aware that this will introduce extra
|
|
|
|
resource overhead on each node. Usually this is fine, because:
|
|
|
|
|
|
|
|
* The kernel log is generated relatively slowly.
|
|
|
|
* Resource limit is set for node problem detector.
|
|
|
|
* Even under high load, the resource usage is acceptable.
|
2017-07-25 16:37:19 +00:00
|
|
|
(see [benchmark result](https://github.com/kubernetes/node-problem-detector/issues/2#issuecomment-220255629))
|
2018-06-22 18:20:04 +00:00
|
|
|
|
2018-07-02 18:17:19 +00:00
|
|
|
{{% /capture %}}
|