Merge pull request #4138 from gnufied/controller-metrics

Add documentation about controller metrics
2017-06-26 11:12:43 -07:00 · 3d417575df
parent fbbd8922a4 99ff364e61
commit 3d417575df
2 changed files with 49 additions and 0 deletions
--- a/_data/concepts.yml
+++ b/_data/concepts.yml
@@ -82,6 +82,7 @@ toc:
  - docs/concepts/cluster-administration/authenticate-across-clusters-kubeconfig.md
  - docs/concepts/cluster-administration/master-node-communication.md
  - docs/concepts/cluster-administration/proxies.md
+  - docs/concepts/cluster-administration/controller-metrics.md
  - title: Policies
    section:
    - docs/concepts/policy/resource-quotas.md
--- a/docs/concepts/cluster-administration/controller-metrics.md
+++ b/docs/concepts/cluster-administration/controller-metrics.md
@@ -0,0 +1,48 @@
+---
+title: Controller manager metrics
+---
+
+{% capture overview %}
+Controller manager metrics provide important insight into the performance and health of
+the controller manager.
+
+{% endcapture %}
+
+{% capture body %}
+## What are controller manager metrics
+
+Controller manager metrics include common Go runtime metrics, such as the number of goroutines
+(`go_goroutines`), and controller-specific metrics, such as etcd request latencies or
+cloud provider (AWS, GCE, OpenStack) API latencies. Together, these metrics can be used
+to gauge the health of a cluster.
+
+Starting with Kubernetes 1.7, detailed cloud provider metrics are available for storage operations on GCE, AWS, vSphere, and OpenStack.
+These metrics can be used to monitor the health of persistent volume operations.
+
+For example, for GCE these metrics are called:
+
+```
+cloudprovider_gce_api_request_duration_seconds{request="instance_list"}
+cloudprovider_gce_api_request_duration_seconds{request="disk_insert"}
+cloudprovider_gce_api_request_duration_seconds{request="disk_delete"}
+cloudprovider_gce_api_request_duration_seconds{request="attach_disk"}
+cloudprovider_gce_api_request_duration_seconds{request="detach_disk"}
+cloudprovider_gce_api_request_duration_seconds{request="list_disk"}
+```
+
+
+
+## Configuration
+
+
+In a cluster, controller manager metrics are available at `http://localhost:10252/metrics`
+on the host where the controller manager is running.
+
+The metrics are emitted in [Prometheus format](https://prometheus.io/docs/instrumenting/exposition_formats/) and are human-readable.
+
+In a production environment, you may want to configure Prometheus or another metrics scraper
+to periodically gather these metrics and store them in a time series database.
+
+{% endcapture %}
+
+{% include templates/concept.md %}
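The page added above separates Go runtime metrics (such as the goroutine count) from controller-specific ones. As a rough illustration, the Python 3.9+ sketch below groups metric family names from a scraped payload, using the `# TYPE <name> <type>` lines of the Prometheus text format. It assumes that every family whose name starts with `go_` comes from the Go runtime. `group_metric_families` is a hypothetical helper, and the prefix rule is a heuristic rather than an official classification.

```python
def group_metric_families(exposition: str) -> dict[str, list[str]]:
    """Split metric family names into Go runtime and other metrics.

    Assumption (heuristic): a family whose name starts with "go_" comes from
    the Go runtime collector; everything else is treated as controller-specific.
    """
    groups: dict[str, list[str]] = {"go_runtime": [], "other": []}
    for line in exposition.splitlines():
        # Each family is announced by a metadata line "# TYPE <name> <type>".
        parts = line.split()
        if len(parts) >= 4 and parts[0] == "#" and parts[1] == "TYPE":
            name = parts[2]
            key = "go_runtime" if name.startswith("go_") else "other"
            groups[key].append(name)
    # Sort names so the result does not depend on exposition order.
    return {key: sorted(names) for key, names in groups.items()}
```

Working from `# TYPE` lines rather than sample lines means a histogram is reported once under its family name instead of once per `_bucket`, `_sum`, and `_count` series.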
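For the GCE request-latency metrics listed on the page, a natural question is the average latency per `request` label. The sketch below assumes `cloudprovider_gce_api_request_duration_seconds` is exported as a Prometheus histogram, so its data appears as `cloudprovider_gce_api_request_duration_seconds_sum` and `cloudprovider_gce_api_request_duration_seconds_count` series, and that samples have already been parsed into `(name, labels, value)` tuples. `mean_latency_by_request` is a hypothetical helper.

```python
from collections import defaultdict

METRIC = "cloudprovider_gce_api_request_duration_seconds"


def mean_latency_by_request(
    samples: list[tuple[str, dict[str, str], float]],
) -> dict[str, float]:
    """Return mean latency in seconds, keyed by the "request" label.

    Assumes METRIC is a histogram exposed as METRIC_sum (total observed
    seconds) and METRIC_count (number of observations). Any other labels on
    the series are aggregated together.
    """
    totals: dict[str, float] = defaultdict(float)
    counts: dict[str, float] = defaultdict(float)
    for name, labels, value in samples:
        request = labels.get("request")
        if request is None:
            continue  # not a per-request series
        if name == METRIC + "_sum":
            totals[request] += value
        elif name == METRIC + "_count":
            counts[request] += value
    # Mean = total observed seconds / number of observations; skip empty series.
    return {req: totals[req] / counts[req] for req in counts if counts[req] > 0}
```

Because histogram sums and counts are cumulative, this gives the mean since the controller manager started, not the latency over a recent window.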
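The page says the metrics are served at `http://localhost:10252/metrics` on the controller manager's host. Below is a minimal Python sketch that fetches that payload with the standard library. It assumes the endpoint is reachable over plain HTTP without authentication, as the page describes. `fetch_metrics` and its `timeout` parameter are hypothetical names chosen for illustration.

```python
import urllib.request

# Endpoint from the page; adjust the host or port if your cluster differs.
METRICS_URL = "http://localhost:10252/metrics"


def fetch_metrics(url: str = METRICS_URL, timeout: float = 5.0) -> str:
    """Return the raw Prometheus text exposition served at `url`."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # The text exposition format is UTF-8 encoded.
        return response.read().decode("utf-8")
```

Run on the controller manager host, `print(fetch_metrics())` should print the same human-readable text that a browser or `curl` would show for that URL.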
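Because the Prometheus text format is line-oriented, a small parser is enough to turn a scrape into structured samples. This is a simplified sketch of the sample-line shape `name{label="value",...} value [timestamp]`, not a complete implementation of the format. It skips `# HELP` and `# TYPE` comments, does not unescape label values, and assumes label values contain no `}` character. `parse_samples` is a hypothetical name.

```python
import re

# One sample line: metric name, optional {labels}, value, optional timestamp.
_SAMPLE = re.compile(
    r"^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)"
    r"(?:\{(?P<labels>[^}]*)\})?"
    r"\s+(?P<value>\S+)(?:\s+\S+)?\s*$"
)
# One label pair: key="value"; the value may contain backslash escapes.
_LABEL = re.compile(r'\s*(?P<key>[a-zA-Z_][a-zA-Z0-9_]*)\s*=\s*"(?P<val>(?:[^"\\]|\\.)*)"\s*,?')


def parse_samples(exposition: str) -> list[tuple[str, dict[str, str], float]]:
    """Parse sample lines into (name, labels, value) tuples."""
    samples = []
    for line in exposition.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line, or HELP/TYPE metadata
        match = _SAMPLE.match(line)
        if match is None:
            continue  # a line this simplified sketch does not understand
        labels = {
            m.group("key"): m.group("val")
            for m in _LABEL.finditer(match.group("labels") or "")
        }
        # float() also accepts the special values "+Inf", "-Inf" and "NaN".
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples
```

For anything beyond experimentation, the official Python client's `prometheus_client.parser.text_string_to_metric_families` handles the full format.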
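The page suggests having Prometheus or another scraper collect these metrics periodically into a time series database. The toy loop below only illustrates that pattern. It calls a caller-supplied `fetch` function every `interval_s` seconds and appends each value, with its timestamp, to an in-memory series. `scrape_loop` and its parameters are hypothetical. The injectable `sleep` and `clock` arguments exist so the loop can be exercised without actually waiting.

```python
import time
from collections import defaultdict
from typing import Callable


def scrape_loop(
    fetch: Callable[[], dict[str, float]],
    interval_s: float,
    iterations: int,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.time,
) -> dict[str, list[tuple[float, float]]]:
    """Poll `fetch` `iterations` times, `interval_s` seconds apart.

    Toy stand-in for a real scraper: the "time series database" is an
    in-memory dict mapping a metric key to (timestamp, value) pairs.
    """
    series: dict[str, list[tuple[float, float]]] = defaultdict(list)
    for i in range(iterations):
        now = clock()  # one timestamp per scrape, shared by all its samples
        for key, value in fetch().items():
            series[key].append((now, value))
        if i < iterations - 1:
            sleep(interval_s)  # wait before the next scrape
    return dict(series)
```

A real deployment would rely on Prometheus's own scraping and storage rather than code like this; the sketch only shows why periodic collection turns point-in-time metrics into a history you can chart and alert on.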