Document control plane monitoring (#17578)

* Document control plane monitoring
* Update content/en/docs/concepts/cluster-administration/monitoring.md
  Co-Authored-By: Tim Bannister <tim@scalefactory.com>
* Update content/en/docs/concepts/cluster-administration/monitoring.md
  Co-Authored-By: Tim Bannister <tim@scalefactory.com>
* Merge controller-metrics.md into monitoring.md
  Co-authored-by: Tim Bannister <tim@scalefactory.com>

parent d43ac11196
commit 26827a909e

@@ -1,50 +0,0 @@
---
title: Controller manager metrics
content_template: templates/concept
weight: 100
---

{{% capture overview %}}
Controller manager metrics provide important insight into the performance and health of
the controller manager.

{{% /capture %}}

{{% capture body %}}
## What are controller manager metrics

Controller manager metrics provide important insight into the performance and health of the controller manager.
These metrics include common Go language runtime metrics such as go_routine count and controller specific metrics such as
etcd request latencies or Cloudprovider (AWS, GCE, OpenStack) API latencies that can be used
to gauge the health of a cluster.

Starting from Kubernetes 1.7, detailed Cloudprovider metrics are available for storage operations for GCE, AWS, Vsphere and OpenStack.
These metrics can be used to monitor health of persistent volume operations.

For example, for GCE these metrics are called:

```
cloudprovider_gce_api_request_duration_seconds { request = "instance_list"}
cloudprovider_gce_api_request_duration_seconds { request = "disk_insert"}
cloudprovider_gce_api_request_duration_seconds { request = "disk_delete"}
cloudprovider_gce_api_request_duration_seconds { request = "attach_disk"}
cloudprovider_gce_api_request_duration_seconds { request = "detach_disk"}
cloudprovider_gce_api_request_duration_seconds { request = "list_disk"}
```



## Configuration


In a cluster, controller-manager metrics are available from `http://localhost:10252/metrics`
from the host where the controller-manager is running.

The metrics are emitted in [prometheus format](https://prometheus.io/docs/instrumenting/exposition_formats/) and are human readable.

In a production environment you may want to configure prometheus or some other metrics scraper
to periodically gather these metrics and make them available in some kind of time series database.

{{% /capture %}}

@@ -0,0 +1,132 @@
---
title: Metrics For The Kubernetes Control Plane
reviewers:
- brancz
- logicalhan
- RainbowMango
content_template: templates/concept
weight: 60
aliases:
- controller-metrics.md
---

{{% capture overview %}}

System component metrics give better insight into what is happening inside the components. Metrics are particularly useful for building dashboards and alerts.

Metrics in the Kubernetes control plane are emitted in [Prometheus format](https://prometheus.io/docs/instrumenting/exposition_formats/) and are human readable.

{{% /capture %}}

{{% capture body %}}

## Metrics in Kubernetes

In most cases, metrics are available on the `/metrics` endpoint of the component's HTTP server. For components that don't expose the endpoint by default, it can be enabled using the `--bind-address` flag.

Components that expose metrics this way include:
* {{< glossary_tooltip term_id="kube-controller-manager" text="kube-controller-manager" >}}
* {{< glossary_tooltip term_id="kube-proxy" text="kube-proxy" >}}
* {{< glossary_tooltip term_id="kube-apiserver" text="kube-apiserver" >}}
* {{< glossary_tooltip term_id="kube-scheduler" text="kube-scheduler" >}}
* {{< glossary_tooltip term_id="kubelet" text="kubelet" >}}
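To read these metrics from a script, an HTTP GET on the endpoint is enough. The following is a minimal Python sketch that uses only the standard library. The helper names are hypothetical, and the host, port and scheme are placeholders that depend on your cluster and on each component's flags; they are not values documented on this page.

```python
import ssl
import urllib.request
from typing import Optional


def metrics_url(host: str, port: int, path: str = "/metrics", scheme: str = "https") -> str:
    """Build the URL of a component's metrics endpoint (hypothetical helper)."""
    if not path.startswith("/"):
        path = "/" + path  # accept "metrics" as well as "/metrics"
    return f"{scheme}://{host}:{port}{path}"


def fetch_metrics(url: str, timeout: float = 5.0,
                  context: Optional[ssl.SSLContext] = None) -> str:
    """Return the Prometheus text exposition served at ``url``.

    Pass an ``ssl.SSLContext`` that trusts your cluster's CA when the
    endpoint is served over HTTPS with a cluster-issued certificate.
    """
    with urllib.request.urlopen(url, timeout=timeout, context=context) as response:
        return response.read().decode("utf-8")
```

A context that trusts a specific CA bundle can be built with `ssl.create_default_context(cafile=...)` and passed as `context`.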

In a production environment, you may want to configure [Prometheus Server](https://prometheus.io/) or some other metrics scraper to periodically gather these metrics and make them available in some kind of time series database.
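The scrape-and-store pattern can be illustrated with a toy loop. This Python sketch is not how Prometheus Server is implemented: an in-memory dictionary stands in for the time series database, `fetch` is any callable that returns exposition text (for example, a wrapper around an HTTP GET), and the parsing assumes sample lines without an explicit timestamp. All names are hypothetical.

```python
import time
from collections import defaultdict
from typing import Callable, DefaultDict, List, Tuple

# Series identity (metric name plus labels, as written) -> [(scrape time, value), ...]
Store = DefaultDict[str, List[Tuple[float, float]]]


def new_store() -> Store:
    """Create an empty in-memory stand-in for a time series database."""
    return defaultdict(list)


def scrape_once(fetch: Callable[[], str], store: Store, now: float) -> int:
    """Fetch one exposition, append every sample to ``store`` and return the sample count."""
    count = 0
    for line in fetch().splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and # HELP / # TYPE comments
        # Assumes "<series> <value>": the value is the last space-separated field.
        series, _, value = line.rpartition(" ")
        store[series].append((now, float(value)))
        count += 1
    return count


def scrape_loop(fetch: Callable[[], str], store: Store,
                interval_s: float, iterations: int) -> None:
    """Scrape ``iterations`` times, sleeping ``interval_s`` seconds between scrapes."""
    for i in range(iterations):
        scrape_once(fetch, store, time.time())
        if i < iterations - 1:
            time.sleep(interval_s)
```

For example, `store = new_store()` followed by `scrape_loop(my_fetch, store, interval_s=15, iterations=4)`, where `my_fetch` is your own fetch function, records four timestamped points per series. A real deployment would use a proper scraper and storage backend instead.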

Note that the {{< glossary_tooltip term_id="kubelet" text="kubelet" >}} also exposes metrics on the `/metrics/cadvisor`, `/metrics/resource` and `/metrics/probes` endpoints. Those metrics do not have the same lifecycle.

If your cluster uses {{< glossary_tooltip term_id="rbac" text="RBAC" >}}, reading metrics requires authorization via a user, group or ServiceAccount with a ClusterRole that allows access to `/metrics`.
For example:
```
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
  - nonResourceURLs:
      - "/metrics"
    verbs:
      - get
```
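Once such a ClusterRole is bound to a ServiceAccount (with a ClusterRoleBinding, not shown here), a scraper running in a Pod under that ServiceAccount can authenticate with the ServiceAccount's token. A minimal Python sketch: the token path is the default in-Pod mount location, the function names are hypothetical, and TLS setup is left to the caller.

```python
import urllib.request
from pathlib import Path

# Where Kubernetes mounts a Pod's ServiceAccount token by default.
TOKEN_PATH = Path("/var/run/secrets/kubernetes.io/serviceaccount/token")


def read_service_account_token(path: Path = TOKEN_PATH) -> str:
    """Read the ServiceAccount token, dropping any trailing newline."""
    return path.read_text().strip()


def authorized_request(url: str, token: str) -> urllib.request.Request:
    """Build a GET request for ``url`` that presents ``token`` as a bearer credential."""
    return urllib.request.Request(
        url, headers={"Authorization": f"Bearer {token}"}, method="GET"
    )
```

The resulting request can be passed to `urllib.request.urlopen` together with an `ssl.SSLContext` that trusts the cluster CA, which is mounted as `ca.crt` in the same directory as the token.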

## Metric lifecycle

Alpha metric → Stable metric → Deprecated metric → Hidden metric → Deletion

Alpha metrics have no stability guarantees; as such, they can be modified or deleted at any time.

Stable metrics are guaranteed not to change. Specifically, stability means:

* the metric itself will not be deleted (or renamed)
* the type of metric will not be modified

Deprecated metrics signal that the metric will eventually be deleted. The annotation in the metric's `HELP` text states the Kubernetes version from which the metric is considered deprecated.

Before deprecation:

```
# HELP some_counter this counts things
# TYPE some_counter counter
some_counter 0
```

After deprecation:

```
# HELP some_counter (Deprecated since 1.15.0) this counts things
# TYPE some_counter counter
some_counter 0
```
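Because the deprecation annotation appears in the `HELP` text, a script can flag deprecated metrics in a scrape. A minimal Python sketch, assuming the annotation always has the `(Deprecated since X.Y.Z)` form shown in the example above; the function name is hypothetical.

```python
import re
from typing import Dict

# "# HELP <name> <help text>"
_HELP_RE = re.compile(r"^# HELP (\S+) (.*)$")
# The deprecation annotation, e.g. "(Deprecated since 1.15.0)"
_DEPRECATED_RE = re.compile(r"\(Deprecated since (\d+\.\d+(?:\.\d+)?)\)")


def deprecated_metrics(exposition: str) -> Dict[str, str]:
    """Map each metric whose HELP text carries a deprecation annotation to that version."""
    found: Dict[str, str] = {}
    for line in exposition.splitlines():
        match = _HELP_RE.match(line)
        if match is None:
            continue  # not a HELP line
        name, help_text = match.groups()
        deprecated = _DEPRECATED_RE.search(help_text)
        if deprecated:
            found[name] = deprecated.group(1)
    return found
```

Run over the two exposition snippets above, it returns `{}` for the first and `{"some_counter": "1.15.0"}` for the second.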

Once a metric is hidden, it is not published for scraping by default. To use a hidden metric, you need to override the configuration for the relevant cluster component.

Once a metric is deleted, it is not published. You cannot change this using an override.

## Show Hidden Metrics

As described above, admins can enable hidden metrics through a command-line flag on a specific binary. This is intended as an escape hatch for admins who missed migrating away from metrics deprecated in the last release.

The flag `show-hidden-metrics-for-version` takes the version whose deprecated metrics you want to show. The version is expressed as x.y, where x is the major version and y is the minor version. The patch version is not needed: even though a metric can be deprecated in a patch release, the metrics deprecation policy operates on minor releases.

The flag can only take the previous minor version as its value. If admins set `show-hidden-metrics-for-version` to the previous version, all metrics that were deprecated in that version and are now hidden are emitted. Older versions are not allowed because that would violate the metrics deprecation policy.

Take metric `A` as an example, and assume that `A` is deprecated in `1.n`. According to the metrics deprecation policy:

* In release `1.n`, the metric is deprecated, and it is still emitted by default.
* In release `1.n+1`, the metric is hidden by default, and it can be emitted with the command-line flag `--show-hidden-metrics-for-version=1.n`.
* In release `1.n+2`, the metric should be removed from the codebase. There is no escape hatch anymore.
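The policy above can be modeled in a few lines of Python. This is a sketch only: the function names and state strings are hypothetical, versions are assumed to be in the x.y form the flag uses, and both versions are assumed to share a major version.

```python
from typing import Optional, Tuple


def parse_minor(version: str) -> Tuple[int, int]:
    """Parse an "x.y" version, the form show-hidden-metrics-for-version expects."""
    parts = version.split(".")
    if len(parts) != 2 or not all(part.isdigit() for part in parts):
        raise ValueError(f"expected a version of the form x.y, got {version!r}")
    return int(parts[0]), int(parts[1])


def allowed_flag_value(release: str) -> str:
    """The only value the flag accepts in ``release``: the previous minor version."""
    major, minor = parse_minor(release)
    if minor == 0:
        raise ValueError("this sketch does not handle major-version boundaries")
    return f"{major}.{minor - 1}"


def metric_state(deprecated_in: str, release: str,
                 show_hidden_for: Optional[str] = None) -> str:
    """Classify a metric deprecated in ``deprecated_in`` as seen in ``release``."""
    if show_hidden_for is not None and show_hidden_for != allowed_flag_value(release):
        raise ValueError(f"in {release} the flag only accepts {allowed_flag_value(release)}")
    dep_major, dep_minor = parse_minor(deprecated_in)
    rel_major, rel_minor = parse_minor(release)
    if dep_major != rel_major:
        raise ValueError("this sketch assumes both versions share a major version")
    age = rel_minor - dep_minor
    if age < 0:
        return "not yet deprecated"
    if age == 0:
        return "deprecated"  # release 1.n: still emitted by default
    if age == 1:
        # release 1.n+1: hidden unless the flag is set to 1.n (validated above)
        return "shown" if show_hidden_for is not None else "hidden"
    return "removed"  # release 1.n+2 and later: no escape hatch
```

For a metric deprecated in `1.12`, `metric_state("1.12", "1.13", show_hidden_for="1.12")` returns `"shown"`, while passing the same flag value for release `1.14` raises `ValueError`, because `1.14` only accepts `1.13`.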

If you're upgrading from release `1.12` to `1.13` but still depend on a metric `A` deprecated in `1.12`, set `--show-hidden-metrics-for-version=1.12` on the command line, and remember to remove this metric dependency before upgrading to `1.14`.

## Component metrics

### kube-controller-manager metrics

Controller manager metrics provide important insight into the performance and health of the controller manager. These metrics include common Go language runtime metrics, such as the goroutine count, and controller-specific metrics, such as etcd request latencies or cloud provider (AWS, GCE, OpenStack) API latencies, that can be used to gauge the health of a cluster.

Starting from Kubernetes 1.7, detailed cloud provider metrics are available for storage operations for GCE, AWS, vSphere and OpenStack. These metrics can be used to monitor the health of persistent volume operations.

For example, for GCE these metrics are called:

```
cloudprovider_gce_api_request_duration_seconds{request="instance_list"}
cloudprovider_gce_api_request_duration_seconds{request="disk_insert"}
cloudprovider_gce_api_request_duration_seconds{request="disk_delete"}
cloudprovider_gce_api_request_duration_seconds{request="attach_disk"}
cloudprovider_gce_api_request_duration_seconds{request="detach_disk"}
cloudprovider_gce_api_request_duration_seconds{request="list_disk"}
```
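To pick out these series from a scrape, a script can match sample lines by name and read the `request` label. A minimal Python sketch with a simplified parser that does not cover every corner of the text format. Depending on the metric type, the exposition may contain suffixed series instead of the bare name (for a histogram, `_bucket`, `_sum` and `_count`), so pass the exact series name you want. The function name is hypothetical.

```python
import re
from typing import Dict, List, Tuple

# A sample line: metric name, optional {label="value",...} block, then the value.
_SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)\s*(?:\{(.*)\})?\s+(\S+)')
# One label pair; the value may contain backslash escapes such as \" (kept as written).
_LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)\s*=\s*"((?:[^"\\]|\\.)*)"')


def samples_for(exposition: str, series: str) -> List[Tuple[Dict[str, str], float]]:
    """Return (labels, value) for every sample line whose name equals ``series``."""
    found = []
    for line in exposition.splitlines():
        match = _SAMPLE_RE.match(line)
        if match is None or match.group(1) != series:
            continue  # comments, blank lines and other metrics
        labels = dict(_LABEL_RE.findall(match.group(2) or ""))
        found.append((labels, float(match.group(3))))
    return found
```

Grouping the result by `labels["request"]` then gives one entry per GCE API operation.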

{{% /capture %}}

{{% capture whatsnext %}}
* Read about the [Prometheus text format](https://github.com/prometheus/docs/blob/master/content/docs/instrumenting/exposition_formats.md#text-based-format) for metrics
* See the list of [stable Kubernetes metrics](https://github.com/kubernetes/kubernetes/blob/master/test/instrumentation/testdata/stable-metrics-list.yaml)
* Read about the [Kubernetes deprecation policy](https://kubernetes.io/docs/reference/using-api/deprecation-policy/#deprecating-a-feature-or-behavior)
{{% /capture %}}

@@ -116,13 +116,13 @@ toc:
- docs/concepts/cluster-administration/networking.md
- docs/concepts/cluster-administration/network-plugins.md
- docs/concepts/cluster-administration/logging.md
- docs/concepts/cluster-administration/monitoring.md
- docs/concepts/cluster-administration/kubelet-garbage-collection.md
- docs/concepts/cluster-administration/federation.md
- docs/concepts/cluster-administration/sysctl-cluster.md
- docs/concepts/cluster-administration/authenticate-across-clusters-kubeconfig.md
- docs/concepts/cluster-administration/master-node-communication.md
- docs/concepts/cluster-administration/proxies.md
- docs/concepts/cluster-administration/controller-metrics.md
- docs/concepts/cluster-administration/device-plugins.md
- title: Policies
section: