1.2 changes for admin/
parent f28d0f9eb5
commit 97c728d343
|
@@ -149,6 +149,8 @@ toc:
path: /docs/user-guide/downward-api/
|
||||
- title: Updating Live Pods
|
||||
path: /docs/user-guide/update-demo/
|
||||
- title: Static Pods
|
||||
path: /docs/admin/static-pods/
|
||||
- title: Installing a Kubernetes Master Node via Docker
|
||||
path: /docs/getting-started-guides/docker-multinode/master/
|
||||
- title: Adding a Kubernetes Worker Node via Docker
|
||||
|
@@ -174,6 +176,10 @@ toc:
path: /docs/user-guide/connecting-to-applications-port-forward/
|
||||
- title: Configuring Your Cloud Provider's Firewalls
|
||||
path: /docs/user-guide/services-firewalls/
|
||||
- title: Master <-> Node Communication
|
||||
path: /docs/admin/master-node-communication/
|
||||
- title: Network Plugins
|
||||
path: /docs/admin/network-plugins/
|
||||
|
||||
- title: Configuring Kubernetes
|
||||
section:
|
||||
|
|
|
@@ -56,9 +56,7 @@ variety of uses cases:
2. Processes running in Containers on Kubernetes that need to read from
|
||||
the apiserver. Currently, these can use a [service account](/docs/user-guide/service-accounts).
|
||||
3. Scheduler and Controller-manager processes, which need to do read-write
|
||||
API operations. Currently, these have to run on the same host as the
|
||||
apiserver and use the Localhost Port. In the future, these will be
|
||||
switched to using service accounts to avoid the need to be co-located.
|
||||
API operations, using service accounts to avoid the need to be co-located.
|
||||
4. Kubelets, which need to do read-write API operations and are necessarily
|
||||
on different machines than the apiserver. Kubelet uses the Secure Port
|
||||
to get their pods, to find the services that a pod can see, and to
|
||||
|
@@ -69,7 +67,7 @@ variety of uses cases:
## Expected changes
|
||||
|
||||
- Policy will limit the actions kubelets can do via the authed port.
|
||||
- Scheduler and Controller-manager will use the Secure Port too. They
|
||||
will then be able to run on different machines than the apiserver.
@@ -37,6 +37,16 @@ ordered list of admission control choices to invoke prior to modifying objects i
|
||||
Use this plugin by itself to pass-through all requests.
|
||||
|
||||
### AlwaysPullImages
|
||||
|
||||
This plug-in modifies every new Pod to force the image pull policy to Always. This is useful in a
|
||||
multitenant cluster so that users can be assured that their private images can only be used by those
|
||||
who have the credentials to pull them. Without this plug-in, once an image has been pulled to a
|
||||
node, any pod from any user can use it simply by knowing the image's name (assuming the Pod is
|
||||
scheduled onto the right node), without any authorization check against the image. When this plug-in
|
||||
is enabled, images are always pulled prior to starting containers, which means valid credentials are
|
||||
required.
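As a concrete illustration, here is a hedged sketch of turning this plug-in on; it assumes the apiserver is launched directly and that the release this page targets accepts an ordered, comma-separated `--admission-control` list (the other plug-ins shown are only an example ordering):

```shell
# Sketch only: the plug-in list and its ordering are illustrative, and any
# other apiserver flags your deployment needs are omitted.
kube-apiserver \
  --admission-control=NamespaceLifecycle,AlwaysPullImages,ServiceAccount,ResourceQuota
```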
|
||||
|
||||
### AlwaysDeny
|
||||
|
||||
Rejects all requests. Used for testing.
|
||||
|
@@ -117,7 +127,7 @@ We strongly recommend `NamespaceLifecycle` over `NamespaceAutoProvision`.
### NamespaceLifecycle
|
||||
|
||||
This plug-in enforces that a `Namespace` that is undergoing termination cannot have new objects created in it,
|
||||
and ensures that requests in a non-existant `Namespace` are rejected.
|
||||
and ensures that requests in a non-existent `Namespace` are rejected.
|
||||
|
||||
A `Namespace` deletion kicks off a sequence of operations that remove all objects (pods, services, etc.) in that
|
||||
namespace. In order to enforce integrity of that process, we strongly recommend running this plug-in.
|
||||
|
|
|
@@ -14,7 +14,12 @@ to apiserver. Currently, tokens last indefinitely, and the token list cannot
be changed without restarting apiserver.
|
||||
|
||||
The token file format is implemented in `plugin/pkg/auth/authenticator/token/tokenfile/...`
|
||||
and is a csv file with 3 columns: token, user name, user uid.
|
||||
and is a csv file with a minimum of 3 columns: token, user name, user uid, followed by
optional group names. Note that if you have more than one group, the column must be double quoted, e.g.
|
||||
|
||||
```conf
|
||||
token,user,uid,"group1,group2,group3"
|
||||
```
|
||||
|
||||
When using token authentication from an http client the apiserver expects an `Authorization`
|
||||
header with a value of `Bearer SOMETOKEN`.
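To make that concrete, a minimal sketch of calling the apiserver with such a token; the server address, CA bundle path, and token value are placeholders rather than values taken from this page:

```shell
# Placeholders throughout; only the header format comes from the text above.
APISERVER=https://10.0.0.1:6443
TOKEN=SOMETOKEN

curl --cacert /path/to/ca.pem \
     -H "Authorization: Bearer ${TOKEN}" \
     "${APISERVER}/api"
```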
|
||||
|
@@ -22,13 +27,15 @@ header with a value of `Bearer SOMETOKEN`.
**OpenID Connect ID Token** is enabled by passing the following options to the apiserver:
|
||||
- `--oidc-issuer-url` (required) tells the apiserver where to connect to the OpenID provider. Only HTTPS scheme will be accepted.
|
||||
- `--oidc-client-id` (required) is used by apiserver to verify the audience of the token.
|
||||
A valid [ID token](http://openid.net/specs/openid-connect-core-1_0/#IDToken) MUST have this
|
||||
A valid [ID token](http://openid.net/specs/openid-connect-core-1_0.html#IDToken) MUST have this
|
||||
client-id in its `aud` claims.
|
||||
- `--oidc-ca-file` (optional) is used by apiserver to establish and verify the secure connection
|
||||
to the OpenID provider.
|
||||
- `--oidc-username-claim` (optional, experimental) specifies which OpenID claim to use as the user name. By default, `sub`
|
||||
will be used, which should be unique and immutable under the issuer's domain. Cluster administrator can
|
||||
choose other claims such as `email` to use as the user name, but the uniqueness and immutability is not guaranteed.
|
||||
- `--oidc-groups-claim` (optional, experimental) the name of a custom OpenID Connect claim for specifying user groups. The claim
|
||||
value is expected to be an array of strings.
|
||||
|
||||
Please note that this flag is still experimental until we settle more on how to handle the mapping of the OpenID user to the Kubernetes user. Thus further changes are possible.
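Putting the options above together, a hedged sketch of an apiserver command line; the issuer URL, client ID, and CA path are placeholders, and only the flag names come from the list above:

```shell
# Placeholder values; adjust to match your OpenID provider.
kube-apiserver \
  --oidc-issuer-url=https://accounts.example.com \
  --oidc-client-id=kubernetes \
  --oidc-ca-file=/path/to/openid-ca.pem \
  --oidc-username-claim=email \
  --oidc-groups-claim=groups
```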
|
||||
|
||||
|
|
|
@@ -16,26 +16,29 @@ The following implementations are available, and are selected by flag:
- `--authorization-mode=AlwaysDeny`
|
||||
- `--authorization-mode=AlwaysAllow`
|
||||
- `--authorization-mode=ABAC`
|
||||
- `--authorization-mode=Webhook`
|
||||
|
||||
`AlwaysDeny` blocks all requests (used in tests).
|
||||
`AlwaysAllow` allows all requests; use if you don't need authorization.
|
||||
`ABAC` allows for user-configured authorization policy. ABAC stands for Attribute-Based Access Control.
|
||||
`Webhook` allows for authorization to be driven by a remote service using REST.
|
||||
|
||||
## ABAC Mode
|
||||
|
||||
### Request Attributes
|
||||
|
||||
A request has 5 attributes that can be considered for authorization:
|
||||
|
||||
A request has the following attributes that can be considered for authorization:
|
||||
- user (the user-string which a user was authenticated as).
|
||||
- group (the list of group names the authenticated user is a member of).
|
||||
- whether the request is readonly (GETs are readonly).
|
||||
- what resource is being accessed.
|
||||
- applies only to the API endpoints, such as
|
||||
`/api/v1/namespaces/default/pods`. For miscellaneous endpoints, like `/version`, the
|
||||
resource is the empty string.
|
||||
- the namespace of the object being access, or the empty string if the
|
||||
endpoint does not support namespaced objects.
|
||||
- whether the request is for an API resource.
|
||||
- the request path.
|
||||
- allows authorizing access to miscellaneous endpoints like `/api` or `/healthz` (see [kubectl](#kubectl)).
|
||||
- the request verb.
|
||||
- API verbs like `get`, `list`, `create`, `update`, `watch`, `delete`, and `deletecollection` are used for API requests
|
||||
- HTTP verbs like `get`, `post`, `put`, and `delete` are used for non-API requests
|
||||
- what resource is being accessed (for API requests only)
|
||||
- the namespace of the object being accessed (for namespaced API requests only)
|
||||
- the API group being accessed (for API requests only)
|
||||
|
||||
We anticipate adding more attributes to allow finer grained access control and
|
||||
to assist in policy management.
|
||||
|
@@ -48,19 +51,29 @@ The file format is [one JSON object per line](http://jsonlines.org/). There sho
one map per line.
|
||||
|
||||
Each line is a "policy object". A policy object is a map with the following properties:
|
||||
- Versioning properties:
|
||||
- `apiVersion`, type string; valid values are "abac.authorization.kubernetes.io/v1beta1". Allows versioning and conversion of the policy format.
|
||||
- `kind`, type string; valid values are "Policy". Allows versioning and conversion of the policy format.
|
||||
|
||||
- `user`, type string; the user-string from `--token-auth-file`. If you specify `user`, it must match the username of the authenticated user.
|
||||
- `group`, type string; if you specify `group`, it must match one of the groups of the authenticated user.
|
||||
- `readonly`, type boolean, when true, means that the policy only applies to GET
|
||||
operations.
|
||||
- `resource`, type string; a resource from an URL, such as `pods`.
|
||||
- `namespace`, type string; a namespace string.
|
||||
- `spec` property set to a map with the following properties:
|
||||
- Subject-matching properties:
|
||||
- `user`, type string; the user-string from `--token-auth-file`. If you specify `user`, it must match the username of the authenticated user. `*` matches all requests.
|
||||
- `group`, type string; if you specify `group`, it must match one of the groups of the authenticated user. `*` matches all requests.
|
||||
|
||||
- `readonly`, type boolean, when true, means that the policy only applies to get, list, and watch operations.
|
||||
|
||||
- Resource-matching properties:
|
||||
- `apiGroup`, type string; an API group, such as `extensions`. `*` matches all API groups.
|
||||
- `namespace`, type string; a namespace string. `*` matches all resource requests.
|
||||
- `resource`, type string; a resource, such as `pods`. `*` matches all resource requests.
|
||||
|
||||
- Non-resource-matching properties:
|
||||
- `nonResourcePath`, type string; matches the non-resource request paths (like `/version` and `/apis`). `*` matches all non-resource requests. `/foo/*` matches `/foo/` and all of its subpaths.
|
||||
|
||||
An unset property is the same as a property set to the zero value for its type (e.g. empty string, 0, false).
|
||||
However, unset should be preferred for readability.
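To make the format concrete, here is a hedged sketch of a small policy file built from the properties above, together with the apiserver flags that load it. The file path and the "admin" user are placeholders, and `--authorization-policy-file` is assumed to be the flag that pairs with `--authorization-mode=ABAC` in this release:

```shell
# Write a two-line policy file (one JSON object per line), then point the
# apiserver at it. Path and user name are placeholders.
cat <<'EOF' > /srv/kubernetes/abac-policy.jsonl
{"apiVersion": "abac.authorization.kubernetes.io/v1beta1", "kind": "Policy", "spec": {"user": "admin", "namespace": "*", "resource": "*", "apiGroup": "*"}}
{"apiVersion": "abac.authorization.kubernetes.io/v1beta1", "kind": "Policy", "spec": {"user": "*", "readonly": true, "nonResourcePath": "*"}}
EOF

kube-apiserver \
  --authorization-mode=ABAC \
  --authorization-policy-file=/srv/kubernetes/abac-policy.jsonl
```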
|
||||
|
||||
In the future, policies may be expressed in a JSON format, and managed via a REST
|
||||
interface.
|
||||
In the future, policies may be expressed in a JSON format, and managed via a REST interface.
|
||||
|
||||
### Authorization Algorithm
|
||||
|
||||
|
@@ -69,21 +82,35 @@ A request has attributes which correspond to the properties of a policy object.
When a request is received, the attributes are determined. Unknown attributes
|
||||
are set to the zero value of its type (e.g. empty string, 0, false).
|
||||
|
||||
An unset property will match any value of the corresponding
|
||||
attribute. An unset attribute will match any value of the corresponding property.
|
||||
A property set to "*" will match any value of the corresponding attribute.
|
||||
|
||||
The tuple of attributes is checked for a match against every policy in the policy file.
|
||||
If at least one line matches the request attributes, then the request is authorized (but may fail later validation).
|
||||
|
||||
To permit any user to do something, write a policy with the user property unset.
|
||||
To permit an action Policy with an unset namespace applies regardless of namespace.
|
||||
To permit any user to do something, write a policy with the user property set to "*".
|
||||
To permit a user to do anything, write a policy with the apiGroup, namespace, resource, and nonResourcePath properties set to "*".
|
||||
|
||||
### Kubectl
|
||||
|
||||
Kubectl uses the `/api` and `/apis` endpoints of api-server to negotiate client/server versions. To validate objects sent to the API by create/update operations, kubectl queries certain swagger resources. For API version `v1` those would be `/swaggerapi/api/v1` & `/swaggerapi/experimental/v1`.
|
||||
|
||||
When using ABAC authorization, those special resources have to be explicitly exposed via the `nonResourcePath` property in a policy (see [examples](#examples) below):
|
||||
|
||||
* `/api`, `/api/*`, `/apis`, and `/apis/*` for API version negotiation.
|
||||
* `/version` for retrieving the server version via `kubectl version`.
|
||||
* `/swaggerapi/*` for create/update operations.
|
||||
|
||||
To inspect the HTTP calls involved in a specific kubectl operation you can turn up the verbosity:
|
||||
|
||||
kubectl --v=8 version
|
||||
|
||||
### Examples
|
||||
|
||||
1. Alice can do anything: `{"user":"alice"}`
|
||||
2. Kubelet can read any pods: `{"user":"kubelet", "resource": "pods", "readonly": true}`
|
||||
3. Kubelet can read and write events: `{"user":"kubelet", "resource": "events"}`
|
||||
4. Bob can just read pods in namespace "projectCaribou": `{"user":"bob", "resource": "pods", "readonly": true, "namespace": "projectCaribou"}`
|
||||
1. Alice can do anything to all resources: `{"apiVersion": "abac.authorization.kubernetes.io/v1beta1", "kind": "Policy", "spec": {"user": "alice", "namespace": "*", "resource": "*", "apiGroup": "*"}}`
|
||||
2. Kubelet can read any pods: `{"apiVersion": "abac.authorization.kubernetes.io/v1beta1", "kind": "Policy", "spec": {"user": "kubelet", "namespace": "*", "resource": "pods", "readonly": true}}`
|
||||
3. Kubelet can read and write events: `{"apiVersion": "abac.authorization.kubernetes.io/v1beta1", "kind": "Policy", "spec": {"user": "kubelet", "namespace": "*", "resource": "events"}}`
|
||||
4. Bob can just read pods in namespace "projectCaribou": `{"apiVersion": "abac.authorization.kubernetes.io/v1beta1", "kind": "Policy", "spec": {"user": "bob", "namespace": "projectCaribou", "resource": "pods", "readonly": true}}`
|
||||
5. Anyone can make read-only requests to all non-API paths: `{"apiVersion": "abac.authorization.kubernetes.io/v1beta1", "kind": "Policy", "spec": {"user": "*", "readonly": true, "nonResourcePath": "*"}}`
|
||||
|
||||
[Complete file example](http://releases.k8s.io/{{page.githubbranch}}/pkg/auth/authorizer/abac/example_policy_file.jsonl)
|
||||
|
||||
|
@@ -103,11 +130,116 @@ system:serviceaccount:<namespace>:default
For example, if you wanted to grant the default service account in the kube-system namespace full privilege to the API, you would add this line to your policy file:
|
||||
|
||||
```json
|
||||
{"user":"system:serviceaccount:kube-system:default"}
|
||||
{"apiVersion":"abac.authorization.kubernetes.io/v1beta1","kind":"Policy","user":"system:serviceaccount:kube-system:default","namespace":"*","resource":"*","apiGroup":"*"}
|
||||
```
|
||||
|
||||
The apiserver will need to be restarted to pick up the new policy lines.
|
||||
|
||||
## Webhook Mode
|
||||
|
||||
When specified, mode `Webhook` causes Kubernetes to query an outside REST service when determining user privileges.
|
||||
|
||||
### Configuration File Format
|
||||
|
||||
Mode `Webhook` requires a file for HTTP configuration, specified by the `--authorization-webhook-config-file=SOME_FILENAME` flag.
|
||||
|
||||
The configuration file uses the [kubeconfig](/docs/user-guide/kubeconfig-file/) file format. Within the file "users" refers to the API Server webhook and "clusters" refers to the remote service.
|
||||
|
||||
A configuration example which uses HTTPS client auth:
|
||||
|
||||
```yaml
|
||||
# clusters refers to the remote service.
|
||||
clusters:
|
||||
- name: name-of-remote-authz-service
|
||||
cluster:
|
||||
certificate-authority: /path/to/ca.pem # CA for verifying the remote service.
|
||||
server: https://authz.example.com/authorize # URL of remote service to query. Must use 'https'.
|
||||
|
||||
# users refers to the API Server's webhook configuration.
|
||||
users:
|
||||
- name: name-of-api-server
|
||||
user:
|
||||
client-certificate: /path/to/cert.pem # cert for the webhook plugin to use
|
||||
client-key: /path/to/key.pem # key matching the cert
|
||||
```
|
||||
|
||||
### Request Payloads
|
||||
|
||||
When faced with an authorization decision, the API Server POSTs a JSON serialized api.authorization.v1beta1.SubjectAccessReview object describing the action. This object contains fields describing the user attempting to make the request, and either details about the resource being accessed or request attributes.
|
||||
|
||||
Note that webhook API objects are subject to the same [versioning compatibility rules](/docs/api/) as other Kubernetes API objects. Implementers should be aware of looser compatibility promises for beta objects and check the "apiVersion" field of the request to ensure correct deserialization. Additionally, the API Server must enable the `authorization.k8s.io/v1beta1` API extensions group (`--runtime-config=authorization.k8s.io/v1beta1=true`).
|
||||
|
||||
An example request body:
|
||||
|
||||
```json
|
||||
{
|
||||
"apiVersion": "authorization.k8s.io/v1beta1",
|
||||
"kind": "SubjectAccessReview",
|
||||
"spec": {
|
||||
"resourceAttributes": {
|
||||
"namespace": "kittensandponies",
|
||||
"verb": "GET",
|
||||
"group": "*",
|
||||
"resource": "pods"
|
||||
},
|
||||
"user": "jane",
|
||||
"group": [
|
||||
"group1",
|
||||
"group2"
|
||||
]
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
The remote service is expected to fill the SubjectAccessReviewStatus field of the request and respond to either allow or disallow access. The response body's "spec" field is ignored and may be omitted. A permissive response would return:
|
||||
|
||||
```json
|
||||
{
|
||||
"apiVersion": "authorization.k8s.io/v1beta1",
|
||||
"kind": "SubjectAccessReview",
|
||||
"status": {
|
||||
"allowed": true
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
To disallow access, the remote service would return:
|
||||
|
||||
```json
|
||||
{
|
||||
"apiVersion": "authorization.k8s.io/v1beta1",
|
||||
"kind": "SubjectAccessReview",
|
||||
"status": {
|
||||
"allowed": false,
|
||||
"reason": "user does not have read access to the namespace"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
Access to non-resource paths is sent as:
|
||||
|
||||
```json
|
||||
{
|
||||
"apiVersion": "authorization.k8s.io/v1beta1",
|
||||
"kind": "SubjectAccessReview",
|
||||
"spec": {
|
||||
"nonResourceAttributes": {
|
||||
"path": "/debug",
|
||||
"verb": "GET"
|
||||
},
|
||||
"user": "jane",
|
||||
"group": [
|
||||
"group1",
|
||||
"group2"
|
||||
]
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
Non-resource paths include: `/api`, `/apis`, `/metrics`, `/resetMetrics`, `/logs`, `/debug`, `/healthz`, `/swagger-ui/`, `/swaggerapi/`, `/ui`, and `/version`. Clients require access to `/api`, `/api/*/`, `/apis/`, `/apis/*`, `/apis/*/*`, and `/version` to discover what resources and versions are present on the server. Access to other non-resource paths can be disallowed without restricting access to the REST API.
|
||||
|
||||
For further documentation refer to the authorization.v1beta1 API objects and plugin/pkg/auth/authorizer/webhook/webhook.go.
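For completeness, a hedged sketch of the apiserver flags that tie this section together; the kubeconfig path is a placeholder, while the flag names and the runtime-config value are the ones quoted above:

```shell
# Placeholder path; the file is the kubeconfig-format configuration
# described earlier in this section.
kube-apiserver \
  --authorization-mode=Webhook \
  --authorization-webhook-config-file=/etc/kubernetes/authz-webhook.kubeconfig \
  --runtime-config=authorization.k8s.io/v1beta1=true
```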
|
||||
|
||||
## Plugin Development
|
||||
|
||||
Other implementations can be developed fairly easily.
|
||||
|
|
|
@@ -138,5 +138,7 @@ network rules on the host and performing connection forwarding.
`supervisord` is a lightweight process babysitting system for keeping kubelet and docker
|
||||
running.
|
||||
|
||||
### fluentd
|
||||
|
||||
`fluentd` is a daemon which helps provide [cluster-level logging](#cluster-level-logging).
|
||||
|
||||
|
|
|
@@ -4,16 +4,21 @@
|
||||
## Support
|
||||
|
||||
At v1.0, Kubernetes supports clusters up to 100 nodes with 30 pods per node and 1-2 containers per pod.
|
||||
At {{page.version}}, Kubernetes supports clusters with up to 1000 nodes. More specifically, we support configurations that meet *all* of the following criteria:
|
||||
|
||||
* No more than 1000 nodes
|
||||
* No more than 30000 total pods
|
||||
* No more than 60000 total containers
|
||||
* No more than 100 pods per node
|
||||
|
||||
* TOC
|
||||
{:toc}
|
||||
{:toc}
|
||||
|
||||
## Setup
|
||||
|
||||
A cluster is a set of nodes (physical or virtual machines) running Kubernetes agents, managed by a "master" (the cluster-level control plane).
|
||||
|
||||
Normally the number of nodes in a cluster is controlled by the the value `NUM_MINIONS` in the platform-specific `config-default.sh` file (for example, see [GCE's `config-default.sh`](http://releases.k8s.io/{{page.githubbranch}}/cluster/gce/config-default.sh)).
|
||||
Normally the number of nodes in a cluster is controlled by the value `NUM_NODES` in the platform-specific `config-default.sh` file (for example, see [GCE's `config-default.sh`](http://releases.k8s.io/{{page.githubbranch}}/cluster/gce/config-default.sh)).
|
||||
|
||||
Simply changing that value to something very large, however, may cause the setup script to fail for many cloud providers. A GCE deployment, for example, will run into quota issues and fail to bring the cluster up.
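As a sketch, assuming a GCE-style `kube-up.sh` workflow where `config-default.sh` reads `NUM_NODES` from the environment, the value can also be overridden without editing the file; the node count shown is arbitrary:

```shell
# Arbitrary example value; very large values may hit the quota issues
# discussed in the next section.
export NUM_NODES=200
cluster/kube-up.sh
```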
|
||||
|
||||
|
@@ -22,7 +27,7 @@ When setting up a large Kubernetes cluster, the following issues must be conside
### Quota Issues
|
||||
|
||||
To avoid running into cloud provider quota issues, when creating a cluster with many nodes, consider:
|
||||
|
||||
|
||||
* Increase the quota for things like CPU, IPs, etc.
|
||||
* In [GCE, for example,](https://cloud.google.com/compute/docs/resource-quotas) you'll want to increase the quota for:
|
||||
* CPUs
|
||||
|
@@ -35,28 +40,36 @@ To avoid running into cloud provider quota issues, when creating a cluster with
* Target pools
|
||||
* Gating the setup script so that it brings up new node VMs in smaller batches with waits in between, because some cloud providers rate limit the creation of VMs.
|
||||
|
||||
### Etcd storage
|
||||
|
||||
To improve performance of large clusters, we store events in a separate dedicated etcd instance.
|
||||
|
||||
When creating a cluster, existing salt scripts:
|
||||
|
||||
* start and configure an additional etcd instance
* configure api-server to use it for storing events (see the sketch below)
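A hedged sketch of how the second bullet is commonly wired up; `--etcd-servers-overrides` is assumed to be the relevant apiserver flag in this release, and both etcd addresses are placeholders:

```shell
# Sketch only: route the events resource to a dedicated etcd instance while
# everything else stays in the main one.
kube-apiserver \
  --etcd-servers=http://127.0.0.1:4001 \
  --etcd-servers-overrides=/events#http://127.0.0.1:4002
```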
|
||||
|
||||
### Addon Resources
|
||||
|
||||
To prevent memory leaks or other resource issues in [cluster addons](https://releases.k8s.io/{{page.githubbranch}}/cluster/addons) from consuming all the resources available on a node, Kubernetes sets resource limits on addon containers to limit the CPU and Memory resources they can consume (See PR [#10653](http://pr.k8s.io/10653/files) and [#10778](http://pr.k8s.io/10778/files)).
|
||||
|
||||
For example:
|
||||
For [example](https://github.com/kubernetes/kubernetes/tree/{{page.githubbranch}}/cluster/saltbase/salt/fluentd-gcp/fluentd-gcp.yaml):
|
||||
|
||||
```yaml
|
||||
containers:
|
||||
- image: gcr.io/google_containers/heapster:v0.15.0
|
||||
name: heapster
|
||||
containers:
|
||||
- name: fluentd-cloud-logging
|
||||
image: gcr.io/google_containers/fluentd-gcp:1.16
|
||||
resources:
|
||||
limits:
|
||||
cpu: 100m
|
||||
memory: 200Mi
|
||||
```
|
||||
|
||||
These limits, however, are based on data collected from addons running on 4-node clusters (see [#10335](http://issue.k8s.io/10335#issuecomment-117861225)). The addons consume a lot more resources when running on large deployment clusters (see [#5880](http://issue.k8s.io/5880#issuecomment-113984085)). So, if a large cluster is deployed without adjusting these values, the addons may continuously get killed because they keep hitting the limits.
|
||||
Except for Heapster, these limits are static and are based on data we collected from addons running on 4-node clusters (see [#10335](http://issue.k8s.io/10335#issuecomment-117861225)). The addons consume a lot more resources when running on large deployment clusters (see [#5880](http://issue.k8s.io/5880#issuecomment-113984085)). So, if a large cluster is deployed without adjusting these values, the addons may continuously get killed because they keep hitting the limits.
|
||||
|
||||
To avoid running into cluster addon resource issues, when creating a cluster with many nodes, consider the following:
|
||||
|
||||
- Scale memory and CPU limits for each of the following addons, if used, along with the size of cluster (there is one replica of each handling the entire cluster so memory and CPU usage tends to grow proportionally with size/load on cluster):
|
||||
- Heapster ([GCM/GCL backed](http://releases.k8s.io/{{page.githubbranch}}/cluster/addons/cluster-monitoring/google/heapster-controller.yaml), [InfluxDB backed](http://releases.k8s.io/{{page.githubbranch}}/cluster/addons/cluster-monitoring/influxdb/heapster-controller.yaml), [InfluxDB/GCL backed](http://releases.k8s.io/{{page.githubbranch}}/cluster/addons/cluster-monitoring/googleinfluxdb/heapster-controller-combined.yaml), [standalone](http://releases.k8s.io/{{page.githubbranch}}/cluster/addons/cluster-monitoring/standalone/heapster-controller.yaml))
|
||||
|
||||
* Scale memory and CPU limits for each of the following addons, if used, as you scale up the size of cluster (there is one replica of each handling the entire cluster so memory and CPU usage tends to grow proportionally with size/load on cluster):
|
||||
* [InfluxDB and Grafana](http://releases.k8s.io/{{page.githubbranch}}/cluster/addons/cluster-monitoring/influxdb/influxdb-grafana-controller.yaml)
|
||||
* [skydns, kube2sky, and dns etcd](http://releases.k8s.io/{{page.githubbranch}}/cluster/addons/dns/skydns-rc.yaml.in)
|
||||
* [Kibana](http://releases.k8s.io/{{page.githubbranch}}/cluster/addons/fluentd-elasticsearch/kibana-controller.yaml)
|
||||
|
@@ -66,4 +79,20 @@ To avoid running into cluster addon resource issues, when creating a cluster wit
* [FluentD with ElasticSearch Plugin](http://releases.k8s.io/{{page.githubbranch}}/cluster/saltbase/salt/fluentd-es/fluentd-es.yaml)
|
||||
* [FluentD with GCP Plugin](http://releases.k8s.io/{{page.githubbranch}}/cluster/saltbase/salt/fluentd-gcp/fluentd-gcp.yaml)
|
||||
|
||||
For directions on how to detect if addon containers are hitting resource limits, see the [Troubleshooting section of Compute Resources](/docs/user-guide/compute-resources/#troubleshooting).
|
||||
Heapster's resource limits are set dynamically based on the initial size of your cluster (see [#16185](http://issue.k8s.io/16185) and [#21258](http://issue.k8s.io/21258)). If you find that Heapster is running
|
||||
out of resources, you should adjust the formulas that compute heapster memory request (see those PRs for details).
|
||||
|
||||
For directions on how to detect if addon containers are hitting resource limits, see the [Troubleshooting section of Compute Resources](/docs/user-guide/compute-resources/#troubleshooting).
|
||||
|
||||
In the [future](http://issue.k8s.io/13048), we anticipate setting all cluster addon resource limits based on cluster size, and dynamically adjusting them if you grow or shrink your cluster.
|
||||
We welcome PRs that implement those features.
|
||||
|
||||
### Allowing minor node failure at startup
|
||||
|
||||
For various reasons (see [#18969](https://github.com/kubernetes/kubernetes/issues/18969) for more details) running
|
||||
`kube-up.sh` with a very large `NUM_NODES` may fail due to a very small number of nodes not coming up properly.
|
||||
Currently you have two choices: restart the cluster (`kube-down.sh` and then `kube-up.sh` again), or before
|
||||
running `kube-up.sh` set the environment variable `ALLOWED_NOTREADY_NODES` to whatever value you feel comfortable
|
||||
with. This will allow `kube-up.sh` to succeed with fewer than `NUM_NODES` coming up. Depending on the
|
||||
reason for the failure, those additional nodes may join later or the cluster may remain at a size of
|
||||
`NUM_NODES - ALLOWED_NOTREADY_NODES`.
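A sketch of the workaround described above; both variable names come from this section and the values are illustrative:

```shell
# Tolerate a handful of nodes failing to come up during kube-up.sh.
export NUM_NODES=1000
export ALLOWED_NOTREADY_NODES=5
cluster/kube-up.sh
```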
|
||||
|
|
|
@@ -14,31 +14,15 @@ To install Kubernetes on a set of machines, consult one of the existing [Getting
|
||||
The current state of cluster upgrades is provider dependent.
|
||||
|
||||
### Master Upgrades
|
||||
### Upgrading Google Compute Engine clusters
|
||||
|
||||
Both Google Container Engine (GKE) and
|
||||
Compute Engine Open Source (GCE-OSS) support node upgrades via a [Managed Instance Group](https://cloud.google.com/compute/docs/instance-groups/).
|
||||
Managed Instance Group upgrades sequentially delete and recreate each virtual machine, while maintaining the same
|
||||
Persistent Disk (PD) to ensure that data is retained across the upgrade.
|
||||
Google Compute Engine Open Source (GCE-OSS) supports master upgrades by deleting and
|
||||
recreating the master, while maintaining the same Persistent Disk (PD) to ensure that data is retained across the
|
||||
upgrade.
|
||||
|
||||
In contrast, the `kube-push.sh` process used on [other platforms](#other-platforms) attempts to upgrade the binaries in
|
||||
place, without recreating the virtual machines.
|
||||
|
||||
### Node Upgrades
|
||||
|
||||
Node upgrades for GKE and GCE-OSS again use a Managed Instance Group, each node is sequentially destroyed and then recreated with new software. Any Pods that are running
|
||||
on that node need to be controlled by a Replication Controller, or manually re-created after the roll out.
|
||||
|
||||
For other platforms, `kube-push.sh` is again used, performing an in-place binary upgrade on existing machines.
|
||||
|
||||
### Upgrading Google Container Engine (GKE)
|
||||
|
||||
Google Container Engine automatically updates master components (e.g. `kube-apiserver`, `kube-scheduler`) to the latest
|
||||
version. It also handles upgrading the operating system and other components that the master runs on.
|
||||
|
||||
The node upgrade process is user-initiated and is described in the [GKE documentation.](https://cloud.google.com/container-engine/docs/clusters/upgrade)
|
||||
|
||||
### Upgrading open source Google Compute Engine clusters
|
||||
Node upgrades for GCE use a [Managed Instance Group](https://cloud.google.com/compute/docs/instance-groups/), each node
|
||||
is sequentially destroyed and then recreated with new software. Any Pods that are running on that node need to be
|
||||
controlled by a Replication Controller, or manually re-created after the roll out.
|
||||
|
||||
Upgrades on open source Google Compute Engine (GCE) clusters are controlled by the `cluster/gce/upgrade.sh` script.
|
||||
|
||||
|
@@ -56,7 +40,14 @@ Alternatively, to upgrade your entire cluster to the latest stable release:
cluster/gce/upgrade.sh release/stable
|
||||
```
|
||||
|
||||
### Other platforms
|
||||
### Upgrading Google Container Engine (GKE) clusters
|
||||
|
||||
Google Container Engine automatically updates master components (e.g. `kube-apiserver`, `kube-scheduler`) to the latest
|
||||
version. It also handles upgrading the operating system and other components that the master runs on.
|
||||
|
||||
The node upgrade process is user-initiated and is described in the [GKE documentation.](https://cloud.google.com/container-engine/docs/clusters/upgrade)
|
||||
|
||||
### Upgrading clusters on other platforms
|
||||
|
||||
The `cluster/kube-push.sh` script will do a rudimentary update. This process is still quite experimental; we
|
||||
recommend testing the upgrade on an experimental cluster before performing the update on a production cluster.
|
||||
|
@@ -67,7 +58,7 @@ If your cluster runs short on resources you can easily add more machines to it i
If you're using GCE or GKE, you do this by resizing the Instance Group managing your Nodes. This can be accomplished by modifying the number of instances on the `Compute > Compute Engine > Instance groups > your group > Edit group` [Google Cloud Console page](https://console.developers.google.com) or by using the gcloud CLI:
|
||||
|
||||
```shell
|
||||
gcloud compute instance-groups managed --zone compute-zone resize my-cluster-minon-group --new-size 42
|
||||
gcloud compute instance-groups managed resize kubernetes-minion-group --size 42 --zone $ZONE
|
||||
```
|
||||
|
||||
The Instance Group will take care of putting the appropriate image on the new machines and starting them, while the Kubelet will register its Node with the API server to make it available for scheduling. If you scale the instance group down, the system will randomly choose Nodes to kill.
|
||||
|
@@ -77,22 +68,21 @@ In other environments you may need to configure the machine yourself and tell th
|
||||
### Horizontal auto-scaling of nodes (GCE)
|
||||
|
||||
If you are using GCE, you can configure your cluster so that the number of nodes will be automatically scaled based on their CPU and memory utilization.
|
||||
Before setting up the cluster by `kube-up.sh`, you can set `KUBE_ENABLE_NODE_AUTOSCALE`
|
||||
environment variable to `true`
|
||||
and export it.
|
||||
If you are using GCE, you can configure your cluster so that the number of nodes will be automatically scaled based on:
|
||||
|
||||
* CPU and memory utilization.
|
||||
* Amount of CPU and memory requested by the pods (also called reservation).
|
||||
|
||||
Before setting up the cluster by `kube-up.sh`, you can set `KUBE_ENABLE_NODE_AUTOSCALER` environment variable to `true` and export it.
|
||||
The script will create an autoscaler for the instance group managing your nodes.
|
||||
|
||||
The autoscaler will try to maintain the average CPU and memory utilization of nodes within the cluster close to the target value.
|
||||
The target value can be configured by `KUBE_TARGET_NODE_UTILIZATION`
|
||||
environment variable (default: 0.7) for `kube-up.sh` when creating the cluster.
|
||||
The node utilization is the total node's CPU/memory usage (OS + k8s + user load) divided by the node's capacity.
|
||||
If the desired numbers of nodes in the cluster resulting from CPU utilization and memory utilization are different,
|
||||
the autoscaler will choose the bigger number.
|
||||
The number of nodes in the cluster set by the autoscaler will be limited from `KUBE_AUTOSCALER_MIN_NODES`
|
||||
(default: 1)
|
||||
to `KUBE_AUTOSCALER_MAX_NODES`
|
||||
(default: the initial number of nodes in the cluster).
|
||||
The autoscaler will try to maintain the average CPU/memory utilization and reservation of nodes within the cluster close to the target value.
|
||||
The target value can be configured by the `KUBE_TARGET_NODE_UTILIZATION` environment variable (default: 0.7) for `kube-up.sh` when creating the cluster.
|
||||
Node utilization is the total node's CPU/memory usage (OS + k8s + user load) divided by the node's capacity.
|
||||
Node reservation is the total CPU/memory requested by pods that are running on the node divided by the node's capacity.
|
||||
If the desired numbers of nodes in the cluster resulting from CPU/memory utilization/reservation are different,
|
||||
the autoscaler will choose the bigger number. The number of nodes in the cluster set by the autoscaler will be limited from `KUBE_AUTOSCALER_MIN_NODES` (default: 1)
|
||||
to `KUBE_AUTOSCALER_MAX_NODES` (default: the initial number of nodes in the cluster).
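A sketch of exporting those variables before bringing the cluster up; the values are illustrative, and only the variable names come from the text above:

```shell
# Illustrative values only.
export KUBE_ENABLE_NODE_AUTOSCALER=true
export KUBE_TARGET_NODE_UTILIZATION=0.7
export KUBE_AUTOSCALER_MIN_NODES=1
export KUBE_AUTOSCALER_MAX_NODES=10
cluster/kube-up.sh
```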
|
||||
|
||||
The autoscaler is implemented as a Compute Engine Autoscaler.
|
||||
The initial values of the autoscaler parameters set by `kube-up.sh` and some more advanced options can be tweaked on
|
||||
|
@@ -100,10 +90,13 @@ The initial values of the autoscaler parameters set by `kube-up.sh` and some mor
or using gcloud CLI:
|
||||
|
||||
```shell
|
||||
gcloud preview autoscaler --zone compute-zone <command>
|
||||
gcloud alpha compute autoscaler --zone $ZONE <command>
|
||||
```
|
||||
|
||||
Note that autoscaling will work properly only if node metrics are accessible in Google Cloud Monitoring. To make the metrics accessible, you need to create your cluster with `KUBE_ENABLE_CLUSTER_MONITORING` equal to `google` or `googleinfluxdb` (`googleinfluxdb` is the default value).
|
||||
Note that autoscaling will work properly only if node metrics are accessible in Google Cloud Monitoring.
|
||||
To make the metrics accessible, you need to create your cluster with `KUBE_ENABLE_CLUSTER_MONITORING`
|
||||
equal to `google` or `googleinfluxdb` (`googleinfluxdb` is the default value). Please also make sure
|
||||
that you have Google Cloud Monitoring API enabled in Google Developer Console.
|
||||
|
||||
## Maintenance on a Node
|
||||
|
||||
|
@@ -179,9 +172,10 @@ for changes to this variable to take effect.
|
||||
### Switching your config files to a new API version
|
||||
|
||||
You can use the `kube-version-change` utility to convert config files between different API versions.
|
||||
You can use the `kubectl convert` command to convert config files between different API versions.
|
||||
|
||||
```shell
|
||||
$ hack/build-go.sh cmd/kube-version-change
|
||||
$ _output/local/go/bin/kube-version-change -i myPod.v1beta3.yaml -o myPod.v1.yaml
|
||||
$ kubectl convert -f pod.yaml --output-version v1
|
||||
```
|
||||
|
||||
For more options, please refer to the usage of the [kubectl convert](/docs/user-guide/kubectl/kubectl_convert/) command.
|
||||
|
|
|
@@ -26,8 +26,8 @@ but with different flags and/or different memory and cpu requests for different
### Required Fields
|
||||
|
||||
As with all other Kubernetes config, a DaemonSet needs `apiVersion`, `kind`, and `metadata` fields. For
|
||||
general information about working with config files, see [here](/docs/user-guide/simple-yaml),
|
||||
[here](/docs/user-guide/configuring-containers), and [here](/docs/user-guide/working-with-resources).
|
||||
general information about working with config files, see [deploying applications](/docs/user-guide/deploying-applications/),
|
||||
[configuring containers](/docs/user-guide/configuring-containers/), and [working with resources](/docs/user-guide/working-with-resources/) documents.
|
||||
|
||||
A DaemonSet also needs a [`.spec`](https://github.com/kubernetes/kubernetes/tree/{{page.githubbranch}}/docs/devel/api-conventions.md#spec-and-status) section.
|
||||
|
||||
|
@@ -48,11 +48,18 @@ A pod template in a DaemonSet must have a [`RestartPolicy`](/docs/user-guide/pod
### Pod Selector
|
||||
|
||||
The `.spec.selector` field is a pod selector. It works the same as the `.spec.selector` of
|
||||
a [ReplicationController](/docs/user-guide/replication-controller) or
|
||||
[Job](/docs/user-guide/jobs).
|
||||
a [Job](/docs/user-guide/jobs/) or other new resources.
|
||||
|
||||
If the `.spec.selector` is specified, it must equal the `.spec.template.metadata.labels`. If not
|
||||
specified, the are default to be equal. Config with these unequal will be rejected by the API.
|
||||
The `spec.selector` is an object consisting of two fields:
|
||||
|
||||
* `matchLabels` - works the same as the `.spec.selector` of a [ReplicationController](/docs/user-guide/replication-controller/)
|
||||
* `matchExpressions` - allows building more sophisticated selectors by specifying a key, a
list of values, and an operator that relates the key and values.
|
||||
|
||||
When the two are specified the result is ANDed.
|
||||
|
||||
If the `.spec.selector` is specified, it must match the `.spec.template.metadata.labels`. If not
|
||||
specified, they are defaulted to be equal. Config with these not matching will be rejected by the API.
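A hedged sketch of what such a selector can look like in a DaemonSet manifest; every name, label, and image below is hypothetical, and `In` is just one of the set-based operators `matchExpressions` accepts:

```shell
# Hypothetical manifest written via a heredoc so the fragment stays runnable
# as shell; matchLabels and matchExpressions are ANDed, and both agree with
# the template labels.
cat <<'EOF' > daemonset-selector-sketch.yaml
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: example-daemon
spec:
  selector:
    matchLabels:
      app: example-daemon
    matchExpressions:
      - {key: tier, operator: In, values: ["node-agent"]}
  template:
    metadata:
      labels:
        app: example-daemon
        tier: node-agent
    spec:
      containers:
      - name: agent
        image: gcr.io/google_containers/pause   # placeholder image
EOF
```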
|
||||
|
||||
Also you should not normally create any pods whose labels match this selector, either directly, via
|
||||
another DaemonSet, or via another controller such as ReplicationController. Otherwise, the DaemonSet
|
||||
|
@@ -100,11 +107,11 @@ If node labels are changed, the DaemonSet will promptly add pods to newly matchi
pods from newly not-matching nodes.
|
||||
|
||||
You can modify the pods that a DaemonSet creates. However, pods do not allow all
|
||||
fields to be updated. Also, the DeamonSet controller will use the original template the next
|
||||
fields to be updated. Also, the DaemonSet controller will use the original template the next
|
||||
time a node (even with the same name) is created.
|
||||
|
||||
|
||||
You can delete a DeamonSet. If you specify `--cascade=false` with `kubectl`, then the pods
|
||||
You can delete a DaemonSet. If you specify `--cascade=false` with `kubectl`, then the pods
|
||||
will be left on the nodes. You can then create a new DaemonSet with a different template.
|
||||
The new DaemonSet with the different template will recognize all the existing pods as having
|
||||
matching labels. It will not modify or delete them despite a mismatch in the pod template.
|
||||
|
@@ -137,6 +144,14 @@ a Daemon Set replaces pods that are deleted or terminated for any reason, such a
node failure or disruptive node maintenance, such as a kernel upgrade. For this reason, you should
|
||||
use a Daemon Set rather than creating individual pods.
|
||||
|
||||
### Static Pods
|
||||
|
||||
It is possible to create pods by writing a file to a certain directory watched by Kubelet. These
|
||||
are called [static pods](/docs/admin/static-pods/).
|
||||
Unlike DaemonSet, static pods cannot be managed with kubectl
|
||||
or other Kubernetes API clients. Static pods do not depend on the apiserver, making them useful
|
||||
in cluster bootstrapping cases. Also, static pods may be deprecated in the future.
|
||||
|
||||
### Replication Controller
|
||||
|
||||
Daemon Set are similar to [Replication Controllers](/docs/user-guide/replication-controller) in that
|
||||
|
@@ -147,15 +162,3 @@ Use a replication controller for stateless services, like frontends, where scali
number of replicas and rolling out updates are more important than controlling exactly which host
|
||||
the pod runs on. Use a Daemon Controller when it is important that a copy of a pod always run on
|
||||
all or certain hosts, and when it needs to start before other pods.
|
||||
|
||||
## Caveats
|
||||
|
||||
DaemonSet objects are in the [`extensions` API Group](/docs/api/#api-groups).
|
||||
DaemonSet is not enabled by default. Enable it by setting
|
||||
`--runtime-config=extensions/v1beta1/daemonsets=true` on the api server. This can be
|
||||
achieved by exporting ENABLE_DAEMONSETS=true before running kube-up.sh script
|
||||
on GCE.
|
||||
|
||||
DaemonSet objects effectively have [API version `v1alpha1`](/docs/api/)#api-versioning).
|
||||
Alpha objects may change or even be discontinued in future software releases.
|
||||
However, due to to a known issue, they will appear as API version `v1beta1` if enabled.
|
|
@@ -20,8 +20,8 @@ supports forward lookups (A records) and service lookups (SRV records).
|
||||
## How it Works
|
||||
|
||||
The running DNS pod holds 3 containers - skydns, etcd (a private instance which skydns uses),
|
||||
and a Kubernetes-to-skydns bridge called kube2sky. The kube2sky process
|
||||
The running DNS pod holds 4 containers - skydns, etcd (a private instance which skydns uses),
|
||||
a Kubernetes-to-skydns bridge called kube2sky, and a health check called healthz. The kube2sky process
|
||||
watches the Kubernetes master for changes in Services, and then writes the
|
||||
information to etcd, which skydns reads. This etcd instance is not linked to
|
||||
any other etcd clusters that might exist, including the Kubernetes master.
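As a hedged usage sketch, the records written by kube2sky can be exercised from inside any pod with an ordinary lookup; `cluster.local` is only the common default for the kubelet's `--cluster-domain` setting and may differ in your cluster:

```shell
# Run from inside a pod: resolve the built-in "kubernetes" Service through
# the cluster DNS (domain shown is the usual default, not guaranteed).
nslookup kubernetes.default.svc.cluster.local
```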
|
||||
|
|
|
@@ -1,7 +1,8 @@
---
|
||||
---
|
||||
|
||||
[etcd](https://coreos.com/etcd/docs/2.0.12/) is a highly-available key value
|
||||
|
||||
[etcd](https://coreos.com/etcd/docs/2.2.1/) is a highly-available key value
|
||||
store which Kubernetes uses for persistent storage of all of its REST API
|
||||
objects.
|
||||
|
||||
|
@@ -17,12 +18,12 @@ Data Reliability: for reasonable safety, either etcd needs to be run as a
etcd) or etcd's data directory should be located on durable storage (e.g., GCE's
|
||||
persistent disk). In either case, if high availability is required--as it might
|
||||
be in a production cluster--the data directory ought to be [backed up
|
||||
periodically](https://coreos.com/etcd/docs/2.0.12/admin_guide/#disaster-recovery),
|
||||
periodically](https://coreos.com/etcd/docs/2.2.1/admin_guide.html#disaster-recovery),
|
||||
to reduce downtime in case of corruption.
|
||||
|
||||
## Default configuration
|
||||
|
||||
The default setup scripts run etcd in a
|
||||
The default setup scripts use kubelet's file-based static pods feature to run etcd in a
|
||||
[pod](http://releases.k8s.io/{{page.githubbranch}}/cluster/saltbase/salt/etcd/etcd.manifest). This manifest should only
|
||||
be run on master VMs. The default location that kubelet scans for manifests is
|
||||
`/etc/kubernetes/manifests/`.
|
||||
|
|
|
@@ -1,52 +1,41 @@
---
|
||||
---
|
||||
* TOC
|
||||
{:toc}
|
||||
|
||||
Garbage collection is managed by kubelet automatically, mainly including unreferenced
|
||||
images and dead containers. kubelet applies container garbage collection every minute
|
||||
and image garbage collection every 5 minutes.
|
||||
Note that we don't recommend external garbage collection tool generally, since it could
|
||||
break the behavior of kubelet potentially if it attempts to remove all of the containers
|
||||
which acts as the tombstone kubelet relies on. Yet those garbage collector aims to deal
|
||||
with the docker leaking issues would be appreciated.
|
||||
### Introduction
|
||||
|
||||
Garbage collection is a helpful function of kubelet that will clean up unreferenced images and unused containers. kubelet will perform garbage collection for containers every minute and garbage collection for images every five minutes.
|
||||
|
||||
External garbage collection tools are not recommended as these tools can potentially break the behavior of kubelet by removing containers expected to exist.
|
||||
|
||||
### Image Collection
|
||||
|
||||
kubernetes manages lifecycle of all images through imageManager, with the cooperation
|
||||
of cadvisor.
|
||||
The policy for garbage collecting images we apply takes two factors into consideration,
|
||||
|
||||
The policy for garbage collecting images takes two factors into consideration:
|
||||
`HighThresholdPercent` and `LowThresholdPercent`. Disk usage above the high threshold
|
||||
will trigger garbage collection, which attempts to delete unused images until the low
|
||||
threshold is met. Least recently used images are deleted first.
|
||||
will trigger garbage collection. The garbage collection will delete least recently used images until the low
|
||||
threshold has been met.
|
||||
|
||||
### Container Collection
|
||||
|
||||
The policy for garbage collecting containers we apply takes on three variables, which can
|
||||
be user-defined. `MinAge` is the minimum age at which a container can be garbage collected,
|
||||
zero for no limit. `MaxPerPodContainer` is the max number of dead containers any single
|
||||
pod (UID, container name) pair is allowed to have, less than zero for no limit.
|
||||
`MaxContainers` is the max number of total dead containers, less than zero for no limit as well.
|
||||
The policy for garbage collecting containers considers three user-defined variables. `MinAge` is the minimum age at which a container can be garbage collected. `MaxPerPodContainer` is the maximum number of dead containers any single
|
||||
pod (UID, container name) pair is allowed to have. `MaxContainers` is the maximum number of total dead containers. These variables can be individually disabled by setting `MinAge` to zero and setting `MaxPerPodContainer` and `MaxContainers` respectively to less than zero.
|
||||
|
||||
kubelet sorts out containers which are unidentified or stay out of bounds set by previous
|
||||
mentioned three flags. Gernerally the oldest containers are removed first. Since we take both
|
||||
`MaxPerPodContainer` and `MaxContainers` into consideration, it could happen when they
|
||||
have conflict -- retaining the max number of containers per pod goes out of range set by max
|
||||
number of global dead containers. In this case, we would sacrifice the `MaxPerPodContainer`
|
||||
a little bit. For the worst case, we first downgrade it to 1 container per pod, and then
|
||||
evict the oldest containers for the greater good.
|
||||
Kubelet will act on containers that are unidentified, deleted, or outside of the boundaries set by the previously mentioned flags. The oldest containers will generally be removed first. `MaxPerPodContainer` and `MaxContainers` may potentially conflict with each other in situations where retaining the maximum number of containers per pod (`MaxPerPodContainer`) would go outside the allowable range of global dead containers (`MaxContainers`). `MaxPerPodContainer` would be adjusted in this situation: a worst-case scenario would be to downgrade `MaxPerPodContainer` to 1 and evict the oldest containers. Additionally, containers owned by pods that have been deleted are removed once they are older than `MinAge`.
|
||||
|
||||
When kubelet removes the dead containers, all the files inside the container will be cleaned up as well.
|
||||
Note that we will skip the containers that are not managed by kubelet.
|
||||
Containers that are not managed by kubelet are not subject to container garbage collection.
|
||||
|
||||
### User Configuration
|
||||
|
||||
Users are free to set their own value to address image garbage collection.
|
||||
Users can adjust the following thresholds to tune image garbage collection with these kubelet flags:
|
||||
|
||||
1. `image-gc-high-threshold`, the percent of disk usage which triggers image garbage collection.
|
||||
Default is 90%.
|
||||
2. `image-gc-low-threshold`, the percent of disk usage to which image garbage collection attempts
|
||||
to free. Default is 80%.
|
||||
|
||||
We also allow users to customize garbage collection policy, basically via following three flags.
|
||||
We also allow users to customize garbage collection policy through the following kubelet flags:
|
||||
|
||||
1. `minimum-container-ttl-duration`, minimum age for a finished container before it is
|
||||
garbage collected. Default is 1 minute.
|
||||
|
@@ -55,7 +44,9 @@ per container. Default is 2.
3. `maximum-dead-containers`, maximum number of old instances of containers to retain globally.
|
||||
Default is 100.
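Putting the flags above on one command line, a hedged sketch using the documented defaults; every other kubelet flag a real deployment needs is omitted:

```shell
# Sketch: only the garbage-collection knobs discussed above are shown.
kubelet \
  --image-gc-high-threshold=90 \
  --image-gc-low-threshold=80 \
  --minimum-container-ttl-duration=1m \
  --maximum-dead-containers-per-container=2 \
  --maximum-dead-containers=100
```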
|
||||
|
||||
Note that we highly recommend a large enough value for `maximum-dead-containers-per-container`
|
||||
to allow at least 2 dead containers retaining per expected container when you customize the flag
|
||||
configuration. A loose value for `maximum-dead-containers` also assumes importance for a similar reason.
|
||||
Containers can potentially be garbage collected before their usefulness has expired. These containers
|
||||
can contain logs and other data that can be useful for troubleshooting. A sufficiently large value for
|
||||
`maximum-dead-containers-per-container` is highly recommended to allow at least 2 dead containers to be
|
||||
retained per expected container. A higher value for `maximum-dead-containers` is also recommended for a
|
||||
similar reason.
|
||||
See [this issue](https://github.com/kubernetes/kubernetes/issues/13287) for more details.
|
||||
|
|
|
@@ -1,6 +1,13 @@
---
|
||||
---
|
||||
|
||||
|
||||
## Introduction
|
||||
|
||||
PLEASE NOTE: The podmaster implementation is obsoleted by https://github.com/kubernetes/kubernetes/pull/16830,
which provides a primitive for leader election in the experimental Kubernetes API.
|
||||
|
||||
Nevertheless, the concepts and implementation in this document are still valid, as is the podmaster implementation itself.
|
||||
|
||||
This document describes how to build a high-availability (HA) Kubernetes cluster. This is a fairly advanced topic.
|
||||
Users who merely want to experiment with Kubernetes are encouraged to use configurations that are simpler to set up such as
|
||||
the simple [Docker based single node cluster instructions](/docs/getting-started-guides/docker),
|
||||
|
@@ -197,7 +204,7 @@ touch /var/log/kube-controller-manager.log
```
|
||||
|
||||
Next, set up the descriptions of the scheduler and controller manager pods on each node
|
||||
by copying [kube-scheduler.yaml](/docs/admin/high-availability/kube-scheduler.yaml) and [kube-controller-manager.yaml](high-availability//{{page.version}}/docs/admin/kube-controller-manager.yaml) into the `/srv/kubernetes/` directory.
|
||||
by copying [kube-scheduler.yaml](/docs/admin/high-availability/kube-scheduler.yaml) and [kube-controller-manager.yaml](/docs/admin/high-availability/kube-controller-manager.yaml) into the `/srv/kubernetes/` directory.
|
||||
|
||||
### Running the podmaster
|
||||
|
||||
|
@@ -218,10 +225,4 @@ If you have an existing cluster, this is as simple as reconfiguring your kubelet
restarting the kubelets on each node.
|
||||
|
||||
If you are turning up a fresh cluster, you will need to install the kubelet and kube-proxy on each worker node, and
|
||||
set the `--apiserver` flag to your replicated endpoint.
|
||||
|
||||
## Vagrant up!
|
||||
|
||||
We indeed have an initial proof of concept tester for this, which is available [here](https://releases.k8s.io/{{page.githubbranch}}/examples/high-availability).
|
||||
|
||||
It implements the major concepts (with a few minor reductions for simplicity), of the podmaster HA implementation alongside a quick smoke test using k8petstore.
|
||||
set the `--apiserver` flag to your replicated endpoint.
|
||||
|
|
|
@@ -26,7 +26,7 @@ This example demonstrates how limits can be applied to a Kubernetes namespace to
min/max resource limits per pod. In addition, this example demonstrates how you can
|
||||
apply default resource limits to pods in the absence of an end-user specified value.
|
||||
|
||||
See [LimitRange design doc](https://github.com/kubernetes/kubernetes/blob/{{page.githubbranch}}/docs/design/admission_control_limit_range.md) for more information. For a detailed description of the Kubernetes resource model, see [Resources](/docs/user-guide/compute-resources)
|
||||
See [LimitRange design doc](https://github.com/kubernetes/kubernetes/blob/{{page.githubbranch}}/docs/design/admission_control_limit_range.md) for more information. For a detailed description of the Kubernetes resource model, see [Resources](/docs/user-guide/compute-resources/)
|
||||
|
||||
## Step 0: Prerequisites
|
||||
|
||||
|
@@ -64,12 +64,12 @@ Let's describe the limits that we have imposed in our namespace.
$ kubectl describe limits mylimits --namespace=limit-example
|
||||
Name: mylimits
|
||||
Namespace: limit-example
|
||||
Type Resource Min Max Request Limit Limit/Request
|
||||
---- -------- --- --- ------- ----- -------------
|
||||
Pod cpu 200m 2 - - -
|
||||
Pod memory 6Mi 1Gi - - -
|
||||
Container cpu 100m 2 200m 300m -
|
||||
Container memory 3Mi 1Gi 100Mi 200Mi -
|
||||
Type Resource Min Max Default Request Default Limit Max Limit/Request Ratio
|
||||
---- -------- --- --- --------------- ------------- -----------------------
|
||||
Pod cpu 200m 2 - - -
|
||||
Pod memory 6Mi 1Gi - - -
|
||||
Container cpu 100m 2 200m 300m -
|
||||
Container memory 3Mi 1Gi 100Mi 200Mi -
|
||||
```
|
||||
|
||||
In this scenario, we have said the following:
|
||||
|
@@ -108,7 +108,7 @@ $ kubectl get pods nginx-aq0mf --namespace=limit-example -o yaml | grep resource
```
|
||||
|
||||
```yaml
|
||||
resourceVersion: "127"
|
||||
resourceVersion: "127"
|
||||
selfLink: /api/v1/namespaces/limit-example/pods/nginx-aq0mf
|
||||
uid: 51be42a7-7156-11e5-9921-286ed488f785
|
||||
spec:
|
||||
|
@@ -145,7 +145,7 @@ $ kubectl get pods valid-pod --namespace=limit-example -o yaml | grep -C 6 resou
```
|
||||
|
||||
```yaml
|
||||
uid: 162a12aa-7157-11e5-9921-286ed488f785
|
||||
uid: 162a12aa-7157-11e5-9921-286ed488f785
|
||||
spec:
|
||||
containers:
|
||||
- image: gcr.io/google_containers/serve_hostname
|
||||
|
|
|
@@ -0,0 +1,81 @@
---
|
||||
---
|
||||
|
||||
* TOC
|
||||
{:toc}
|
||||
|
||||
## Summary
|
||||
|
||||
This document catalogs the communication paths between the master (really the
|
||||
apiserver) and the Kubernetes cluster. The intent is to allow users to
|
||||
customize their installation to harden the network configuration such that
|
||||
the cluster can be run on an untrusted network (or on fully public IPs on a
|
||||
cloud provider).
|
||||
|
||||
## Cluster -> Master
|
||||
|
||||
All communication paths from the cluster to the master terminate at the
|
||||
apiserver (none of the other master components are designed to expose remote
|
||||
services). In a typical deployment, the apiserver is configured to listen for
|
||||
remote connections on a secure HTTPS port (443) with one or more forms of
|
||||
client [authentication](/docs/admin/authentication/) enabled.
|
||||
|
||||
Nodes should be provisioned with the public root certificate for the cluster
|
||||
such that they can connect securely to the apiserver along with valid client
|
||||
credentials. For example, on a default GCE deployment, the client credentials
|
||||
provided to the kubelet are in the form of a client certificate. Pods that
|
||||
wish to connect to the apiserver can do so securely by leveraging a service
|
||||
account so that Kubernetes will automatically inject the public root
|
||||
certificate and a valid bearer token into the pod when it is instantiated.
|
||||
The `kubernetes` service (in all namespaces) is configured with a virtual IP
|
||||
address that is redirected (via kube-proxy) to the HTTPS endpoint on the
|
||||
apiserver.
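
As an illustration (a sketch, not taken from this page; it assumes the default service account token is mounted at the standard path and that cluster DNS resolves the `kubernetes` service), a pod can call the apiserver like this:

```shell
# Run from inside a pod: use the injected root certificate and bearer token
# to talk to the apiserver through the `kubernetes` service over HTTPS.
SA_DIR=/var/run/secrets/kubernetes.io/serviceaccount
TOKEN=$(cat "${SA_DIR}/token")

curl --cacert "${SA_DIR}/ca.crt" \
     --header "Authorization: Bearer ${TOKEN}" \
     https://kubernetes.default.svc/api/v1/namespaces
```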

The master components communicate with the cluster apiserver over the
insecure (not encrypted or authenticated) port. This port is typically only
exposed on the localhost interface of the master machine, so that the master
components, all running on the same machine, can communicate with the
cluster apiserver. Over time, the master components will be migrated to use
the secure port with authentication and authorization (see
[#13598](https://github.com/kubernetes/kubernetes/issues/13598)).

As a result, connections from the cluster (nodes and pods running on the
nodes) to the master are secure by default and can be run over untrusted
and/or public networks.

## Master -> Cluster

There are two primary communication paths from the master (apiserver) to the
cluster. The first is from the apiserver to the kubelet process which runs on
each node in the cluster. The second is from the apiserver to any node, pod,
or service through the apiserver's proxy functionality.

The connections from the apiserver to the kubelet are used for fetching logs
for pods, attaching (through kubectl) to running pods, and using the kubelet's
port-forwarding functionality. These connections terminate at the kubelet's
HTTPS endpoint, which typically uses a self-signed certificate; the apiserver
does not validate the certificate presented by the kubelet (although you can
override this behavior by specifying the `--kubelet-certificate-authority`,
`--kubelet-client-certificate`, and `--kubelet-client-key` flags when starting
the cluster apiserver). By default, these connections **are not currently safe**
to run over untrusted and/or public networks as they are subject to
man-in-the-middle attacks.
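
For illustration, the flags mentioned above could be supplied to the apiserver like this (a sketch; the certificate and key paths are placeholders, not taken from this page):

```shell
# Make the apiserver verify the kubelet's serving certificate and present a
# client certificate when connecting to kubelets.
kube-apiserver \
  --kubelet-certificate-authority=/srv/kubernetes/kubelet-ca.crt \
  --kubelet-client-certificate=/srv/kubernetes/apiserver-kubelet-client.crt \
  --kubelet-client-key=/srv/kubernetes/apiserver-kubelet-client.key
  # ...plus the rest of your usual apiserver flags
```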

The connections from the apiserver to a node, pod, or service default to plain
HTTP connections and are therefore neither authenticated nor encrypted. They
can be run over a secure HTTPS connection by prefixing `https:` to the node,
pod, or service name in the API URL, but they will not validate the certificate
provided by the HTTPS endpoint nor provide client credentials, so while the
connection will be encrypted, it will not provide any guarantees of integrity.
These connections **are not currently safe** to run over untrusted and/or
public networks.
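
For illustration only (the apiserver address, service name, and port are placeholders, and the exact proxy path differs across Kubernetes versions), an HTTPS-proxied request to a service looks roughly like this:

```shell
# Note the `https:` prefix on the service name: the apiserver proxies to the
# backend over HTTPS, but still does not verify the backend's certificate.
curl --cacert ca.crt --header "Authorization: Bearer ${TOKEN}" \
  "https://<apiserver>/api/v1/proxy/namespaces/default/services/https:my-service:443/"
```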

### SSH Tunnels

[Google Container Engine](https://cloud.google.com/container-engine/docs/) uses
SSH tunnels to protect the Master -> Cluster communication paths. In this
configuration, the apiserver initiates an SSH tunnel to each node in the
cluster (connecting to the SSH server listening on port 22) and passes all
traffic destined for a kubelet, node, pod, or service through the tunnel.
This tunnel ensures that the traffic is not exposed outside of the private
GCE network in which the cluster is running.

@ -13,8 +13,7 @@ we [plan to do this in the future](https://github.com/kubernetes/kubernetes/blob

On IaaS providers such as Google Compute Engine or Amazon Web Services, a VM exists in a
[zone](https://cloud.google.com/compute/docs/zones) or [availability
zone](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-regions-availability-zones).

zone](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-regions-availability-zones.html).
We suggest that all the VMs in a Kubernetes cluster should be in the same availability zone, because:

- compared to having a single global Kubernetes cluster, there are fewer single-points of failure

@ -52,8 +51,9 @@ Second, decide how many clusters should be able to be unavailable at the same ti
the number that can be unavailable `U`. If you are not sure, then 1 is a fine choice.

If it is allowable for load-balancing to direct traffic to any region in the event of a cluster failure, then
you need `R + U` clusters. If it is not (e.g you want to ensure low latency for all users in the event of a
cluster failure), then you need to have `R * U` clusters (`U` in each of `R` regions). In any case, try to put each cluster in a different zone.
you need at least the larger of `R` or `U + 1` clusters. If it is not (e.g. you want to ensure low latency for all
users in the event of a cluster failure), then you need to have `R * (U + 1)` clusters
(`U + 1` in each of `R` regions). In any case, try to put each cluster in a different zone.
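
To make the updated arithmetic concrete (an illustrative reading, not part of the original page): with `R = 3` regions and `U = 1`, you need at least `max(3, 1 + 1) = 3` clusters if traffic can be redirected to any region, or `3 * (1 + 1) = 6` clusters (two per region) if it cannot.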

Finally, if any of your clusters would need more than the maximum recommended number of nodes for a Kubernetes cluster, then
you may need even more clusters. Kubernetes v1.0 currently supports clusters up to 100 nodes in size, but we are targeting

@ -49,7 +49,7 @@ One pattern this organization could follow is to partition the Kubernetes cluste

Let's create two new namespaces to hold our work.

Use the file [`namespace-dev.json`](/docs/admin/namespacesnamespace-dev.json) which describes a development namespace:
Use the file [`namespace-dev.json`](/docs/admin/namespaces/namespace-dev.json) which describes a development namespace:

{% include code.html language="json" file="namespace-dev.json" ghlink="/docs/admin/namespaces/namespace-dev.json" %}
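
A typical way to apply that file (a usage sketch; adjust the path to wherever you saved `namespace-dev.json`):

```shell
# Create the development namespace from the definition above.
kubectl create -f docs/admin/namespaces/namespace-dev.json
```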

@ -0,0 +1,41 @@
---
---

__Disclaimer__: Network plugins are in alpha and their behavior will change rapidly.

Network plugins in Kubernetes come in a few flavors:

* Plain vanilla exec plugins - deprecated in favor of CNI plugins.
* CNI plugins: adhere to the appc/CNI specification, designed for interoperability.
* Kubenet plugin: implements basic `cbr0` using the `bridge` and `host-local` CNI plugins.

## Installation

The kubelet has a single default network plugin, and a default network common to the entire cluster. It probes for plugins when it starts up, remembers what it found, and executes the selected plugin at appropriate times in the pod lifecycle (this is only true for docker, as rkt manages its own CNI plugins). There are two Kubelet command line parameters to keep in mind when using plugins:

* `network-plugin-dir`: Kubelet probes this directory for plugins on startup
* `network-plugin`: The network plugin to use from `network-plugin-dir`. It must match the name reported by a plugin probed from the plugin directory. For CNI plugins, this is simply "cni".

## Network Plugin Requirements

Besides providing the [`NetworkPlugin` interface](https://github.com/kubernetes/kubernetes/tree/{{page.version}}/pkg/kubelet/network/plugins.go) to configure and clean up pod networking, the plugin may also need specific support for kube-proxy. The iptables proxy obviously depends on iptables, and the plugin may need to ensure that container traffic is made available to iptables. For example, if the plugin connects containers to a Linux bridge, the plugin must set the `net/bridge/bridge-nf-call-iptables` sysctl to `1` to ensure that the iptables proxy functions correctly. If the plugin does not use a Linux bridge (but instead something like Open vSwitch or some other mechanism) it should ensure container traffic is appropriately routed for the proxy.
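
As an illustration of that last requirement (a sketch, not taken from this page), a bridge-based plugin could enable the sysctl like this:

```shell
# Ensure bridged container traffic is visible to iptables (and therefore to
# the kube-proxy iptables rules); requires the bridge netfilter module.
sysctl -w net.bridge.bridge-nf-call-iptables=1
```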

By default if no kubelet network plugin is specified, the `noop` plugin is used, which sets `net/bridge/bridge-nf-call-iptables=1` to ensure simple configurations (like docker with a bridge) work correctly with the iptables proxy.

### Exec

Place plugins in `network-plugin-dir/plugin-name/plugin-name`, i.e. if you have a bridge plugin and `network-plugin-dir` is `/usr/lib/kubernetes`, you'd place the bridge plugin executable at `/usr/lib/kubernetes/bridge/bridge`. See [this comment](https://github.com/kubernetes/kubernetes/tree/{{page.version}}/pkg/kubelet/network/exec/exec.go) for more details.
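
For instance (a sketch using the hypothetical bridge plugin from the sentence above):

```shell
# Install the exec plugin where the kubelet will probe for it, then point the
# kubelet at that directory and select the plugin by name.
mkdir -p /usr/lib/kubernetes/bridge
cp ./bridge /usr/lib/kubernetes/bridge/bridge
chmod +x /usr/lib/kubernetes/bridge/bridge

kubelet --network-plugin-dir=/usr/lib/kubernetes --network-plugin=bridge  # plus your usual kubelet flags
```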

### CNI

The CNI plugin is selected by passing Kubelet the `--network-plugin=cni` command-line option. Kubelet reads the first CNI configuration file from `--network-plugin-dir` and uses the CNI configuration from that file to set up each pod's network. The CNI configuration file must match the [CNI specification](https://github.com/appc/cni/blob/master/SPEC.md), and any required CNI plugins referenced by the configuration must be present in `/opt/cni/bin`.
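
For example, a minimal CNI configuration using the standard `bridge` and `host-local` plugins might look like this (a sketch; the bridge name, subnet, and file name are illustrative, not taken from this page):

```shell
# Write a simple bridge network definition where the kubelet can find it.
cat <<EOF >/etc/cni/net.d/10-mynet.conf
{
  "name": "mynet",
  "type": "bridge",
  "bridge": "mynet0",
  "isGateway": true,
  "ipMasq": true,
  "ipam": {
    "type": "host-local",
    "subnet": "10.22.0.0/16",
    "routes": [
      { "dst": "0.0.0.0/0" }
    ]
  }
}
EOF

# Then start the kubelet with, for example:
# kubelet --network-plugin=cni --network-plugin-dir=/etc/cni/net.d
```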

### kubenet

The Linux-only kubenet plugin provides functionality similar to the `--configure-cbr0` kubelet command-line option. It creates a Linux bridge named `cbr0` and creates a veth pair for each pod with the host end of each pair connected to `cbr0`. The pod end of the pair is assigned an IP address allocated from a range assigned to the node through either configuration or by the controller-manager. `cbr0` is assigned an MTU matching the smallest MTU of an enabled normal interface on the host. The kubenet plugin is currently mutually exclusive with, and will eventually replace, the `--configure-cbr0` option. It is also currently incompatible with the flannel experimental overlay.

The plugin requires a few things (see the example kubelet invocation after this list):

* The standard CNI `bridge` and `host-local` plugins to be placed in `/opt/cni/bin`.
* Kubelet must be run with the `--network-plugin=kubenet` argument to enable the plugin.
* Kubelet must also be run with the `--reconcile-cidr` argument to ensure the IP subnet assigned to the node by configuration or the controller-manager is propagated to the plugin.
* The node must be assigned an IP subnet through either the `--pod-cidr` kubelet command-line option or the `--allocate-node-cidrs=true --cluster-cidr=<cidr>` controller-manager command-line options.
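
Putting those requirements together, a kubelet invocation might look like this (a sketch; the pod CIDR is a placeholder and the CNI `bridge` and `host-local` binaries are assumed to already be in `/opt/cni/bin`):

```shell
# Enable kubenet and give this node an explicit pod CIDR.
kubelet --network-plugin=kubenet \
        --reconcile-cidr=true \
        --pod-cidr=10.123.45.0/24
```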

@ -79,7 +79,7 @@ talk to other VMs in your project. This is the same basic model.
Until now this document has talked about containers. In reality, Kubernetes
applies IP addresses at the `Pod` scope - containers within a `Pod` share their
network namespaces - including their IP address. This means that containers
within a `Pod` can all reach each other's ports on `localhost`. This does imply
within a `Pod` can all reach each other’s ports on `localhost`. This does imply
that containers within a `Pod` must coordinate port usage, but this is no
different than processes in a VM. We call this the "IP-per-pod" model. This
is implemented in Docker as a "pod container" which holds the network namespace

@ -174,7 +174,7 @@ network, primarily aiming at Docker integration.

### Calico

[Calico](https://github.com/Metaswitch/calico) uses BGP to enable real container
[Calico](https://github.com/projectcalico/calico-containers) uses BGP to enable real container
IPs.

## Other reading

@ -118,7 +118,7 @@ Node controller is a component in Kubernetes master which manages Node
objects. It performs two major functions: cluster-wide node synchronization
and single node life-cycle management.

Node controller has a sync loop that creates/deletes Nodes from Kubernetes
Node controller has a sync loop that deletes Nodes from Kubernetes
based on all matching VM instances listed from the cloud provider. The sync period
can be controlled via flag `--node-sync-period`. If a new VM instance
gets created, Node Controller creates a representation for it. If an existing

@ -129,6 +129,12 @@ join a node to a Kubernetes cluster, you as an admin need to make sure proper se
running in the node. In the future, we plan to automatically provision some node
services.

In general, node controller is responsible for updating the NodeReady condition of node
status to ConditionUnknown when a node becomes unreachable (e.g. due to the node being down),
and then later evicting all the pods from the node (using graceful termination) if the node
continues to be unreachable. (The current timeouts for those are 40s and 5m, respectively.)
It also allocates CIDR blocks to the new nodes.

### Self-Registration of Nodes

When kubelet flag `--register-node` is true (the default), the kubelet will attempt to

@ -163,7 +169,7 @@ preparatory step before a node reboot, etc. For example, to mark a node
unschedulable, run this command:

```shell
kubectl replace nodes 10.1.2.3 --patch='{"apiVersion": "v1", "unschedulable": true}'
kubectl patch nodes $NODENAME -p '{"spec": {"unschedulable": true}}'
```

Note that pods which are created by a daemonSet controller bypass the Kubernetes scheduler,

@ -209,4 +215,4 @@ on each kubelet where you want to reserve resources.

Node is a top-level resource in the kubernetes REST API. More details about the
API object can be found at: [Node API
object](http://kubernetes.io/v1.1/docs/api-reference/v1/definitions/#_v1_node).
object](/docs/api-reference/v1/definitions/#_v1_node).

@ -95,7 +95,7 @@ $ cat <<EOF > quota.json
  "apiVersion": "v1",
  "kind": "ResourceQuota",
  "metadata": {
    "name": "quota",
    "name": "quota"
  },
  "spec": {
    "hard": {

@ -104,8 +104,8 @@ $ cat <<EOF > quota.json
      "pods": "10",
      "services": "5",
      "replicationcontrollers":"20",
      "resourcequotas":"1",
    },
      "resourcequotas":"1"
    }
  }
}
EOF
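
Once the corrected file is written, it can be applied in the usual way (a usage sketch; the namespace name is a placeholder):

```shell
# Create the quota in the namespace it should constrain.
kubectl create -f quota.json --namespace=quota-example
```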

@ -24,7 +24,7 @@ If you are running the Vagrant based environment, the **salt-api** service is ru

## Standalone Salt Configuration on GCE

On GCE, the master and nodes are all configured as [standalone minions](http://docs.saltstack.com/en/latest/topics/tutorials/standalone_minion). The configuration for each VM is derived from the VM's [instance metadata](https://cloud.google.com/compute/docs/metadata) and then stored in Salt grains (`/etc/salt/minion.d/grains.conf`) and pillars (`/srv/salt-overlay/pillar/cluster-params.sls`) that local Salt uses to enforce state.
On GCE, the master and nodes are all configured as [standalone minions](http://docs.saltstack.com/en/latest/topics/tutorials/standalone_minion.html). The configuration for each VM is derived from the VM's [instance metadata](https://cloud.google.com/compute/docs/metadata) and then stored in Salt grains (`/etc/salt/minion.d/grains.conf`) and pillars (`/srv/salt-overlay/pillar/cluster-params.sls`) that local Salt uses to enforce state.

All remaining sections that refer to master/minion setups should be ignored for GCE. One fallout of the GCE setup is that the Salt mine doesn't exist - there is no sharing of configuration amongst nodes.

@ -50,7 +50,7 @@ An example file is presented below using the Vagrant based environment.
[root@kubernetes-master] $ cat /etc/salt/minion.d/grains.conf
grains:
  etcd_servers: $MASTER_IP
  cloud_provider: vagrant
  cloud: vagrant
  roles:
    - kubernetes-master
```

@ -15,7 +15,7 @@ for a number of reasons:
- User accounts are for humans. Service accounts are for processes, which
  run in pods.
- User accounts are intended to be global. Names must be unique across all
  namespaces of a cluster, future user resource will not be namespaced).
  namespaces of a cluster; a future user resource will not be namespaced.
  Service accounts are namespaced.
- Typically, a cluster's User accounts might be synced from a corporate
  database, where new user account creation requires special privileges and

@ -0,0 +1,123 @@
---
---

**If you are running clustered Kubernetes and are using static pods to run a pod on every node, you should probably be using a [DaemonSet](/docs/admin/daemons/)!**

*Static pods* are managed directly by the kubelet daemon on a specific node, without the API server observing them. They do not have an associated replication controller; the kubelet daemon itself watches each static pod and restarts it when it crashes. There is no health check though. Static pods are always bound to one kubelet daemon and always run on the same node as it.

The kubelet automatically creates a so-called *mirror pod* on the Kubernetes API server for each static pod, so the pods are visible there, but they cannot be controlled from the API server.

## Static pod creation

A static pod can be created in two ways: either by using configuration file(s) or by HTTP.

### Configuration files

The configuration files are just standard pod definitions in JSON or YAML format in a specific directory. Use `kubelet --config=<the directory>` to start the kubelet daemon, which periodically scans the directory and creates/deletes static pods as YAML/JSON files appear/disappear there.

For example, this is how to start a simple web server as a static pod:

1. Choose a node where we want to run the static pod. In this example, it's `my-node1`.

    ```shell
    [joe@host ~] $ ssh my-node1
    ```

2. Choose a directory, say `/etc/kubelet.d`, and place a web server pod definition there, e.g. `/etc/kubelet.d/static-web.yaml`:

    ```shell
    [root@my-node1 ~] $ mkdir /etc/kubelet.d/
    [root@my-node1 ~] $ cat <<EOF >/etc/kubelet.d/static-web.yaml
    apiVersion: v1
    kind: Pod
    metadata:
      name: static-web
      labels:
        role: myrole
    spec:
      containers:
        - name: web
          image: nginx
          ports:
            - name: web
              containerPort: 80
              protocol: TCP
    EOF
    ```

3. Configure your kubelet daemon on the node to use this directory by running it with the `--config=/etc/kubelet.d/` argument. On Fedora 21 with Kubernetes 0.17, edit `/etc/kubernetes/kubelet` to include this line:

    ```conf
    KUBELET_ARGS="--cluster-dns=10.254.0.10 --cluster-domain=kube.local --config=/etc/kubelet.d/"
    ```

    Instructions for other distributions or Kubernetes installations may vary.

4. Restart the kubelet. On Fedora 21, this is:

    ```shell
    [root@my-node1 ~] $ systemctl restart kubelet
    ```

## Pods created via HTTP

The kubelet periodically downloads a file specified by the `--manifest-url=<URL>` argument and interprets it as a JSON/YAML file with a pod definition. It works the same as `--config=<directory>`, i.e. it's reloaded every now and then and changes are applied to running static pods (see below).
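
For example, a kubelet configured for HTTP-sourced static pods might look like this (a sketch; the URL is a placeholder, not taken from this page):

```shell
# The file served at this URL must contain a pod definition in JSON or YAML,
# just like the files placed in the --config directory.
KUBELET_ARGS="--cluster-dns=10.254.0.10 --cluster-domain=kube.local --manifest-url=http://mycompany.example/manifests/static-web.yaml"
```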

## Behavior of static pods

When the kubelet starts, it automatically starts all pods defined in the directory specified by the `--config=` or `--manifest-url=` arguments, i.e. our static-web pod. (It may take some time to pull the nginx image, so be patient…):

```shell
[joe@my-node1 ~] $ docker ps
CONTAINER ID   IMAGE          COMMAND   CREATED         STATUS         NAMES
f6d05272b57e   nginx:latest   "nginx"   8 minutes ago   Up 8 minutes   k8s_web.6f802af4_static-web-fk-node1_default_67e24ed9466ba55986d120c867395f3c_378e5f3c
```

If we look at our Kubernetes API server (running on host `my-master`), we see that a new mirror pod was created there too:

```shell
[joe@host ~] $ ssh my-master
[joe@my-master ~] $ kubectl get pods
POD                   IP           CONTAINER(S)   IMAGE(S)   HOST                      LABELS        STATUS    CREATED      MESSAGE
static-web-my-node1   172.17.0.3                             my-node1/192.168.100.71   role=myrole   Running   11 minutes
                                   web            nginx                                              Running   11 minutes
```

Labels from the static pod are propagated into the mirror pod and can be used as usual for filtering.

Note that we cannot delete the pod via the API server (e.g. via the [`kubectl`](/docs/user-guide/kubectl/kubectl/) command); the kubelet simply won't remove it.

```shell
[joe@my-master ~] $ kubectl delete pod static-web-my-node1
pods/static-web-my-node1
[joe@my-master ~] $ kubectl get pods
POD                   IP           CONTAINER(S)   IMAGE(S)   HOST                      ...
static-web-my-node1   172.17.0.3                             my-node1/192.168.100.71   ...
```

Back on our `my-node1` host, we can try to stop the container manually and see that the kubelet automatically restarts it after a while:

```shell
[joe@host ~] $ ssh my-node1
[joe@my-node1 ~] $ docker stop f6d05272b57e
[joe@my-node1 ~] $ sleep 20
[joe@my-node1 ~] $ docker ps
CONTAINER ID   IMAGE          COMMAND                CREATED         ...
5b920cbaf8b1   nginx:latest   "nginx -g 'daemon of   2 seconds ago   ...
```

## Dynamic addition and removal of static pods

The running kubelet periodically scans the configured directory (`/etc/kubelet.d` in our example) for changes and adds/removes pods as files appear/disappear in this directory.

```shell
[joe@my-node1 ~] $ mv /etc/kubelet.d/static-web.yaml /tmp
[joe@my-node1 ~] $ sleep 20
[joe@my-node1 ~] $ docker ps
// no nginx container is running
[joe@my-node1 ~] $ mv /tmp/static-web.yaml /etc/kubelet.d/
[joe@my-node1 ~] $ sleep 20
[joe@my-node1 ~] $ docker ps
CONTAINER ID   IMAGE          COMMAND                CREATED          ...
e7a62e3427f1   nginx:latest   "nginx -g 'daemon of   27 seconds ago
```

@ -14,8 +14,9 @@ while read line || [[ -n ${line} ]]; do
CLEARPATH="${TARGET}"
K8SSOURCE='k8s/_'${TARGET}
DESTINATION=${TARGET%/*}
rm -rf ${CLEARPATH}
mv -f ${K8SSOURCE} ${DESTINATION}
rm -rf "${CLEARPATH}"
mv -f "${K8SSOURCE}" "${DESTINATION}"
find "${DESTINATION}" -name "*.md" -print0 | xargs -0 sed -i '' -e 's/.html)/)/g'
fi
done <_data/overrides.yml

@ -23,6 +24,7 @@ rm -rf _includes/v1.1
mv -f k8s/_includes/v1.1 _includes/
cd _includes/v1.1
find . -name '*.html' -type f -exec sed -i '' '/<style>/,/<\/style>/d' {} \;
find . -name '*.html' -print0 | xargs -0 sed -i '' -e 's/http:\/\/kubernetes.io\/v1.1//g'
cd ..
cd ..