---
---

Kubernetes [`Pods`](/docs/user-guide/pods) are mortal. They are born and they die, and they
are not resurrected. [`ReplicationControllers`](/docs/user-guide/replication-controller) in
particular create and destroy `Pods` dynamically (e.g. when scaling up or down
or when doing [rolling updates](/docs/user-guide/kubectl/kubectl_rolling-update)). While each `Pod` gets its own IP address, even
those IP addresses cannot be relied upon to be stable over time. This leads to
a problem: if some set of `Pods` (let's call them backends) provides
functionality to other `Pods` (let's call them frontends) inside the Kubernetes
cluster, how do those frontends find out and keep track of which backends are
in that set?

Enter `Services`.

A Kubernetes `Service` is an abstraction which defines a logical set of `Pods`
and a policy by which to access them - sometimes called a micro-service. The
set of `Pods` targeted by a `Service` is (usually) determined by a [`Label
Selector`](/docs/user-guide/labels/#label-selectors) (see below for why you might want a
`Service` without a selector).

As an example, consider an image-processing backend which is running with 3
replicas. Those replicas are fungible - frontends do not care which backend
they use. While the actual `Pods` that compose the backend set may change, the
frontend clients should not need to be aware of that or keep track of the list
of backends themselves. The `Service` abstraction enables this decoupling.

For Kubernetes-native applications, Kubernetes offers a simple `Endpoints` API
that is updated whenever the set of `Pods` in a `Service` changes. For
non-native applications, Kubernetes offers a virtual-IP-based bridge to Services
which redirects to the backend `Pods`.

* TOC
{:toc}


## Defining a service

A `Service` in Kubernetes is a REST object, similar to a `Pod`. Like all of the
REST objects, a `Service` definition can be POSTed to the apiserver to create a
new instance. For example, suppose you have a set of `Pods` that each expose
port 9376 and carry a label `"app=MyApp"`.

```json
{
    "kind": "Service",
    "apiVersion": "v1",
    "metadata": {
        "name": "my-service"
    },
    "spec": {
        "selector": {
            "app": "MyApp"
        },
        "ports": [
            {
                "protocol": "TCP",
                "port": 80,
                "targetPort": 9376
            }
        ]
    }
}
```


This specification will create a new `Service` object named "my-service" which
targets TCP port 9376 on any `Pod` with the `"app=MyApp"` label. This `Service`
will also be assigned an IP address (sometimes called the "cluster IP"), which
is used by the service proxies (see below). The `Service`'s selector will be
evaluated continuously and the results will be POSTed to an `Endpoints` object
also named "my-service".

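
For instance, if the definition above is saved to a file (the name
`my-service.json` here is just an assumption), a minimal way to create the
`Service` and confirm which backends it selected might look like this:

```shell
# Create the Service from the definition above (file name is an assumption).
kubectl create -f my-service.json

# The Service is assigned a cluster IP, and a matching Endpoints object
# lists one <PodIP>:9376 entry per Pod carrying the "app=MyApp" label.
kubectl get service my-service
kubectl get endpoints my-service
```
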

Note that a `Service` can map an incoming port to any `targetPort`. By default
the `targetPort` will be set to the same value as the `port` field. Perhaps
more interesting is that `targetPort` can be a string, referring to the name of
a port in the backend `Pods`. The actual port number assigned to that name can
be different in each backend `Pod`. This offers a lot of flexibility for
deploying and evolving your `Services`. For example, you can change the port
number that pods expose in the next version of your backend software, without
breaking clients.

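
As a sketch of what a named `targetPort` might look like - assuming the backend
`Pods` declare a container port named `"http"` in their spec - the `Service`
could refer to that port by name rather than by number:

```shell
# Hypothetical example: the Service addresses the Pods' port by name.
# Each backend Pod's container must declare a port named "http"; its
# actual number may differ from Pod to Pod without breaking the Service.
cat <<EOF | kubectl create -f -
{
    "kind": "Service",
    "apiVersion": "v1",
    "metadata": { "name": "my-service" },
    "spec": {
        "selector": { "app": "MyApp" },
        "ports": [
            { "protocol": "TCP", "port": 80, "targetPort": "http" }
        ]
    }
}
EOF
```
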

Kubernetes `Services` support `TCP` and `UDP` as protocols. The default
is `TCP`.


### Services without selectors

Services generally abstract access to Kubernetes `Pods`, but they can also
abstract other kinds of backends. For example:

* You want to have an external database cluster in production, but in test
  you use your own databases.
* You want to point your service to a service in another
  [`Namespace`](/docs/user-guide/namespaces) or on another cluster.
* You are migrating your workload to Kubernetes and some of your backends run
  outside of Kubernetes.

In any of these scenarios you can define a service without a selector:


```json
{
    "kind": "Service",
    "apiVersion": "v1",
    "metadata": {
        "name": "my-service"
    },
    "spec": {
        "ports": [
            {
                "protocol": "TCP",
                "port": 80,
                "targetPort": 9376
            }
        ]
    }
}
```


Because this service has no selector, the corresponding `Endpoints` object will not be
created. You can manually map the service to your own specific endpoints:


```json
{
    "kind": "Endpoints",
    "apiVersion": "v1",
    "metadata": {
        "name": "my-service"
    },
    "subsets": [
        {
            "addresses": [
                { "ip": "1.2.3.4" }
            ],
            "ports": [
                { "port": 9376 }
            ]
        }
    ]
}
```


NOTE: Endpoint IPs may not be loopback (127.0.0.0/8), link-local
(169.254.0.0/16), or link-local multicast (224.0.0.0/24).


Accessing a `Service` without a selector works the same as if it had a selector.
The traffic will be routed to endpoints defined by the user (`1.2.3.4:9376` in
this example).

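
Assuming the two definitions above are saved to files (the file names below are
placeholders), creating them and checking the result might look like this:

```shell
# Create the selector-less Service and its manually managed Endpoints object.
kubectl create -f my-service.json
kubectl create -f my-service-endpoints.json

# The Endpoints object should list the user-defined address 1.2.3.4:9376.
kubectl get endpoints my-service
kubectl describe service my-service
```
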

## Virtual IPs and service proxies

Every node in a Kubernetes cluster runs a `kube-proxy`. This application
is responsible for implementing a form of virtual IP for `Service`s. In
Kubernetes v1.0 the proxy was purely in userspace. In Kubernetes v1.1 an
iptables proxy was added, but was not the default operating mode. In
Kubernetes v1.2 we expect the iptables proxy to be the default.

As of Kubernetes v1.0, `Services` are a "layer 3" (TCP/UDP over IP) construct.
In Kubernetes v1.1 the `Ingress` API was added (beta) to represent "layer 7"
(HTTP) services.


### Proxy-mode: userspace

In this mode, kube-proxy watches the Kubernetes master for the addition and
removal of `Service` and `Endpoints` objects. For each `Service` it opens a
port (randomly chosen) on the local node. Any connections to this "proxy port"
will be proxied to one of the `Service`'s backend `Pods` (as reported in
`Endpoints`). Which backend `Pod` to use is decided based on the
`SessionAffinity` of the `Service`. Lastly, it installs iptables rules which
capture traffic to the `Service`'s `clusterIP` (which is virtual) and `Port`
and redirect that traffic to the proxy port, which proxies to a backend `Pod`.

The net result is that any traffic bound for the `Service`'s IP:Port is proxied
to an appropriate backend without the clients knowing anything about Kubernetes
or `Services` or `Pods`.

By default, the choice of backend is round robin. Client-IP based session affinity
can be selected by setting `service.spec.sessionAffinity` to `"ClientIP"` (the
default is `"None"`).

![Services overview diagram for userspace proxy](/images/docs/services-userspace-overview.svg)


### Proxy-mode: iptables

In this mode, kube-proxy watches the Kubernetes master for the addition and
removal of `Service` and `Endpoints` objects. For each `Service` it installs
iptables rules which capture traffic to the `Service`'s `clusterIP` (which is
virtual) and `Port` and redirect that traffic to one of the `Service`'s
backend sets. For each `Endpoints` object it installs iptables rules which
select a backend `Pod`.

By default, the choice of backend is random. Client-IP based session affinity
can be selected by setting `service.spec.sessionAffinity` to `"ClientIP"` (the
default is `"None"`).

As with the userspace proxy, the net result is that any traffic bound for the
`Service`'s IP:Port is proxied to an appropriate backend without the clients
knowing anything about Kubernetes or `Services` or `Pods`. This should be
faster and more reliable than the userspace proxy.

![Services overview diagram for iptables proxy](/images/docs/services-iptables-overview.svg)

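
If you want to see what the iptables proxier actually programmed on a node, one
rough way is to dump the nat table and search for the `Service`'s name. The
exact chain names and comments vary between Kubernetes versions, so treat this
only as a debugging sketch:

```shell
# Run on a node. kube-proxy annotates its rules with the Service's
# namespace/name in iptables comments, so grepping for the name usually works.
sudo iptables-save -t nat | grep my-service
```
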

## Multi-Port Services

Many `Services` need to expose more than one port. For this case, Kubernetes
supports multiple port definitions on a `Service` object. When using multiple
ports you must give all of your ports names, so that endpoints can be
disambiguated. For example:

```json
{
    "kind": "Service",
    "apiVersion": "v1",
    "metadata": {
        "name": "my-service"
    },
    "spec": {
        "selector": {
            "app": "MyApp"
        },
        "ports": [
            {
                "name": "http",
                "protocol": "TCP",
                "port": 80,
                "targetPort": 9376
            },
            {
                "name": "https",
                "protocol": "TCP",
                "port": 443,
                "targetPort": 9377
            }
        ]
    }
}
```


## Choosing your own IP address

You can specify your own cluster IP address as part of a `Service` creation
request. To do this, set the `spec.clusterIP` field. You might want to do this,
for example, if you already have an existing DNS entry that you wish to replace,
or if you have legacy systems that are configured for a specific IP address and
are difficult to re-configure. The IP address that a user chooses must be a
valid IP address and within the `service-cluster-ip-range` CIDR range that is
specified by flag to the API server. If the IP address value is invalid, the
apiserver returns a 422 HTTP status code to indicate that the value is invalid.

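
As a sketch, a `Service` that requests a specific cluster IP could be created
like this. The address `10.0.0.100` is only a placeholder and must fall inside
your cluster's `service-cluster-ip-range`:

```shell
# Hypothetical example: request a specific cluster IP at creation time.
cat <<EOF | kubectl create -f -
{
    "kind": "Service",
    "apiVersion": "v1",
    "metadata": { "name": "my-service" },
    "spec": {
        "clusterIP": "10.0.0.100",
        "selector": { "app": "MyApp" },
        "ports": [
            { "protocol": "TCP", "port": 80, "targetPort": 9376 }
        ]
    }
}
EOF
```
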

### Why not use round-robin DNS?

A question that pops up every now and then is why we do all this stuff with
virtual IPs rather than just use standard round-robin DNS. There are a few
reasons:

* There is a long history of DNS libraries not respecting DNS TTLs and
  caching the results of name lookups.
* Many apps do DNS lookups once and cache the results.
* Even if apps and libraries did proper re-resolution, the load of every
  client re-resolving DNS over and over would be difficult to manage.

We try to discourage users from doing things that hurt themselves. That said,
if enough people ask for this, we may implement it as an alternative.


## Discovering services

Kubernetes supports 2 primary modes of finding a `Service` - environment
variables and DNS.

### Environment variables


When a `Pod` is run on a `Node`, the kubelet adds a set of environment variables
for each active `Service`. It supports both [Docker links
compatible](https://docs.docker.com/userguide/dockerlinks/) variables (see
[makeLinkVariables](http://releases.k8s.io/{{page.githubbranch}}/pkg/kubelet/envvars/envvars.go#L49))
and simpler `{SVCNAME}_SERVICE_HOST` and `{SVCNAME}_SERVICE_PORT` variables,
where the Service name is upper-cased and dashes are converted to underscores.

For example, the Service `"redis-master"` which exposes TCP port 6379 and has been
allocated cluster IP address 10.0.0.11 produces the following environment
variables:

```shell
REDIS_MASTER_SERVICE_HOST=10.0.0.11
REDIS_MASTER_SERVICE_PORT=6379
REDIS_MASTER_PORT=tcp://10.0.0.11:6379
REDIS_MASTER_PORT_6379_TCP=tcp://10.0.0.11:6379
REDIS_MASTER_PORT_6379_TCP_PROTO=tcp
REDIS_MASTER_PORT_6379_TCP_PORT=6379
REDIS_MASTER_PORT_6379_TCP_ADDR=10.0.0.11
```


*This does imply an ordering requirement* - any `Service` that a `Pod` wants to
access must be created before the `Pod` itself, or else the environment
variables will not be populated. DNS does not have this restriction.

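
One way to see these variables is to exec into a `Pod` that was started after
the `Service` existed. The `Pod` name below is a placeholder:

```shell
# Show the Docker-links-style and simple variables for the redis-master Service.
kubectl exec my-frontend-pod -- env | grep REDIS_MASTER
```
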

### DNS

An optional (though strongly recommended) [cluster
add-on](http://releases.k8s.io/{{page.githubbranch}}/cluster/addons/README.md) is a DNS server. The
DNS server watches the Kubernetes API for new `Services` and creates a set of
DNS records for each. If DNS has been enabled throughout the cluster then all
`Pods` should be able to do name resolution of `Services` automatically.

For example, if you have a `Service` called `"my-service"` in Kubernetes
`Namespace` `"my-ns"` a DNS record for `"my-service.my-ns"` is created. `Pods`
which exist in the `"my-ns"` `Namespace` should be able to find it by simply doing
a name lookup for `"my-service"`. `Pods` which exist in other `Namespaces` must
qualify the name as `"my-service.my-ns"`. The result of these name lookups is the
cluster IP.

Kubernetes also supports DNS SRV (service) records for named ports. If the
`"my-service.my-ns"` `Service` has a port named `"http"` with protocol `TCP`, you
can do a DNS SRV query for `"_http._tcp.my-service.my-ns"` to discover the port
number for `"http"`.

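
A quick way to check both record types is to run the lookups from a `Pod` whose
image includes `nslookup` and `dig`. The `Pod` name, and the default
`cluster.local` DNS suffix used in the SRV query, are assumptions about your
cluster:

```shell
# A record: resolves to the Service's cluster IP.
kubectl exec my-test-pod -- nslookup my-service.my-ns

# SRV record: resolves to the port number of the port named "http".
kubectl exec my-test-pod -- dig +short SRV _http._tcp.my-service.my-ns.svc.cluster.local
```
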

## Headless services

Sometimes you don't need or want load-balancing and a single service IP. In
this case, you can create "headless" services by specifying `"None"` for the
cluster IP (`spec.clusterIP`).

For such `Services`, a cluster IP is not allocated. DNS is configured to return
multiple A records (addresses) for the `Service` name, which point directly to
the `Pods` backing the `Service`. Additionally, the kube proxy does not handle
these services and there is no load balancing or proxying done by the platform
for them. The endpoints controller will still create `Endpoints` records in
the API.

This option allows developers to reduce coupling to the Kubernetes system, if
they desire, but leaves them freedom to do discovery in their own way.
Applications can still use a self-registration pattern and adapters for other
discovery systems could easily be built upon this API.

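
A minimal headless `Service` sketch for the same `"app=MyApp"` `Pods`, together
with a check that DNS returns the `Pod` IPs directly rather than a single
virtual IP (this assumes the cluster DNS add-on and a `Pod` image that has
`nslookup`):

```shell
# Hypothetical example: a headless Service (no cluster IP is allocated).
cat <<EOF | kubectl create -f -
{
    "kind": "Service",
    "apiVersion": "v1",
    "metadata": { "name": "my-headless-service" },
    "spec": {
        "clusterIP": "None",
        "selector": { "app": "MyApp" },
        "ports": [
            { "protocol": "TCP", "port": 80, "targetPort": 9376 }
        ]
    }
}
EOF

# Expect one A record per backing Pod instead of a single cluster IP.
kubectl exec my-test-pod -- nslookup my-headless-service
```
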

## Publishing services - service types

For some parts of your application (e.g. frontends) you may want to expose a
Service onto an external (outside of your cluster, maybe public internet) IP
address, while other services should be visible only from inside of the cluster.

Kubernetes `ServiceTypes` allow you to specify what kind of service you want.
The default and base type is `ClusterIP`, which exposes a service to connections
from inside the cluster. `NodePort` and `LoadBalancer` are two types that expose
services to external traffic.

Valid values for the `ServiceType` field are:

* `ClusterIP`: use a cluster-internal IP only - this is the default and is
  discussed above. Choosing this value means that you want this service to be
  reachable only from inside of the cluster.
* `NodePort`: on top of having a cluster-internal IP, expose the service on a
  port on each node of the cluster (the same port on each node). You'll be able
  to contact the service on any `<NodeIP>:NodePort` address.
* `LoadBalancer`: on top of having a cluster-internal IP and exposing the
  service on a `NodePort`, ask the cloud provider for a load balancer
  which forwards to the `Service` exposed as `<NodeIP>:NodePort`
  for each Node.


### Type NodePort

If you set the `type` field to `"NodePort"`, the Kubernetes master will
allocate a port from a flag-configured range (default: 30000-32767), and each
Node will proxy that port (the same port number on every Node) into your `Service`.
That port will be reported in your `Service`'s `spec.ports[*].nodePort` field.

If you want a specific port number, you can specify a value in the `nodePort`
field, and the system will allocate you that port or else the API transaction
will fail (i.e. you need to take care about possible port collisions yourself).
The value you specify must be in the configured range for node ports.

This gives developers the freedom to set up their own load balancers, to
configure cloud environments that are not fully supported by Kubernetes, or
even to just expose one or more nodes' IPs directly.

Note that this Service will be visible as both `<NodeIP>:spec.ports[*].nodePort`
and `spec.clusterIP:spec.ports[*].port`.

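
For example, after setting `"type": "NodePort"` on the `Service` defined earlier
and re-creating it, you could look up the allocated port and reach the `Service`
from outside the cluster. The node IP and port below are placeholders:

```shell
# The allocated port is reported in spec.ports[*].nodePort and shown by describe.
kubectl describe service my-service | grep NodePort

# Reach the Service via any node's IP on that port.
curl http://<node-ip>:<node-port>/
```
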

### Type LoadBalancer

On cloud providers which support external load balancers, setting the `type`
field to `"LoadBalancer"` will provision a load balancer for your `Service`.
The actual creation of the load balancer happens asynchronously, and
information about the provisioned balancer will be published in the `Service`'s
`status.loadBalancer` field. For example:

```json
{
    "kind": "Service",
    "apiVersion": "v1",
    "metadata": {
        "name": "my-service"
    },
    "spec": {
        "selector": {
            "app": "MyApp"
        },
        "ports": [
            {
                "protocol": "TCP",
                "port": 80,
                "targetPort": 9376,
                "nodePort": 30061
            }
        ],
        "clusterIP": "10.0.171.239",
        "loadBalancerIP": "78.11.24.19",
        "type": "LoadBalancer"
    },
    "status": {
        "loadBalancer": {
            "ingress": [
                {
                    "ip": "146.148.47.155"
                }
            ]
        }
    }
}
```


Traffic from the external load balancer will be directed at the backend `Pods`,
though exactly how that works depends on the cloud provider. Some cloud providers allow
the `loadBalancerIP` to be specified. In those cases, the load-balancer will be created
with the user-specified `loadBalancerIP`. If the `loadBalancerIP` field is not specified,
an ephemeral IP will be assigned to the load balancer. If the `loadBalancerIP` is specified, but the
cloud provider does not support the feature, the field will be ignored.

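
Because the balancer is provisioned asynchronously, the ingress IP may be empty
right after creation. You can poll for it with either of the following; the
exact output format depends on your Kubernetes and cloud-provider versions:

```shell
# describe prints the provisioned address once it is available.
kubectl describe service my-service

# Or read the whole object and look under status.loadBalancer.ingress.
kubectl get service my-service -o yaml
```
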

### External IPs

If there are external IPs that route to one or more cluster nodes, Kubernetes services can be exposed on those
`externalIPs`. Traffic that ingresses into the cluster with the external IP (as destination IP), on the service port,
will be routed to one of the service endpoints. `externalIPs` are not managed by Kubernetes and are the responsibility
of the cluster administrator.


In the `ServiceSpec`, `externalIPs` can be specified along with any of the `ServiceTypes`.
In the example below, `my-service` can be accessed by clients on `80.11.12.10:80` (`externalIP:port`).


```json
{
    "kind": "Service",
    "apiVersion": "v1",
    "metadata": {
        "name": "my-service"
    },
    "spec": {
        "selector": {
            "app": "MyApp"
        },
        "ports": [
            {
                "name": "http",
                "protocol": "TCP",
                "port": 80,
                "targetPort": 9376
            }
        ],
        "externalIPs": [
            "80.11.12.10"
        ]
    }
}
```

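
Assuming `80.11.12.10` really does route to one of your nodes, a client outside
the cluster can then reach the `Service` directly on that address:

```shell
# Traffic arriving at the external IP on the service port is routed
# to one of the Service's endpoints.
curl http://80.11.12.10:80/
```
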

## Shortcomings

Using the userspace proxy for VIPs will work at small to medium scale, but will
not scale to very large clusters with thousands of Services. See [the original
design proposal for portals](http://issue.k8s.io/1107) for more details.

Using the userspace proxy obscures the source-IP of a packet accessing a `Service`.
This makes some kinds of firewalling impossible. The iptables proxier does not
obscure in-cluster source IPs, but it does still impact clients coming through
a load-balancer or node-port.

The `Type` field is designed as nested functionality - each level adds to the
previous. This is not strictly required on all cloud providers (e.g. Google Compute Engine does
not need to allocate a `NodePort` to make `LoadBalancer` work, but AWS does)
but the current API requires it.


## Future work

In the future we envision that the proxy policy can become more nuanced than
simple round robin balancing, for example master-elected or sharded. We also
envision that some `Services` will have "real" load balancers, in which case the
VIP will simply transport the packets there.

We intend to improve our support for L7 (HTTP) `Services`.

We intend to have more flexible ingress modes for `Services` which encompass
the current `ClusterIP`, `NodePort`, and `LoadBalancer` modes and more.


## The gory details of virtual IPs

The previous information should be sufficient for many people who just want to
use `Services`. However, there is a lot going on behind the scenes that may be
worth understanding.

### Avoiding collisions


One of the primary philosophies of Kubernetes is that users should not be
exposed to situations that could cause their actions to fail through no fault
of their own. In this situation, we are looking at network ports - users
should not have to choose a port number if that choice might collide with
another user. That is an isolation failure.

In order to allow users to choose a port number for their `Services`, we must
ensure that no two `Services` can collide. We do that by allocating each
`Service` its own IP address.

To ensure each service receives a unique IP, an internal allocator atomically
updates a global allocation map in etcd prior to creating each service. The map
object must exist in the registry for services to get IPs, otherwise creations
will fail with a message indicating an IP could not be allocated. A background
controller is responsible for creating that map (to migrate from older versions
of Kubernetes that used in-memory locking) as well as checking for invalid
assignments due to administrator intervention and cleaning up any IPs
that were allocated but which no service currently uses.


### IPs and VIPs

Unlike `Pod` IP addresses, which actually route to a fixed destination,
`Service` IPs are not actually answered by a single host. Instead, we use
`iptables` (packet processing logic in Linux) to define virtual IP addresses
which are transparently redirected as needed. When clients connect to the
VIP, their traffic is automatically transported to an appropriate endpoint.
The environment variables and DNS for `Services` are actually populated in
terms of the `Service`'s VIP and port.

We support two proxy modes - userspace and iptables, which operate slightly
differently.

#### Userspace


As an example, consider the image processing application described above.
When the backend `Service` is created, the Kubernetes master assigns a virtual
IP address, for example 10.0.0.1. Assuming the `Service` port is 1234, the
`Service` is observed by all of the `kube-proxy` instances in the cluster.
When a proxy sees a new `Service`, it opens a new random port, establishes an
iptables redirect from the VIP to this new port, and starts accepting
connections on it.

When a client connects to the VIP the iptables rule kicks in, and redirects
the packets to the `Service proxy`'s own port. The `Service proxy` chooses a
backend, and starts proxying traffic from the client to the backend.

This means that `Service` owners can choose any port they want without risk of
collision. Clients can simply connect to an IP and port, without being aware
of which `Pods` they are actually accessing.


#### Iptables

Again, consider the image processing application described above.
When the backend `Service` is created, the Kubernetes master assigns a virtual
IP address, for example 10.0.0.1. Assuming the `Service` port is 1234, the
`Service` is observed by all of the `kube-proxy` instances in the cluster.
When a proxy sees a new `Service`, it installs a series of iptables rules which
redirect from the VIP to per-`Service` rules. The per-`Service` rules link to
per-`Endpoint` rules which redirect (Destination NAT) to the backends.

When a client connects to the VIP the iptables rule kicks in. A backend is
chosen (either based on session affinity or randomly) and packets are
redirected to the backend. Unlike the userspace proxy, packets are never
copied to userspace, the kube-proxy does not have to be running for the VIP to
work, and the client IP is not altered.

This same basic flow executes when traffic comes in through a node-port or
through a load-balancer, though in those cases the client IP does get altered.


## API Object

Service is a top-level resource in the Kubernetes REST API. More details about the
API object can be found at: [Service API
object](/docs/api-reference/v1/definitions/#_v1_service).


## For More Information

Read [Service Operations](/docs/user-guide/services/operations/).