---
title: Using Source IP
---

{% capture overview %}

Applications running in a Kubernetes cluster find and communicate with each
other, and the outside world, through the Service abstraction. This document
explains what happens to the source IP of packets sent to different types
of Services, and how you can toggle this behavior according to your needs.

{% endcapture %}

{% capture prerequisites %}

{% include task-tutorial-prereqs.md %}

## Terminology

This document makes use of the following terms:

* [NAT](https://en.wikipedia.org/wiki/Network_address_translation): network address translation
* [Source NAT](https://en.wikipedia.org/wiki/Network_address_translation#SNAT): replacing the source IP on a packet, usually with a node's IP
* [Destination NAT](https://en.wikipedia.org/wiki/Network_address_translation#DNAT): replacing the destination IP on a packet, usually with a pod IP
* [VIP](/docs/concepts/services-networking/service/#virtual-ips-and-service-proxies): a virtual IP, such as the one assigned to every Kubernetes Service
* [Kube-proxy](/docs/concepts/services-networking/service/#virtual-ips-and-service-proxies): a network daemon that orchestrates Service VIP management on every node
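
Since kube-proxy plays a central role in everything that follows, it can help to
confirm it's actually running on your nodes before you start. A rough sketch,
assuming your cluster runs kube-proxy as pods in the `kube-system` namespace
(how kube-proxy is deployed varies by provisioning tool, so adapt as needed):

```console
# Assumption: kube-proxy runs as kube-system pods; on some clusters it is a
# node-level daemon instead, in which case check the node directly.
$ kubectl get pods --namespace=kube-system | grep kube-proxy
```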

## Prerequisites

You must have a working Kubernetes 1.5 cluster to run the examples in this
document. The examples use a small nginx webserver that echoes back the source
IP of requests it receives through an HTTP header. You can create it as follows:

```console
$ kubectl run source-ip-app --image=gcr.io/google_containers/echoserver:1.4
deployment "source-ip-app" created
```
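
Before moving on, you can check that the Deployment created its pod. The
`run=source-ip-app` label used below is the one `kubectl run` attaches by default:

```console
# The pod should show STATUS Running before you continue.
$ kubectl get pods -l run=source-ip-app
```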

{% endcapture %}

{% capture objectives %}

* Expose a simple application through various types of Services
* Understand how each Service type handles source IP NAT
* Understand the tradeoffs involved in preserving source IP

{% endcapture %}

{% capture lessoncontent %}

## Source IP for Services with Type=ClusterIP

Packets sent to ClusterIP from within the cluster are never source NAT'd if
you're running kube-proxy in [iptables mode](/docs/user-guide/services/#proxy-mode-iptables),
which is the default since Kubernetes 1.2. Kube-proxy exposes its mode through
a `proxyMode` endpoint:

```console
$ kubectl get nodes
NAME                           STATUS    AGE       VERSION
kubernetes-minion-group-6jst   Ready     2h        v1.6.0+fff5156
kubernetes-minion-group-cx31   Ready     2h        v1.6.0+fff5156
kubernetes-minion-group-jj1t   Ready     2h        v1.6.0+fff5156

kubernetes-minion-group-6jst $ curl localhost:10249/proxyMode
iptables
```

You can test source IP preservation by creating a Service over the source IP app:

```console
$ kubectl expose deployment source-ip-app --name=clusterip --port=80 --target-port=8080
service "clusterip" exposed

$ kubectl get svc clusterip
NAME         CLUSTER-IP    EXTERNAL-IP   PORT(S)   AGE
clusterip    10.0.170.92   <none>        80/TCP    51s
```

And hitting the `ClusterIP` from a pod in the same cluster:

```console
$ kubectl run busybox -it --image=busybox --restart=Never --rm
Waiting for pod default/busybox to be running, status is Pending, pod ready: false
If you don't see a command prompt, try pressing enter.

# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
3: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1460 qdisc noqueue
    link/ether 0a:58:0a:f4:03:08 brd ff:ff:ff:ff:ff:ff
    inet 10.244.3.8/24 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::188a:84ff:feb0:26a5/64 scope link
       valid_lft forever preferred_lft forever

# wget -qO - 10.0.170.92
CLIENT VALUES:
client_address=10.244.3.8
command=GET
...
```

If the client pod and the server pod are on the same node, the `client_address` is the client pod's IP address. However, if they are on different nodes, the `client_address` is the flannel IP address of the client pod's node.
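
To see which of the two cases applies to you, check which node each pod was
scheduled onto:

```console
# The NODE column shows where the busybox client and source-ip-app pods landed.
$ kubectl get pods -o wide
```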

## Source IP for Services with Type=NodePort

As of Kubernetes 1.5, packets sent to Services with [Type=NodePort](/docs/user-guide/services/#type-nodeport)
are source NAT'd by default. You can test this by creating a `NodePort` Service:

```console
$ kubectl expose deployment source-ip-app --name=nodeport --port=80 --target-port=8080 --type=NodePort
service "nodeport" exposed

$ NODEPORT=$(kubectl get -o jsonpath="{.spec.ports[0].nodePort}" services nodeport)
$ NODES=$(kubectl get nodes -o jsonpath='{ $.items[*].status.addresses[?(@.type=="ExternalIP")].address }')
```
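
A quick sanity check that both lookups returned values:

```console
$ echo "$NODEPORT" "$NODES"
```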

If you're running on a cloudprovider, you may need to open up a firewall rule
for the `nodes:nodeport` reported above.
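
On GCE, for example, such a rule might look like the following sketch (the
rule name `test-node-port` is an arbitrary choice, and your network/tags
setup may require extra flags):

```console
# Opens the allocated nodePort to inbound TCP traffic on GCE.
$ gcloud compute firewall-rules create test-node-port --allow=tcp:$NODEPORT
```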

Now you can try reaching the Service from outside the cluster through the node
port allocated above.

```console
$ for node in $NODES; do curl -s $node:$NODEPORT | grep -i client_address; done
client_address=10.180.1.1
client_address=10.240.0.5
client_address=10.240.0.3
```

Note that these are not the correct client IPs; they're cluster-internal IPs. This is what happens:

* client sends packet to `node2:nodePort`
* `node2` replaces the source IP address (SNAT) in the packet with its own IP address
* `node2` replaces the destination IP on the packet with the pod IP
* the packet is routed to node 1, and then to the endpoint
* the pod's reply is routed back to `node2`
* the pod's reply is sent back to the client

Visually:

```
          client
             \ ^
              \ \
               v \
   node 1 <--- node 2
    | ^   SNAT
    | |   --->
    v |
 endpoint
```
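
If you're curious how this looks on the wire, you can peek at the NAT rules
kube-proxy programs on a node. This is only a sketch: it assumes iptables mode,
root access on the node, and chain names (like `KUBE-NODEPORTS`) that are an
internal implementation detail and may differ across versions:

```console
# Dump the nat table and look for the nodePort rules kube-proxy installed.
kubernetes-minion-group-6jst $ sudo iptables-save -t nat | grep -i nodeport
```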

To avoid this, Kubernetes has a feature to preserve the client source IP
[(check here for feature availability)](/docs/tasks/access-application-cluster/create-external-load-balancer/#preserving-the-client-source-ip).
Setting `service.spec.externalTrafficPolicy` to the value `Local` will only
proxy requests to local endpoints, never forwarding traffic to other nodes,
thereby preserving the original source IP address. If there are no
local endpoints, packets sent to the node are dropped, so you can rely
on the correct source IP in any packet processing rules you might apply to
a packet that makes it through to the endpoint.

Set the `service.spec.externalTrafficPolicy` field as follows:

```console
$ kubectl patch svc nodeport -p '{"spec":{"externalTrafficPolicy":"Local"}}'
service "nodeport" patched
```
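
Equivalently, if you manage the Service through a manifest, the field sits
directly under `spec`. A sketch of what that could look like for the Service
created above (the selector assumes the default `run=source-ip-app` label):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: nodeport
spec:
  type: NodePort
  externalTrafficPolicy: Local   # only proxy to local endpoints, preserving source IP
  selector:
    run: source-ip-app           # label applied by `kubectl run`
  ports:
  - port: 80
    targetPort: 8080
```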

Now, re-run the test:

```console
$ for node in $NODES; do curl --connect-timeout 1 -s $node:$NODEPORT | grep -i client_address; done
client_address=104.132.1.79
```

Note that you only got one reply, with the *right* client IP, from the one node on which the endpoint pod
is running.

This is what happens:

* client sends packet to `node2:nodePort`, which doesn't have any endpoints
* packet is dropped
* client sends packet to `node1:nodePort`, which *does* have endpoints
* node1 routes packet to endpoint with the correct source IP

Visually:

```
        client
       ^ /   \
      / /     \
     / v       X
   node 1     node 2
    ^ |
    | |
    | v
 endpoint
```

## Source IP for Services with Type=LoadBalancer

As of Kubernetes 1.5, packets sent to Services with [Type=LoadBalancer](/docs/user-guide/services/#type-loadbalancer) are
source NAT'd by default, because all schedulable Kubernetes nodes in the
`Ready` state are eligible for loadbalanced traffic. So if packets arrive
at a node without an endpoint, the system proxies them to a node *with* an
endpoint, replacing the source IP on the packets with the IP of the node (as
described in the previous section).

You can test this by exposing the source-ip-app through a loadbalancer:

```console
$ kubectl expose deployment source-ip-app --name=loadbalancer --port=80 --target-port=8080 --type=LoadBalancer
service "loadbalancer" exposed

$ kubectl get svc loadbalancer
NAME           CLUSTER-IP    EXTERNAL-IP       PORT(S)   AGE
loadbalancer   10.0.65.118   104.198.149.140   80/TCP    5m

$ curl 104.198.149.140
CLIENT VALUES:
client_address=10.240.0.5
...
```

However, if you're running on GKE/GCE, setting the same `service.spec.externalTrafficPolicy`
field to `Local` forces nodes *without* Service endpoints to remove
themselves from the list of nodes eligible for loadbalanced traffic by
deliberately failing health checks.

Visually:

```
                      client
                        |
                      lb VIP
                     / ^
                    v /
health check --->   node 1   node 2 <--- health check
        200  <---   ^ |             ---> 500
                    | V
                 endpoint
```

You can test this by setting the field:

```console
$ kubectl patch svc loadbalancer -p '{"spec":{"externalTrafficPolicy":"Local"}}'
```

You should immediately see the `service.spec.healthCheckNodePort` field allocated
by Kubernetes:

```console
$ kubectl get svc loadbalancer -o yaml | grep -i healthCheckNodePort
  healthCheckNodePort: 32122
```
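
If you'd rather not grep through YAML, a jsonpath query retrieves the same value:

```console
$ kubectl get svc loadbalancer -o jsonpath='{.spec.healthCheckNodePort}'
32122
```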

The `service.spec.healthCheckNodePort` field points to a port on every node
serving the health check at `/healthz`. You can test this:

```console
$ kubectl get pod -o wide -l run=source-ip-app
NAME                            READY     STATUS    RESTARTS   AGE       IP             NODE
source-ip-app-826191075-qehz4   1/1       Running   0          20h       10.180.1.136   kubernetes-minion-group-6jst

kubernetes-minion-group-6jst $ curl localhost:32122/healthz
1 Service Endpoints found

kubernetes-minion-group-jj1t $ curl localhost:32122/healthz
No Service Endpoints Found
```

A service controller running on the master is responsible for allocating the cloud
loadbalancer, and when it does so, it also allocates HTTP health checks
pointing to this port/path on each node. Wait about 10 seconds for the 2 nodes
without endpoints to fail health checks, then curl the LB IP:

```console
$ curl 104.198.149.140
CLIENT VALUES:
client_address=104.132.1.79
...
```

__Cross platform support__

As of Kubernetes 1.5, support for source IP preservation through Services
with Type=LoadBalancer is only implemented in a subset of cloudproviders
(GCP and Azure). The cloudprovider you're running on might fulfill the
request for a loadbalancer in a few different ways:

1. With a proxy that terminates the client connection and opens a new connection
to your nodes/endpoints. In such cases the source IP will always be that of the
cloud LB, not that of the client.

2. With a packet forwarder, such that requests from the client sent to the
loadbalancer VIP end up at the node with the source IP of the client, not
an intermediate proxy.

Loadbalancers in the first category must use an agreed-upon
protocol between the loadbalancer and backend to communicate the true client IP,
such as the HTTP [X-FORWARDED-FOR](https://en.wikipedia.org/wiki/X-Forwarded-For)
header, or the [proxy protocol](http://www.haproxy.org/download/1.5/doc/proxy-protocol.txt).
Loadbalancers in the second category can leverage the feature described above
by simply creating an HTTP health check pointing at the port stored in
the `service.spec.healthCheckNodePort` field on the Service.
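
For a category-2 loadbalancer on GCE, for example, such a health check might be
created along the lines of the following sketch (the name `source-ip-check` is
arbitrary, and `32122` is the `healthCheckNodePort` allocated in the example above):

```console
# Health check against the per-node /healthz endpoint on the healthCheckNodePort.
$ gcloud compute http-health-checks create source-ip-check --port=32122 --request-path=/healthz
```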

{% endcapture %}

{% capture cleanup %}

Delete the Services:

```console
$ kubectl delete svc -l run=source-ip-app
```

Delete the Deployment, ReplicaSet and Pod:

```console
$ kubectl delete deployment source-ip-app
```
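
You can confirm nothing was left behind; both commands should report that no
resources were found:

```console
$ kubectl get svc -l run=source-ip-app
$ kubectl get pods -l run=source-ip-app
```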

{% endcapture %}

{% capture whatsnext %}

* Learn more about [connecting applications via services](/docs/concepts/services-networking/connect-applications-service/)
* Learn more about [loadbalancing](/docs/user-guide/load-balancer)

{% endcapture %}

{% include templates/tutorial.md %}