From d5188c7e80ed224676c4ed8eede007a8151b8004 Mon Sep 17 00:00:00 2001
From: bprashanth
Date: Thu, 1 Dec 2016 16:29:42 -0800
Subject: [PATCH] Write a source-ip tutorial.

---
 docs/tutorials/services/source-ip.md | 339 +++++++++++++++++++++++++++
 1 file changed, 339 insertions(+)
 create mode 100644 docs/tutorials/services/source-ip.md

diff --git a/docs/tutorials/services/source-ip.md b/docs/tutorials/services/source-ip.md
new file mode 100644
index 0000000000..6657e42720
--- /dev/null
+++ b/docs/tutorials/services/source-ip.md
@@ -0,0 +1,339 @@
---
---

{% capture overview %}

Applications running in a Kubernetes cluster find and communicate with each
other, and the outside world, through the Service abstraction. This document
explains what happens to the source IP of packets sent to different types
of Services, and how you can toggle this behavior according to your needs.

{% endcapture %}

{% capture prerequisites %}

{% include task-tutorial-prereqs.md %}

### Terminology

This document makes use of the following terms:

* [NAT](https://en.wikipedia.org/wiki/Network_address_translation): network address translation
* [Source NAT](/docs/user-guide/services/#ips-and-vips): replacing the source IP on a packet, usually with a node's IP
* [Destination NAT](/docs/user-guide/services/#ips-and-vips): replacing the destination IP on a packet, usually with a pod IP
* [VIP](/docs/user-guide/services/#ips-and-vips): a virtual IP, such as the one assigned to every Kubernetes Service
* [Kube-proxy](/docs/user-guide/services/#virtual-ips-and-service-proxies): a network daemon that orchestrates Service VIP management on every node


### Prerequisites

You must have a working Kubernetes 1.5 cluster to run the examples in this
document. The examples use a small nginx webserver that echoes back the source
IP of requests it receives through an HTTP header. You can create it as follows:

```console
$ kubectl run source-ip-app --image=gcr.io/google_containers/echoserver:1.4
deployment "source-ip-app" created
```

{% endcapture %}

{% capture objectives %}

* Expose a simple application through various types of Services
* Understand how each Service type handles source IP NAT
* Understand the tradeoffs involved in preserving source IP

{% endcapture %}


{% capture lessoncontent %}

### Source IP for Services with Type=ClusterIP

Packets sent to a Service's ClusterIP from within the cluster are never source NAT'd if
you're running kube-proxy in [iptables mode](/docs/user-guide/services/#proxy-mode-iptables),
which is the default since Kubernetes 1.2. Kube-proxy exposes its mode through
a `proxyMode` endpoint:

```console
$ kubectl get nodes
NAME                           STATUS    AGE
kubernetes-minion-group-6jst   Ready     2h
kubernetes-minion-group-cx31   Ready     2h
kubernetes-minion-group-jj1t   Ready     2h

kubernetes-minion-group-6jst $ curl localhost:10249/proxyMode
iptables
```

You can test source IP preservation by creating a Service over the source IP app:

```console
$ kubectl expose deployment source-ip-app --name=clusterip --port=80 --target-port=8080
service "clusterip" exposed

$ kubectl get svc clusterip
NAME        CLUSTER-IP    EXTERNAL-IP   PORT(S)   AGE
clusterip   10.0.170.92   <none>        80/TCP    51s
```

And hitting the `ClusterIP` from a pod in the same cluster:

```console
$ kubectl run busybox -it --image=busybox --restart=Never --rm
Waiting for pod default/busybox to be running, status is Pending, pod ready: false
If you don't see a command prompt, try pressing enter.
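# The `ip addr` output below shows this pod's own IP on eth0 (10.244.3.8 in this
# run). Since traffic to a ClusterIP is never source NAT'd, the wget that follows
# should echo that same address back as client_address.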

# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
3: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1460 qdisc noqueue
    link/ether 0a:58:0a:f4:03:08 brd ff:ff:ff:ff:ff:ff
    inet 10.244.3.8/24 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::188a:84ff:feb0:26a5/64 scope link
       valid_lft forever preferred_lft forever

# wget -qO - 10.0.170.92
CLIENT VALUES:
client_address=10.244.3.8
command=GET
...
```

### Source IP for Services with Type=NodePort

As of Kubernetes 1.5, packets sent to Services with [Type=NodePort](/docs/user-guide/services/#type-nodeport)
are source NAT'd by default. You can test this by creating a `NodePort` Service:

```console
$ kubectl expose deployment source-ip-app --name=nodeport --port=80 --target-port=8080 --type=NodePort
service "nodeport" exposed

$ NODEPORT=$(kubectl get -o jsonpath="{.spec.ports[0].nodePort}" services nodeport)
$ NODES=$(kubectl get nodes -o jsonpath='{ $.items[*].status.addresses[?(@.type=="ExternalIP")].address }')
```

If you're running on a cloud provider, you may need to open up a firewall rule
for the `nodes:nodeport` reported above.
Now you can try reaching the Service from outside the cluster through the node
port allocated above:

```console
$ for node in $NODES; do curl -s $node:$NODEPORT | grep -i client_address; done
client_address=10.180.1.1
client_address=10.240.0.5
client_address=10.240.0.3
```

Note that these are not your IPs; they're cluster-internal IPs. This is what happens:

* client sends packet to `node2:nodePort`
* `node 2` replaces the source IP address (SNAT) in the packet with its own IP address
* `node 2` replaces the destination IP on the packet with the pod IP
* packet is routed to node 1, and then to the endpoint
* the pod's reply is routed back to node 2
* the pod's reply is sent back to the client

Visually:

```
          client
             \ ^
              \ \
               v \
   node 1 <--- node 2
    | ^   SNAT
    | |   --->
    v |
 endpoint
```


To avoid this, Kubernetes 1.5 has a beta feature triggered by the
`service.beta.kubernetes.io/external-traffic` [annotation](/docs/user-guide/load-balancer/#loss-of-client-source-ip-for-external-traffic).
Setting the annotation to `OnlyLocal` makes kube-proxy only proxy requests to
local endpoints, never forwarding traffic to other nodes, thereby preserving the
original source IP address. If there are no local endpoints, packets sent to the
node are dropped, so you can rely on the correct source IP in any packet
processing rules you apply to packets that make it through to the endpoint.

Set the annotation as follows:

```console
$ kubectl annotate service nodeport service.beta.kubernetes.io/external-traffic=OnlyLocal
service "nodeport" annotated
```

Now, re-run the test:

```console
$ for node in $NODES; do curl --connect-timeout 1 -s $node:$NODEPORT | grep -i client_address; done
client_address=104.132.1.79
```

Note that you only got one reply, with the *right* client IP, from the one node
the endpoint pod is running on.
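To confirm which node that is, you can check where the pod landed (the pod name,
IP, and node below are from this example cluster and will differ in yours):

```console
$ kubectl get po -o wide -l run=source-ip-app
NAME                            READY     STATUS    RESTARTS   AGE       IP             NODE
source-ip-app-826191075-qehz4   1/1       Running   0          20h       10.180.1.136   kubernetes-minion-group-6jst
```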

This is what happens:

* client sends packet to `node1:nodePort`, which *does* have endpoints
* node1 routes packet to endpoint with the correct source IP
* client sends packet to `node2:nodePort`, which doesn't have any endpoints
* packet is dropped

Visually:

```
        client
       ^ /   \
      / /     \
     / v       X
   node 1    node 2
    ^ |
    | |
    | v
 endpoint
```



### Source IP for Services with Type=LoadBalancer

As of Kubernetes 1.5, packets sent to Services with [Type=LoadBalancer](/docs/user-guide/services/#type-loadbalancer) are
source NAT'd by default, because all schedulable Kubernetes nodes in the
`Ready` state are eligible for loadbalanced traffic. So if packets arrive
at a node without an endpoint, the system proxies them to a node *with* an
endpoint, replacing the source IP on the packets with the IP of the node (as
described in the previous section).

You can test this by exposing the source-ip-app through a loadbalancer:

```console
$ kubectl expose deployment source-ip-app --name=loadbalancer --port=80 --target-port=8080 --type=LoadBalancer
service "loadbalancer" exposed

$ kubectl get svc loadbalancer
NAME           CLUSTER-IP    EXTERNAL-IP       PORT(S)   AGE
loadbalancer   10.0.65.118   104.198.149.140   80/TCP    5m

$ curl 104.198.149.140
CLIENT VALUES:
client_address=10.240.0.5
...
```

However, if you're running on GKE/GCE, setting the same `service.beta.kubernetes.io/external-traffic`
annotation to `OnlyLocal` forces nodes *without* Service endpoints to remove
themselves from the list of nodes eligible for loadbalanced traffic by
deliberately failing health checks. We expect to roll this feature out across a
wider range of providers before GA (see next section).

Visually:

```
                      client
                        |
                      lb VIP
                     / ^
                    v /
health check --->   node 1   node 2 <--- health check
        200  <---   ^ |             ---> 500
                    | v
                 endpoint
```

You can test this by setting the annotation:

```console
$ kubectl annotate service loadbalancer service.beta.kubernetes.io/external-traffic=OnlyLocal
```

You should immediately see a second annotation allocated by Kubernetes:

```console
$ kubectl get svc loadbalancer -o yaml | grep -i annotations -A 2
  annotations:
    service.beta.kubernetes.io/external-traffic: OnlyLocal
    service.beta.kubernetes.io/healthcheck-nodeport: "32122"
```

The `service.beta.kubernetes.io/healthcheck-nodeport` annotation points to
a port on every node serving the health check at `/healthz`. You can test this:

```console
$ kubectl get po -o wide -l run=source-ip-app
NAME                            READY     STATUS    RESTARTS   AGE       IP             NODE
source-ip-app-826191075-qehz4   1/1       Running   0          20h       10.180.1.136   kubernetes-minion-group-6jst

kubernetes-minion-group-6jst $ curl localhost:32122/healthz
1 Service Endpoints found

kubernetes-minion-group-jj1t $ curl localhost:32122/healthz
No Service Endpoints Found
```

A service controller running on the master is responsible for allocating the cloud
loadbalancer, and when it does so, it also allocates HTTP health checks
pointing to this port/path on each node. Wait about 10 seconds for the 2 nodes
without endpoints to fail health checks, then curl the loadbalancer IP:

```console
$ curl 104.198.149.140
CLIENT VALUES:
client_address=104.132.1.79
...
```

__Cross-platform support__

As of Kubernetes 1.5, support for source IP preservation through Services
with Type=LoadBalancer is only implemented in a subset of cloud providers
(GCP and Azure). The cloud provider you're running on might fulfill the
request for a loadbalancer in a few different ways:

1. With a proxy that terminates the client connection and opens a new connection
to your nodes/endpoints. In such cases the source IP will always be that of the
cloud LB, not that of the client.

2. With a packet forwarder, such that requests from the client sent to the
loadbalancer VIP end up at the node with the source IP of the client, not
that of an intermediate proxy.

Loadbalancers in the first category must use an agreed-upon
protocol between the loadbalancer and backend to communicate the true client IP,
such as the HTTP [X-FORWARDED-FOR](https://en.wikipedia.org/wiki/X-Forwarded-For)
header or the [proxy protocol](http://www.haproxy.org/download/1.5/doc/proxy-protocol.txt).
Loadbalancers in the second category can leverage the feature described above
by simply creating an HTTP health check pointing at the port stored in
the `service.beta.kubernetes.io/healthcheck-nodeport` annotation on the Service.

{% endcapture %}

{% capture cleanup %}

Delete the Services:

```console
$ kubectl delete svc -l run=source-ip-app
```

Delete the Deployment, ReplicaSet, and Pod:

```console
$ kubectl delete deployment source-ip-app
```

{% endcapture %}

{% capture whatsnext %}
* Learn more about [connecting applications via services](/docs/user-guide/connecting-applications/)
* Learn more about [loadbalancing](/docs/user-guide/load-balancer)
{% endcapture %}

{% include templates/tutorial.md %}