Whatever DNS provider you use, any DNS query tool (for example `dig`
or `nslookup`) will of course also allow you to see the records
created by the Federation for you. Note that you should either point
these tools directly at your DNS provider (e.g. `dig
@ns-cloud-e1.googledomains.com...`) or expect delays on the order of
your configured TTL (180 seconds, by default) before seeing updates,
due to caching by intermediate DNS servers.
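For example, assuming the `example.com` domain and the nameserver and
address quoted elsewhere on this page, a direct query against the
provider's authoritative nameserver might look like this:

```shell
# Query the provider's nameserver directly, bypassing intermediate caches.
dig @ns-cloud-e1.googledomains.com \
    nginx.mynamespace.myfederation.svc.us-central1-a.example.com. +short
# 104.197.247.191
```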
### Some notes about the above example
1. Notice that there is a normal ('A') record for each service shard that has at least one healthy backend endpoint. For example, in us-central1-a, 104.197.247.191 is the external IP address of the service shard in that zone, and in asia-east1-a the address is 130.211.56.221.
2. Similarly, there are regional 'A' records which include all healthy shards in that region. For example, 'us-central1'. These regional records are useful for clients which do not have a particular zone preference, and as a building block for the automated locality and failover mechanism described below.
3. For zones where there are currently no healthy backend endpoints, a CNAME ('Canonical Name') record is used to alias (automatically redirect) those queries to the next closest healthy zone. In the example, the service shard in us-central1-f currently has no healthy backend endpoints (i.e. Pods), so a CNAME record has been created to automatically redirect queries to other shards in that region (us-central1 in this case).
4. Similarly, if no healthy shards exist in the enclosing region, the search progresses further afield. In the europe-west1-d availability zone, there are no healthy backends, so queries are redirected to the broader europe-west1 region (which also has no healthy backends), and onward to the global set of healthy addresses ('nginx.mynamespace.myfederation.svc.example.com.').
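Putting those notes together, the record set for this example might look
roughly like the following sketch (names follow the scheme above; only the
two addresses quoted earlier are real, and TTLs are omitted):

```shell
# Illustrative record set, not literal provider output:
nginx.mynamespace.myfederation.svc.us-central1-a.example.com.   A      104.197.247.191
nginx.mynamespace.myfederation.svc.asia-east1-a.example.com.    A      130.211.56.221
nginx.mynamespace.myfederation.svc.us-central1-f.example.com.   CNAME  nginx.mynamespace.myfederation.svc.us-central1.example.com.
nginx.mynamespace.myfederation.svc.europe-west1-d.example.com.  CNAME  nginx.mynamespace.myfederation.svc.europe-west1.example.com.
nginx.mynamespace.myfederation.svc.europe-west1.example.com.    CNAME  nginx.mynamespace.myfederation.svc.example.com.
```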
The above set of DNS records is automatically kept in sync with the
current state of health of all service shards globally by the
Federated Service system. DNS resolver libraries (which are invoked by
all clients) automatically traverse the hierarchy of 'CNAME' and 'A'
records to return the correct set of healthy IP addresses. Clients can
then select any one of the returned addresses to initiate a network
connection (and fail over automatically to one of the other equivalent
addresses if required).
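To see this traversal in action, you can resolve one of the aliased zonal
names yourself; a standard resolver follows the chain of 'CNAME' records
down to the healthy 'A' records (illustrative output, reusing the addresses
from the example above):

```shell
# europe-west1-d has no healthy backends, so the chain falls through
# to the region and then to the global record set.
dig +short nginx.mynamespace.myfederation.svc.europe-west1-d.example.com.
# nginx.mynamespace.myfederation.svc.europe-west1.example.com.
# nginx.mynamespace.myfederation.svc.example.com.
# 104.197.247.191
# 130.211.56.221
```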
## Discovering a federated service
### From pods inside your federated clusters
By default, Kubernetes clusters come pre-configured with a
cluster-local DNS server ('KubeDNS'), as well as an intelligently
constructed DNS search path which together ensure that DNS queries
like "myservice", "myservice.mynamespace",
"bobsservice.othernamespace" etc issued by your software running
inside Pods are automatically expanded and resolved correctly to the
appropriate service IP of services running in the local cluster.
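Concretely, that expansion is driven by the resolver configuration that
Kubernetes writes into each Pod; a typical (cluster-specific, so
illustrative) `/etc/resolv.conf` looks something like this:

```shell
# /etc/resolv.conf inside a Pod in namespace 'mynamespace'
# (the nameserver address and cluster domain vary between clusters):
search mynamespace.svc.cluster.local svc.cluster.local cluster.local
nameserver 10.0.0.10
options ndots:5
```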
With the introduction of Federated Services and Cross-Cluster Service
Discovery, this concept is extended to cover Kubernetes services
running in any other cluster across your Cluster Federation, globally.
To take advantage of this extended range, you use a slightly different
DNS name (of the form "<servicename>.<namespace>.<federationname>",
e.g. myservice.mynamespace.myfederation) to resolve Federated
Services. Using a different DNS name also prevents your existing
applications from accidentally traversing cross-zone or cross-region
networks (and incurring possibly unwanted network charges or latency)
unless you explicitly opt in to this behavior.
So, using our NGINX example service above, and the Federated Service
DNS name form just described, let's consider an example: A Pod in a
cluster in the `us-central1-f` availability zone needs to contact our
NGINX service. Rather than use the service's traditional cluster-local
DNS name (`nginx.mynamespace`, which is automatically expanded
to `nginx.mynamespace.svc.cluster.local`), it can now use the
service's Federated DNS name, `nginx.mynamespace.myfederation`.
This will be automatically
expanded and resolved to the closest healthy shard of our NGINX
service, wherever in the world that may be. If a healthy shard exists
in the local cluster, that service's cluster-local (typically
10.x.y.z) IP address will be returned (by the cluster-local KubeDNS).
This is almost exactly equivalent to non-federated service resolution
(almost because KubeDNS actually returns both a CNAME and an A record
for local federated services, but applications will be oblivious
to this minor technical difference).
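For instance, from a shell inside such a Pod, a lookup might go as follows
(purely illustrative; the nameserver and service IP shown are hypothetical):

```shell
# Resolving the federated name from inside a Pod whose local cluster
# has a healthy shard: KubeDNS answers with the cluster-local service IP.
nslookup nginx.mynamespace.myfederation
# Server:    10.0.0.10
# Address:   10.0.0.10#53
#
# Name:      nginx.mynamespace.myfederation
# Address:   10.63.250.98   # cluster-local (10.x.y.z) service IP
```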
But if the service does not exist in the local cluster (or it exists
but has no healthy backend pods), the DNS query is automatically
expanded to the service's zonal Federated DNS name
(`nginx.mynamespace.myfederation.svc.us-central1-f.example.com` in our
example), which resolves, via the hierarchy of 'CNAME' and 'A' records
described above, to the closest healthy shard outside the Pod's own
cluster. That way your clients can always use the short form
(`nginx.mynamespace.myfederation`), and always be automatically routed
to the closest healthy shard on their home continent. All of the
required failover is handled for you automatically by Kubernetes
Cluster Federation. Future releases will improve upon this even
further.
## Handling failures of backend pods and whole clusters
Standard Kubernetes service cluster IPs already ensure that
non-responsive individual Pod endpoints are automatically taken out of
service with low latency (a few seconds). In addition, as alluded to
above, the Kubernetes Cluster Federation system automatically monitors
the health of clusters and the endpoints behind all of the shards of
your Federated Service, taking shards in and out of service as
required (e.g. when all of the endpoints behind a service, or perhaps
the entire cluster or availability zone, go down, or conversely recover
from an outage). Due to the latency inherent in DNS caching (the cache
timeout, or TTL, for Federated Service DNS records is configured to 3
minutes by default, but can be adjusted), it may take up to that long
for all clients to completely fail over to an alternative cluster in
the case of catastrophic failure. However, given the number of
discrete IP addresses which can be returned for each regional service
endpoint (see e.g. us-central1 above, which has three alternatives),
many clients will fail over automatically to one of the alternative
IPs in less time than that, given appropriate configuration.
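As a sketch of why that works, querying the regional record returns every
healthy shard's address in a single answer, so a client can retry against
the remaining addresses without waiting on DNS at all (only the first
address below is one quoted earlier; the other two are hypothetical
stand-ins for the region's three alternatives):

```shell
# All healthy shards in the region are returned together.
dig +short nginx.mynamespace.myfederation.svc.us-central1.example.com.
# 104.197.247.191
# 104.197.36.220   # hypothetical
# 104.197.203.114  # hypothetical
```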
## Troubleshooting
#### I cannot connect to my cluster federation API
Check that your
1. Client (typically kubectl) is correctly configured (including API endpoints and login credentials), and
2. Cluster Federation API server is running and network-reachable.
See the [federation admin guide](/docs/admin/federation/) to learn
how to bring up a cluster federation correctly (or have your cluster administrator do this for you), and how to correctly configure your client.
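As a quick sanity check (the context name below is illustrative; yours
depends on how your federation was set up):

```shell
# Confirm kubectl knows about a context pointing at the federation API:
kubectl config get-contexts

# Confirm the federation API server is reachable with your credentials:
kubectl --context=federation-cluster version
```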
#### I can create a federated service successfully against the cluster federation API, but no matching services are created in my underlying clusters
Check that:
1. Your clusters are correctly registered in the Cluster Federation API (`kubectl describe clusters`)
2. Your clusters are all 'Active'. This means that the Cluster Federation system was able to connect and authenticate against the clusters' endpoints. If not, consult the logs of the federation-controller-manager pod to ascertain what the failure might be. (`kubectl --namespace=federation logs $(kubectl get pods --namespace=federation -l module=federation-controller-manager -oname)`)
3. That the login credentials provided to the Cluster Federation API for the clusters have the correct authorization and quota to create services in the relevant namespace in the clusters. Again, you should see associated error messages providing more detail in the above logs if this is not the case.
4. That no other error is preventing the service creation operation from succeeding (look for `service-controller` errors in the output of `kubectl logs federation-controller-manager --namespace federation`).
#### I can create a federated service successfully, but no matching DNS records are created in my DNS provider
Check that:
1. Your federation name, DNS provider, and DNS domain name are configured correctly. Consult the [federation admin guide](/docs/admin/federation/) or [tutorial](https://github.com/kelseyhightower/kubernetes-cluster-federation) to learn
how to configure your Cluster Federation system's DNS provider (or have your cluster administrator do this for you).
2. Confirm that the Cluster Federation's service-controller is successfully connecting to and authenticating against your selected DNS provider (look for `service-controller` errors or successes in the output of `kubectl logs federation-controller-manager --namespace federation`).
3. Confirm that the Cluster Federation's service-controller is successfully creating DNS records in your DNS provider (or outputting errors in its logs explaining in more detail what's failing).
#### Matching DNS records are created in my DNS provider, but clients are unable to resolve against those names
Check that:
1. The DNS registrar that manages your federation DNS domain has been correctly configured to point to your configured DNS provider's nameservers. See for example [Google Domains Documentation](https://support.google.com/domains/answer/3290309?hl=en&ref_topic=3251230) and [Google Cloud DNS Documentation](https://cloud.google.com/dns/update-name-servers), or equivalent guidance from your domain registrar and DNS provider.
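You can verify the delegation from the parent zone with a plain NS query
(the nameservers shown are just the Google Cloud DNS ones used in the
example above; yours will differ by provider):

```shell
# The answer should list your DNS provider's nameservers for the domain.
dig +short NS example.com
# ns-cloud-e1.googledomains.com.
# ns-cloud-e2.googledomains.com.
```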
#### This troubleshooting guide did not help me solve my problem