---
title: Troubleshooting kubeadm
content_template: templates/concept
weight: 90
---

{{% capture overview %}}

As with any program, you might run into an error installing or running kubeadm.
This page lists some common failure scenarios and provides steps that can help you understand and fix the problem.

If your problem is not listed below, follow these steps:

- If you think your problem is a bug with kubeadm:
  - Go to [github.com/kubernetes/kubeadm](https://github.com/kubernetes/kubeadm/issues) and search for existing issues.
  - If no issue exists, please [open one](https://github.com/kubernetes/kubeadm/issues/new) and follow the issue template.

- If you are unsure about how kubeadm works, you can ask on [Slack](http://slack.k8s.io/) in #kubeadm, or open a question on [StackOverflow](https://stackoverflow.com/questions/tagged/kubernetes). Please include
  relevant tags like `#kubernetes` and `#kubeadm` so folks can help you.

{{% /capture %}}

{{% capture body %}}

## `ebtables` or some similar executable not found during installation

If you see the following warnings while running `kubeadm init`

```sh
[preflight] WARNING: ebtables not found in system path
[preflight] WARNING: ethtool not found in system path
```

Then you may be missing `ebtables`, `ethtool` or a similar executable on your node. You can install them with the following commands:

- For Ubuntu/Debian users, run `apt install ebtables ethtool`.
- For CentOS/Fedora users, run `yum install ebtables ethtool`.

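If the warnings persist after installing the packages, a quick check such as the sketch below (the tool list simply mirrors the warnings above) confirms whether the executables are actually on the `PATH` that kubeadm sees:

```bash
# Report any of the preflight-checked tools that are missing from PATH.
for tool in ebtables ethtool; do
  command -v "$tool" >/dev/null || echo "$tool not found in PATH"
done
```
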
## kubeadm blocks waiting for control plane during installation

If you notice that `kubeadm init` hangs after printing out the following line:

```sh
[apiclient] Created API client, waiting for the control plane to become ready
```

This may be caused by a number of problems. The most common are:

- network connection problems. Check that your machine has full network connectivity before continuing.
- the default cgroup driver configuration for the kubelet differs from that used by Docker.
  Check the system log file (e.g. `/var/log/messages`) or examine the output from `journalctl -u kubelet`. If you see something like the following:

  ```shell
  error: failed to run Kubelet: failed to create kubelet:
  misconfiguration: kubelet cgroup driver: "systemd" is different from docker cgroup driver: "cgroupfs"
  ```

  There are two common ways to fix the cgroup driver problem:

  1. Install Docker again following the instructions
     [here](/docs/setup/independent/install-kubeadm/#installing-docker).
  1. Change the kubelet config to match the Docker cgroup driver manually; refer to
     [Configure cgroup driver used by kubelet on Master Node](/docs/setup/independent/install-kubeadm/#configure-cgroup-driver-used-by-kubelet-on-master-node)
     for detailed instructions (a quick way to compare the two drivers is shown after this list).

- control plane Docker containers are crashlooping or hanging. You can check this by running `docker ps` and investigating each container by running `docker logs`.

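To compare the two cgroup drivers, something like the following works; the file paths are assumptions about a kubeadm-provisioned node, and depending on the kubeadm version the kubelet's driver may be recorded in either file:

```bash
# The driver Docker is using:
docker info -f '{{.CgroupDriver}}'
# The driver the kubelet was configured with (one of these files typically
# exists on a node set up by kubeadm):
grep -i cgroup /var/lib/kubelet/kubeadm-flags.env /var/lib/kubelet/config.yaml
```
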
## kubeadm blocks when removing managed containers

The following could happen if Docker halts and does not remove any Kubernetes-managed containers:

```bash
sudo kubeadm reset
[preflight] Running pre-flight checks
[reset] Stopping the kubelet service
[reset] Unmounting mounted directories in "/var/lib/kubelet"
[reset] Removing kubernetes-managed containers
(block)
```

A possible solution is to restart the Docker service and then re-run `kubeadm reset`:

```bash
sudo systemctl restart docker.service
sudo kubeadm reset
```

Inspecting the logs for Docker may also be useful:

```sh
journalctl -u docker
```

## Pods in `RunContainerError`, `CrashLoopBackOff` or `Error` state

Right after `kubeadm init` there should not be any pods in these states.

- If there are pods in one of these states _right after_ `kubeadm init`, please open an
  issue in the kubeadm repo. `coredns` (or `kube-dns`) should be in the `Pending` state
  until you have deployed the network solution.
- If you see Pods in the `RunContainerError`, `CrashLoopBackOff` or `Error` state
  after deploying the network solution and nothing happens to `coredns` (or `kube-dns`),
  it's very likely that the Pod Network solution that you installed is somehow broken. You
  might have to grant it more RBAC privileges or use a newer version. Please file
  an issue in the Pod Network providers' issue tracker and get the issue triaged there.
- If you install a version of Docker older than 1.12.1, remove the `MountFlags=slave` option
  when booting `dockerd` with `systemd` and restart `docker`. You can see the MountFlags in `/usr/lib/systemd/system/docker.service`.
  MountFlags can interfere with volumes mounted by Kubernetes, and put the Pods in `CrashLoopBackOff` state.
  The error happens when Kubernetes does not find `/var/run/secrets/kubernetes.io/serviceaccount` files.

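To see which Pods are affected and why, the usual `kubectl` inspection commands help; the pod name below is a placeholder:

```bash
# List all pods, then drill into a failing one for its events and logs.
kubectl get pods --all-namespaces
kubectl -n kube-system describe pod <pod-name>
kubectl -n kube-system logs <pod-name>
```
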
## `coredns` (or `kube-dns`) is stuck in the `Pending` state

This is **expected** and part of the design. kubeadm is network provider-agnostic, so the admin
should [install the pod network solution](/docs/concepts/cluster-administration/addons/)
of choice. You have to install a Pod Network
before CoreDNS can be fully deployed. Hence the `Pending` state before the network is set up.

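You can watch this state change as the network add-on comes up; the label selector below is an assumption about the kubeadm-created CoreDNS Deployment, which reuses the `k8s-app=kube-dns` label:

```bash
# CoreDNS stays Pending until a network add-on is installed, then turns Running.
kubectl -n kube-system get pods -l k8s-app=kube-dns
```
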
## `HostPort` services do not work

The `HostPort` and `HostIP` functionality is available depending on your Pod Network
provider. Please contact the author of the Pod Network solution to find out whether
`HostPort` and `HostIP` functionality is available.

Calico, Canal, and Flannel CNI providers are verified to support HostPort.

For more information, see the [CNI portmap documentation](https://github.com/containernetworking/plugins/blob/master/plugins/meta/portmap/README.md).

If your network provider does not support the portmap CNI plugin, you may need to use the [NodePort feature of
services](/docs/concepts/services-networking/service/#nodeport) or use `HostNetwork=true`.

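As a rough check, you can look at the CNI configuration on a node to see whether the portmap plugin is chained in; `/etc/cni/net.d` is the conventional location, but your provider may install its configuration elsewhere:

```bash
# Look for the portmap plugin in the installed CNI configuration.
grep -r portmap /etc/cni/net.d/
```
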
## Pods are not accessible via their Service IP

- Many network add-ons do not yet enable [hairpin mode](https://kubernetes.io/docs/tasks/debug-application-cluster/debug-service/#a-pod-cannot-reach-itself-via-service-ip)
  which allows pods to access themselves via their Service IP. This is an issue related to
  [CNI](https://github.com/containernetworking/cni/issues/476). Please contact the network
  add-on provider to get the latest status of their support for hairpin mode.

- If you are using VirtualBox (directly or via Vagrant), you will need to
  ensure that `hostname -i` returns a routable IP address. By default the first
  interface is connected to a non-routable host-only network. A workaround
  is to modify `/etc/hosts`, see this [Vagrantfile](https://github.com/errordeveloper/k8s-playground/blob/22dd39dfc06111235620e6c4404a96ae146f26fd/Vagrantfile#L11)
  for an example.

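A quick sanity check on a VirtualBox or Vagrant node is to compare the address the hostname resolves to with the addresses actually assigned to the interfaces:

```bash
# The address printed by `hostname -i` should belong to a routable interface,
# not the default host-only/NAT one.
hostname -i
ip -o -4 addr show
```
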
## TLS certificate errors

The following error indicates a possible certificate mismatch.

```none
# kubectl get pods
Unable to connect to the server: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "kubernetes")
```

- Verify that the `$HOME/.kube/config` file contains a valid certificate, and
  regenerate a certificate if necessary. The certificates in a kubeconfig file
  are base64 encoded. The `base64 -d` command can be used to decode the certificate
  and `openssl x509 -text -noout` can be used for viewing the certificate information
  (see the example after this list).
- Another workaround is to overwrite the existing `kubeconfig` for the "admin" user:

  ```sh
  mv $HOME/.kube $HOME/.kube.bak
  mkdir $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config
  ```

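For instance, a minimal sketch of decoding the embedded client certificate, assuming the kubeconfig inlines it as `client-certificate-data`:

```bash
# Extract, decode and print the client certificate from the kubeconfig.
grep 'client-certificate-data' $HOME/.kube/config | \
  awk '{print $2}' | base64 -d | openssl x509 -text -noout
```
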
## Default NIC when using flannel as the pod network in Vagrant

The following error might indicate that something was wrong in the pod network:

```sh
Error from server (NotFound): the server could not find the requested resource
```

- If you're using flannel as the pod network inside Vagrant, then you will have to specify the default interface name for flannel.

  Vagrant typically assigns two interfaces to all VMs. The first, for which all hosts are assigned the IP address `10.0.2.15`, is for external traffic that gets NATed.

  This may lead to problems with flannel, which defaults to the first interface on a host. This leads to all hosts thinking they have the same public IP address. To prevent this, pass the `--iface eth1` flag to flannel so that the second interface is chosen.

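One way to pass the flag is to edit the flannel DaemonSet and append it to the flannel container's arguments. The DaemonSet name below is an assumption and varies between flannel manifest versions (for example `kube-flannel-ds` or `kube-flannel-ds-amd64`):

```bash
# Open the flannel DaemonSet for editing, add --iface=eth1 to the flannel
# container's args, then let the pods roll out again.
kubectl -n kube-system edit daemonset kube-flannel-ds
```
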
## Non-public IP used for containers

In some situations `kubectl logs` and `kubectl run` commands may return the following error in an otherwise functional cluster:

```sh
Error from server: Get https://10.19.0.41:10250/containerLogs/default/mysql-ddc65b868-glc5m/mysql: dial tcp 10.19.0.41:10250: getsockopt: no route to host
```

- This may be due to Kubernetes using an IP that cannot communicate with other IPs on the seemingly same subnet, possibly by policy of the machine provider.
- Digital Ocean assigns a public IP to `eth0` as well as a private one to be used internally as an anchor for their floating IP feature, yet `kubelet` will pick the latter as the node's `InternalIP` instead of the public one.

  Use `ip addr show` to check for this scenario instead of `ifconfig` because `ifconfig` will not display the offending alias IP address. Alternatively, an API endpoint specific to Digital Ocean allows you to query for the anchor IP from the droplet:

  ```sh
  curl http://169.254.169.254/metadata/v1/interfaces/public/0/anchor_ipv4/address
  ```

  The workaround is to tell `kubelet` which IP to use with the `--node-ip` flag. When using Digital Ocean, it can be the public one (assigned to `eth0`) or the private one (assigned to `eth1`) should you want to use the optional private network. The [`KubeletExtraArgs` section of the kubeadm `NodeRegistrationOptions` structure](https://github.com/kubernetes/kubernetes/blob/release-1.13/cmd/kubeadm/app/apis/kubeadm/v1beta1/types.go) can be used for this.

  Then restart `kubelet`:

  ```sh
  systemctl daemon-reload
  systemctl restart kubelet
  ```

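As an alternative sketch that achieves the same result from the node itself, the flag can be set in the kubelet's environment file and the kubelet restarted as shown above. The path `/etc/default/kubelet` is an assumption about deb-based systems (rpm-based systems use `/etc/sysconfig/kubelet`), and the IP address is a placeholder:

```bash
# Point the kubelet at the address the node should advertise (placeholder IP).
# Note: this overwrites the file; merge with any existing KUBELET_EXTRA_ARGS.
echo 'KUBELET_EXTRA_ARGS=--node-ip=203.0.113.10' | sudo tee /etc/default/kubelet
```
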
## `coredns` pods have `CrashLoopBackOff` or `Error` state

If you have nodes that are running SELinux with an older version of Docker, you might experience a scenario
where the `coredns` pods are not starting. To solve that, you can try one of the following options:

- Upgrade to a [newer version of Docker](/docs/setup/independent/install-kubeadm/#installing-docker).
- [Disable SELinux](https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/security-enhanced_linux/sect-security-enhanced_linux-enabling_and_disabling_selinux-disabling_selinux).
- Modify the `coredns` deployment to set `allowPrivilegeEscalation` to `true`:

```bash
kubectl -n kube-system get deployment coredns -o yaml | \
  sed 's/allowPrivilegeEscalation: false/allowPrivilegeEscalation: true/g' | \
  kubectl apply -f -
```

Another cause for CoreDNS to have `CrashLoopBackOff` is when a CoreDNS Pod deployed in Kubernetes detects a loop. [A number of workarounds](https://github.com/coredns/coredns/tree/master/plugin/loop#troubleshooting-loops-in-kubernetes-clusters)
are available to avoid Kubernetes trying to restart the CoreDNS Pod every time CoreDNS detects the loop and exits.

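To confirm that loop detection is what is crashing the pods, check the CoreDNS logs for an explicit loop message; the label selector assumes the kubeadm deployment, which labels the pods `k8s-app=kube-dns`:

```bash
# Look for a "Loop ... detected" message in the CoreDNS logs.
kubectl -n kube-system logs -l k8s-app=kube-dns
```
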
{{< warning >}}
Disabling SELinux or setting `allowPrivilegeEscalation` to `true` can compromise
the security of your cluster.
{{< /warning >}}

## etcd pods restart continually

If you encounter the following error:

```
rpc error: code = 2 desc = oci runtime error: exec failed: container_linux.go:247: starting container process caused "process_linux.go:110: decoding init error from pipe caused \"read parent: connection reset by peer\""
```

this issue appears if you run CentOS 7 with Docker 1.13.1.84.
This version of Docker can prevent the kubelet from executing into the etcd container.

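To confirm whether a node is running the affected combination, check the Docker and OS versions:

```bash
# Print the Docker engine version and the CentOS release on this node.
docker version --format '{{.Server.Version}}'
cat /etc/redhat-release
```
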
To work around the issue, choose one of these options:

- Roll back to an earlier version of Docker, such as 1.13.1-75

  ```
  yum downgrade docker-1.13.1-75.git8633870.el7.centos.x86_64 docker-client-1.13.1-75.git8633870.el7.centos.x86_64 docker-common-1.13.1-75.git8633870.el7.centos.x86_64
  ```

- Install one of the more recent recommended versions, such as 18.06:

  ```bash
  sudo yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
  yum install docker-ce-18.06.1.ce-3.el7.x86_64
  ```

{{% /capture %}}