---
---

* TOC
{:toc}

## Debugging pods

The first step in debugging a pod is taking a look at it. Check the current
state of the pod and recent events with the following command:

    $ kubectl describe pods ${POD_NAME}

Look at the state of the containers in the pod. Are they all `Running`? Have
there been recent restarts?

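If you want just the container states and restart counts rather than the full
`describe` output, one option is the following sketch (the jsonpath expression
is illustrative, not the only way to get this information):

    # Print each container's name, restart count, and current state.
    kubectl get pod ${POD_NAME} -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.restartCount}{"\t"}{.state}{"\n"}{end}'
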
Continue debugging depending on the state of the pods.

### My pod stays pending

If a pod is stuck in `Pending`, it means that it cannot be scheduled onto a
node. Generally this is because there are insufficient resources of one type or
another. Look at the output of the `kubectl describe ...` command above. There
should be messages from the scheduler explaining why it cannot schedule your
pod. Reasons include:

#### Insufficient resources

You may have exhausted the supply of CPU or memory in your cluster. In this
case you can try several things:

* [Add more nodes](/docs/admin/cluster-management/#resizing-a-cluster) to the cluster.

* [Terminate unneeded pods](/docs/user-guide/pods/single-container/#deleting_a_pod)
to make room for pending pods.

* Check that the pod is not larger than your nodes. For example, if all
nodes have a capacity of `cpu: 1`, then a pod with a limit of `cpu: 1.1`
will never be scheduled. One way to compare a pod's requests and limits against
node capacity is sketched just after this list.

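As a companion to the node-capacity commands below, here is a minimal sketch
(the jsonpath expression is illustrative) for dumping a pod's own requests and
limits so you can compare them against node capacity:

    # Print each container's name and its resource requests/limits.
    kubectl get pod ${POD_NAME} -o jsonpath='{range .spec.containers[*]}{.name}{"\t"}{.resources}{"\n"}{end}'
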
You can check node capacities with the `kubectl get nodes -o <format>`
command. Here are some example command lines that extract just the necessary
information:

    kubectl get nodes -o yaml | grep '\sname\|cpu\|memory'
    kubectl get nodes -o json | jq '.items[] | {name: .metadata.name, cap: .status.capacity}'

The [resource quota](/docs/admin/resourcequota/)
feature can be configured to limit the total amount of resources that can be
consumed. If used in conjunction with namespaces, it can prevent one team from
hogging all the resources.

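If quotas are in play, a quick hedged check of how much of each quota is already
consumed in a namespace (the namespace name here is just a placeholder) is:

    # Show hard limits and current usage for every quota in the namespace.
    kubectl describe resourcequota --namespace=${NAMESPACE}
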
#### Using hostPort

When you bind a pod to a `hostPort`, there are a limited number of places where
the pod can be scheduled. In most cases, `hostPort` is unnecessary; try using a
service object to expose your pod instead. If you do require `hostPort`, then
you can only schedule as many pods as there are nodes in your cluster.

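As a hedged illustration of the service alternative, and assuming your pod
carries a label that a service selector can match, something like the following
exposes it inside the cluster without tying it to a host port (the ports and
service name are placeholders):

    # Create a service that selects the pod's labels and forwards port 80
    # to the container's port 8080.
    kubectl expose pod ${POD_NAME} --port=80 --target-port=8080 --name=${POD_NAME}-svc
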
### My pod stays waiting

If a pod is stuck in the `Waiting` state, then it has been scheduled to a
worker node, but it can't run on that machine. Again, the information from
`kubectl describe ...` should be informative. The most common cause of
`Waiting` pods is a failure to pull the image. There are three things to check:

* Make sure that you have the name of the image correct.
* Have you pushed the image to the repository?
* Run a manual `docker pull <image>` on your machine to see if the image can be
pulled (see the example below).

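For the image-pull checks above, a minimal sketch (the jsonpath expression is
illustrative) is to read back exactly which image the pod is asking for, then
try pulling it by hand:

    # Print the image reference(s) recorded in the pod spec.
    kubectl get pod ${POD_NAME} -o jsonpath='{.spec.containers[*].image}'
    # Then, on a machine with access to your registry, try the pull manually.
    docker pull <image>
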
### My pod is crashing or otherwise unhealthy

First, take a look at the logs of the current container:

    $ kubectl logs ${POD_NAME} ${CONTAINER_NAME}

If your container has previously crashed, you can access the previous
container's crash log with:

    $ kubectl logs --previous ${POD_NAME} ${CONTAINER_NAME}

Alternatively, you can run commands inside that container with `exec`:

    $ kubectl exec ${POD_NAME} -c ${CONTAINER_NAME} -- ${CMD} ${ARG1} ${ARG2} ... ${ARGN}

Note that `-c ${CONTAINER_NAME}` is optional and can be omitted for pods that
only contain a single container.

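If the container image includes a shell (an assumption; many minimal images do
not), an interactive session is often the quickest way to poke around:

    # Open an interactive shell in the container; swap in bash if the image has it.
    kubectl exec -it ${POD_NAME} -c ${CONTAINER_NAME} -- sh
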
As an example, to look at the logs from a running Cassandra pod, you might run:

    $ kubectl exec cassandra -- cat /var/log/cassandra/system.log

If none of these approaches work, you can find the host machine that the pod is
running on and SSH into that host.

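One way to find that host is to ask for the node name recorded in the pod's
spec (the wide output also shows it in a NODE column):

    # Show the pod along with the node it was scheduled to.
    kubectl get pod ${POD_NAME} -o wide
    # Or print just the node name.
    kubectl get pod ${POD_NAME} -o jsonpath='{.spec.nodeName}'
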
## Debugging Replication Controllers

Replication controllers are fairly straightforward. They can either create pods
or they can't. If they can't create pods, then please refer to the
[instructions above](#debugging_pods) to debug your pods.

You can also use `kubectl describe rc ${CONTROLLER_NAME}` to inspect events
related to the replication controller.

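A quick hedged sanity check is to compare the controller's desired and current
replica counts (the jsonpath expression is illustrative):

    # Print desired vs. current replicas for the replication controller.
    kubectl get rc ${CONTROLLER_NAME} -o jsonpath='{.spec.replicas}{" desired, "}{.status.replicas}{" current"}{"\n"}'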