Update troubleshooting to include the etcd upgrade issue

For the issue that was reported recently: https://github.com/kubernetes/kubeadm/issues/2957
We should provide some tips to work around this known issue.

Signed-off-by: Dave Chen <dave.chen@arm.com>
Dave Chen 2023-11-13 16:00:59 +08:00 committed by Dave Chen
parent 0082100f78
commit 6c6ace0a33
1 changed files with 79 additions and 0 deletions

@ -431,3 +431,82 @@ See [Enabling signed kubelet serving certificates](/docs/tasks/administer-cluste
to understand how to configure the kubelets in a kubeadm cluster to have properly signed serving certificates.
Also see [How to run the metrics-server securely](https://github.com/kubernetes-sigs/metrics-server/blob/master/FAQ.md#how-to-run-metrics-server-securely).
## Upgrade fails due to etcd hash not changing
This issue applies only when upgrading a control plane node with a kubeadm binary v1.28.3 or later,
where the node is currently managed by kubeadm v1.28.0, v1.28.1 or v1.28.2.
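The version condition above can be expressed as a small check. This is a minimal sketch for illustration, not part of kubeadm: the names `parse_version` and `is_affected` are hypothetical, and it assumes plain `vMAJOR.MINOR.PATCH` version strings without pre-release suffixes.

```python
# Node versions that wrote the problematic etcd manifest (from the text above).
AFFECTED_NODE_VERSIONS = {(1, 28, 0), (1, 28, 1), (1, 28, 2)}
# First kubeadm binary version that triggers the failure when upgrading such a node.
FIRST_TRIGGERING_BINARY = (1, 28, 3)


def parse_version(version: str) -> tuple[int, int, int]:
    """Parse a version such as 'v1.28.3' into a tuple of integers."""
    major, minor, patch = version.lstrip("v").split(".")[:3]
    return int(major), int(minor), int(patch)


def is_affected(node_version: str, kubeadm_binary_version: str) -> bool:
    """Return True when the upgrade matches the conditions described above."""
    return (
        parse_version(node_version) in AFFECTED_NODE_VERSIONS
        and parse_version(kubeadm_binary_version) >= FIRST_TRIGGERING_BINARY
    )
```

For example, `is_affected("v1.28.1", "v1.28.3")` is true, while a node managed by v1.27.x or already on v1.28.3 is not affected.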
Here is the error message you may encounter:
[upgrade/etcd] Failed to upgrade etcd: couldn't upgrade control plane. kubeadm has tried to recover everything into the earlier state. Errors faced: static Pod hash for component etcd on Node kinder-upgrade-control-plane-1 did not change after 5m0s: timed out waiting for the condition
[upgrade/etcd] Waiting for previous etcd to become available
I0907 10:10:09.109104 3704 etcd.go:588] [etcd] attempting to see if all cluster endpoints ([]) are available 1/10
[upgrade/etcd] Etcd was rolled back and is now available
static Pod hash for component etcd on Node kinder-upgrade-control-plane-1 did not change after 5m0s: timed out waiting for the condition
couldn't upgrade control plane. kubeadm has tried to recover everything into the earlier state. Errors faced
The reason for this failure is that the affected versions generate an etcd manifest file with unwanted defaults in the PodSpec.
As a result, the manifest comparison reports a difference and kubeadm expects the Pod hash to change, but the kubelet never updates the hash.
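The mismatch can be illustrated with a toy model. This sketch is not kubeadm or kubelet code. It assumes, for illustration only, that the Pod hash behaves as if it were computed over the spec after defaults are applied, and it uses a hypothetical subset of defaults taken from the fields shown in the manifest diff in this section.

```python
import hashlib
import json

# Hypothetical subset of PodSpec defaults (fields taken from the manifest diff).
DEFAULTS = {
    "dnsPolicy": "ClusterFirst",
    "restartPolicy": "Always",
    "schedulerName": "default-scheduler",
    "terminationGracePeriodSeconds": 30,
}


def with_defaults(spec: dict) -> dict:
    """Fill in any default that the spec does not set explicitly."""
    return {**DEFAULTS, **spec}


def digest(spec: dict) -> str:
    """Stable digest of a spec; a stand-in for the real Pod hash."""
    return hashlib.sha256(json.dumps(spec, sort_keys=True).encode()).hexdigest()


# Manifest written by an affected version: defaults are spelled out.
old_manifest = {"hostNetwork": True, **DEFAULTS}
# Manifest written by v1.28.3 or later: defaults are omitted.
new_manifest = {"hostNetwork": True}

# The files differ, so an upgrade tool comparing them expects a change...
manifests_differ = old_manifest != new_manifest
# ...but the defaulted specs are identical, so the digest stays the same.
hash_changed = digest(with_defaults(old_manifest)) != digest(with_defaults(new_manifest))
```

In this model `manifests_differ` is true while `hash_changed` is false, which mirrors the timeout seen in the error message.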
There are two ways to work around this issue if you see it in your cluster:
- The etcd upgrade can be skipped between the affected versions and v1.28.3 (or later) by using:
kubeadm upgrade {apply|node} [version] --etcd-upgrade=false
This is not recommended if a later v1.28 patch version introduced a new etcd version, because etcd would then stay on the older version.
- Before upgrading, patch the manifest for the etcd static Pod to remove the problematic defaulted attributes:
diff --git a/etc/kubernetes/manifests/etcd_defaults.yaml b/etc/kubernetes/manifests/etcd_origin.yaml
index d807ccbe0aa..46b35f00e15 100644
--- a/etc/kubernetes/manifests/etcd_defaults.yaml
+++ b/etc/kubernetes/manifests/etcd_origin.yaml
@@ -43,7 +43,6 @@ spec:
scheme: HTTP
initialDelaySeconds: 10
periodSeconds: 10
- successThreshold: 1
timeoutSeconds: 15
name: etcd
@@ -59,26 +58,18 @@ spec:
scheme: HTTP
initialDelaySeconds: 10
periodSeconds: 10
- successThreshold: 1
timeoutSeconds: 15
- terminationMessagePath: /dev/termination-log
- terminationMessagePolicy: File
- mountPath: /var/lib/etcd
name: etcd-data
- mountPath: /etc/kubernetes/pki/etcd
name: etcd-certs
- dnsPolicy: ClusterFirst
- enableServiceLinks: true
hostNetwork: true
priority: 2000001000
priorityClassName: system-node-critical
- restartPolicy: Always
- schedulerName: default-scheduler
type: RuntimeDefault
- terminationGracePeriodSeconds: 30
- hostPath:
path: /etc/kubernetes/pki/etcd
More information can be found in the [tracking issue](https://github.com/kubernetes/kubeadm/issues/2927) for this bug.
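The second workaround can also be scripted. This is a minimal sketch that assumes the defaulted attributes appear exactly as in the diff above, each on its own line. The function name `strip_defaulted_lines` is hypothetical, and the filter is purely line-based: it does not parse YAML, so review the result before using it.

```python
# Defaulted attributes removed in the diff above, as "key: value" text.
DEFAULTED_LINES = {
    "successThreshold: 1",
    "terminationMessagePath: /dev/termination-log",
    "terminationMessagePolicy: File",
    "dnsPolicy: ClusterFirst",
    "enableServiceLinks: true",
    "restartPolicy: Always",
    "schedulerName: default-scheduler",
    "terminationGracePeriodSeconds: 30",
}


def strip_defaulted_lines(manifest_text: str) -> str:
    """Drop every line whose content, ignoring indentation, is a known default."""
    kept = [
        line
        for line in manifest_text.splitlines()
        if line.strip() not in DEFAULTED_LINES
    ]
    return "\n".join(kept) + "\n"
```

To use it, read the etcd static Pod manifest (typically `/etc/kubernetes/manifests/etcd.yaml` on a kubeadm node), pass its text to `strip_defaulted_lines`, and write the result back. Keep a backup copy outside the manifests directory first, since the kubelet watches that directory.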