diff --git a/content/zh-cn/docs/setup/production-environment/tools/kubeadm/troubleshooting-kubeadm.md b/content/zh-cn/docs/setup/production-environment/tools/kubeadm/troubleshooting-kubeadm.md
index a9f88d7a45..94ecacdec2 100644
--- a/content/zh-cn/docs/setup/production-environment/tools/kubeadm/troubleshooting-kubeadm.md
+++ b/content/zh-cn/docs/setup/production-environment/tools/kubeadm/troubleshooting-kubeadm.md
@@ -767,3 +767,108 @@ Also see [How to run the metrics-server securely](https://github.com/kubernetes-
 to learn more about how to configure the kubelet in a kubeadm cluster to use properly signed serving certificates.
 Also see [How to run the metrics-server securely](https://github.com/kubernetes-sigs/metrics-server/blob/master/FAQ.md#how-to-run-metrics-server-securely).
+
+
+## Upgrade fails due to etcd hash not changing {#upgrade-fails-due-to-etcd-hash-not-changing}
+
+This only applies to upgrading a control plane node with a kubeadm binary v1.28.3 or later,
+where the node is currently managed by kubeadm v1.28.0, v1.28.1, or v1.28.2.
+
+Here is the error message you may encounter:
+
+```
+[upgrade/etcd] Failed to upgrade etcd: couldn't upgrade control plane. kubeadm has tried to recover everything into the earlier state. Errors faced: static Pod hash for component etcd on Node kinder-upgrade-control-plane-1 did not change after 5m0s: timed out waiting for the condition
+[upgrade/etcd] Waiting for previous etcd to become available
+I0907 10:10:09.109104 3704 etcd.go:588] [etcd] attempting to see if all cluster endpoints ([https://172.17.0.6:2379/ https://172.17.0.4:2379/ https://172.17.0.3:2379/]) are available 1/10
+[upgrade/etcd] Etcd was rolled back and is now available
+static Pod hash for component etcd on Node kinder-upgrade-control-plane-1 did not change after 5m0s: timed out waiting for the condition
+couldn't upgrade control plane. kubeadm has tried to recover everything into the earlier state. Errors faced
+k8s.io/kubernetes/cmd/kubeadm/app/phases/upgrade.rollbackOldManifests
+    cmd/kubeadm/app/phases/upgrade/staticpods.go:525
+k8s.io/kubernetes/cmd/kubeadm/app/phases/upgrade.upgradeComponent
+    cmd/kubeadm/app/phases/upgrade/staticpods.go:254
+k8s.io/kubernetes/cmd/kubeadm/app/phases/upgrade.performEtcdStaticPodUpgrade
+    cmd/kubeadm/app/phases/upgrade/staticpods.go:338
+...
+```
+
+The reason for this failure is that the affected versions generate an etcd manifest file
+with unwanted defaults in the PodSpec. This results in a diff from the manifest comparison,
+and kubeadm expects the Pod hash to change, but the kubelet never updates the hash.
+
+There are two ways to work around this issue if you see it in your cluster:
+
+- The etcd upgrade between the affected versions and v1.28.3 (or later) can be skipped by
+  running the following command (see the example invocation after this section):
+
+  ```shell
+  kubeadm upgrade {apply|node} [version] --etcd-upgrade=false
+  ```
+
+  This is not recommended, because a later v1.28 patch version might introduce a new etcd version.
+
+- Before upgrading, patch the manifest for the etcd static Pod to remove the problematic
+  defaulted attributes (a manual editing sketch follows this section):
+
+  ```patch
+  diff --git a/etc/kubernetes/manifests/etcd_defaults.yaml b/etc/kubernetes/manifests/etcd_origin.yaml
+  index d807ccbe0aa..46b35f00e15 100644
+  --- a/etc/kubernetes/manifests/etcd_defaults.yaml
+  +++ b/etc/kubernetes/manifests/etcd_origin.yaml
+  @@ -43,7 +43,6 @@ spec:
+           scheme: HTTP
+         initialDelaySeconds: 10
+         periodSeconds: 10
+  -      successThreshold: 1
+         timeoutSeconds: 15
+       name: etcd
+       resources:
+  @@ -59,26 +58,18 @@ spec:
+           scheme: HTTP
+         initialDelaySeconds: 10
+         periodSeconds: 10
+  -      successThreshold: 1
+         timeoutSeconds: 15
+  -      terminationMessagePath: /dev/termination-log
+  -      terminationMessagePolicy: File
+       volumeMounts:
+       - mountPath: /var/lib/etcd
+         name: etcd-data
+       - mountPath: /etc/kubernetes/pki/etcd
+         name: etcd-certs
+  -  dnsPolicy: ClusterFirst
+  -  enableServiceLinks: true
+     hostNetwork: true
+     priority: 2000001000
+     priorityClassName: system-node-critical
+  -  restartPolicy: Always
+  -  schedulerName: default-scheduler
+     securityContext:
+       seccompProfile:
+         type: RuntimeDefault
+  -  terminationGracePeriodSeconds: 30
+     volumes:
+     - hostPath:
+         path: /etc/kubernetes/pki/etcd
+  ```
+
+More information about this bug can be found in the [tracking issue](https://github.com/kubernetes/kubeadm/issues/2927).
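+
+As an illustration of the first workaround, here is a minimal sketch; the target version
+v1.28.3 below is only an example and should be replaced with the patch version you are upgrading to:
+
+```shell
+# On the first control plane node: upgrade the control plane but skip the etcd upgrade
+kubeadm upgrade apply v1.28.3 --etcd-upgrade=false
+
+# On the remaining control plane nodes, the node variant of the command takes the same flag
+kubeadm upgrade node --etcd-upgrade=false
+```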
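+
+For the second workaround, a minimal sketch of patching the manifest by hand, assuming the
+default kubeadm layout where the etcd static Pod manifest is `/etc/kubernetes/manifests/etcd.yaml`:
+
+```shell
+# Keep a backup outside the manifests directory so the kubelet does not pick it up
+cp /etc/kubernetes/manifests/etcd.yaml /root/etcd.yaml.bak
+
+# Remove the defaulted fields shown in the patch above (successThreshold,
+# terminationMessagePath, dnsPolicy, and so on); once the file is saved,
+# the kubelet restarts the etcd static Pod from the updated manifest.
+vi /etc/kubernetes/manifests/etcd.yaml
+```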