Merge pull request #43095 from pacoxu/kubeadm-etcd-learner-mode
add a new blog for etcd learner mode usage in kubeadm joinpull/43186/head
commit
8662094c8e
|
@ -0,0 +1,107 @@
|
|||
---
|
||||
layout: blog
|
||||
title: 'kubeadm: Use etcd Learner to Join a Control Plane Node Safely'
|
||||
date: 2023-09-25
|
||||
slug: kubeadm-use-etcd-learner-mode
|
||||
---
|
||||
|
||||
**Author:** Paco Xu (DaoCloud)
|
||||
|
||||
The [`kubeadm`](/docs/reference/setup-tools/kubeadm/) tool now supports etcd learner mode, which
|
||||
allows you to enhance the resilience and stability
|
||||
of your Kubernetes clusters by leveraging the [learner mode](https://etcd.io/docs/v3.4/learning/design-learner/#appendix-learner-implementation-in-v34)
|
||||
feature introduced in etcd version 3.4.
|
||||
This guide will walk you through using etcd learner mode with kubeadm. By default, kubeadm runs
|
||||
a local etcd instance on each control plane node.
|
||||
|
||||
In v1.27, kubeadm introduced a new feature gate `EtcdLearnerMode`. With this feature gate enabled,
|
||||
when joining a new control plane node, a new etcd member will be created as a learner and
|
||||
promoted to a voting member only after the etcd data are fully aligned.
|
||||
|
||||
## What are the advantages of using etcd learner mode?
|
||||
|
||||
etcd learner mode offers several compelling reasons to consider its adoption
|
||||
in Kubernetes clusters:
|
||||
|
||||
1. **Enhanced Resilience**: etcd learner nodes are non-voting members that catch up with
|
||||
the leader's logs before becoming fully operational. This prevents new cluster members
|
||||
from disrupting the quorum or causing leader elections, making the cluster more resilient
|
||||
during membership changes.
|
||||
2. **Reduced Cluster Unavailability**: Traditional approaches to adding new members often
|
||||
result in cluster unavailability periods, especially in slow infrastructure or misconfigurations.
|
||||
etcd learner mode minimizes such disruptions.
|
||||
3. **Simplified Maintenance**: Learner nodes provide a safer and reversible way to add or replace
|
||||
cluster members. This reduces the risk of accidental cluster outages due to misconfigurations or
|
||||
missteps during member additions.
|
||||
4. **Improved Network Tolerance**: In scenarios involving network partitions, learner mode allows
|
||||
for more graceful handling. Depending on the partition a new member lands, it can seamlessly
|
||||
integrate with the existing cluster without causing disruptions.
|
||||
|
||||
In summary, the etcd learner mode improves the reliability and manageability of Kubernetes clusters
|
||||
during member additions and changes, making it a valuable feature for cluster operators.
|
||||
|
||||
## How nodes join a cluster that's using the new mode
|
||||
|
||||
### Create a Kubernetes cluster backed by etcd in learner mode {#create-K8s-cluster-etcd-learner-mode}
|
||||
|
||||
For a general explanation about creating highly available clusters with kubeadm, you can refer to
|
||||
[Creating Highly Available Clusters with kubeadm](/docs/setup/production-environment/tools/kubeadm/high-availability/).
|
||||
|
||||
To create a Kubernetes cluster, backed by etcd in learner mode, using kubeadm, follow these steps:
|
||||
|
||||
```shell
|
||||
# kubeadm init --feature-gates=EtcdLearnerMode=true ...
|
||||
kubeadm init --config=kubeadm-config.yaml
|
||||
```
|
||||
|
||||
The kubeadm configuration file is like below:
|
||||
|
||||
```yaml
|
||||
apiVersion: kubeadm.k8s.io/v1beta3
|
||||
kind: ClusterConfiguration
|
||||
featureGates:
|
||||
EtcdLearnerMode: true
|
||||
```
|
||||
|
||||
The kubeadm tool deploys a single-node Kubernetes cluster with etcd set to use learner mode.
|
||||
|
||||
### Join nodes to the Kubernetes cluster
|
||||
|
||||
Before joining a control-plane node to the new Kubernetes cluster, ensure that the existing control plane nodes
|
||||
and all etcd members are healthy.
|
||||
|
||||
Check the cluster health with `etcdctl`. If `etcdctl` isn't available, you can run this tool inside a container image.
|
||||
You would do that directly with your container runtime using a tool such as `crictl run` and not through Kubernetes
|
||||
|
||||
Here is an example on a client command that uses secure communication to check the cluster health of the etcd cluster:
|
||||
|
||||
```shell
|
||||
ETCDCTL_API=3 etcdctl --endpoints 127.0.0.1:2379 \
|
||||
--cert=/etc/kubernetes/pki/etcd/server.crt \
|
||||
--key=/etc/kubernetes/pki/etcd/server.key \
|
||||
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
|
||||
member list
|
||||
...
|
||||
dc543c4d307fadb9, started, node1, https://10.6.177.40:2380, https://10.6.177.40:2379, false
|
||||
```
|
||||
|
||||
To check if the Kubernetes control plane is healthy, run `kubectl get node -l node-role.kubernetes.io/control-plane=`
|
||||
and check if the nodes are ready.
|
||||
|
||||
Note: It is recommended to have an odd number of members in a etcd cluster.
|
||||
|
||||
Before joining a worker node to the new Kubernetes cluster, ensure that the control plane nodes are healthy.
|
||||
|
||||
## What's next
|
||||
|
||||
- The feature gate `EtcdLearnerMode` is alpha in v1.27 and we expect it to graduate to beta in the next
|
||||
minor release of Kubernetes (v1.29).
|
||||
- etcd has an open issue that may make the process more automatic:
|
||||
[Support auto-promoting a learner member to a voting member](https://github.com/etcd-io/etcd/issues/15107).
|
||||
- Learn more about the kubeadm [configuration format](/docs/reference/config-api/kubeadm-config.v1beta3/) here.
|
||||
|
||||
## Feedback
|
||||
|
||||
Was this guide helpful? If you have any feedback or encounter any issues, please let us know.
|
||||
Your feedback is always welcome! Join the bi-weekly [SIG Cluster Lifecycle meeting](https://docs.google.com/document/d/1Gmc7LyCIL_148a9Tft7pdhdee0NBHdOfHS1SAF0duI4/edit)
|
||||
or weekly [kubeadm office hours](https://docs.google.com/document/d/130_kiXjG7graFNSnIAgtMS1G8zPDwpkshgfRYS0nggo/edit). Or reach us via [Slack](https://slack.k8s.io/) (channel **#kubeadm**), or the [SIG's mailing list](https://groups.google.com/g/kubernetes-sig-cluster-lifecycle).
|
Loading…
Reference in New Issue