Commit Graph

145 Commits (master)

Author SHA1 Message Date
Brad Davidson f3a036a9b1
Bump kine for compact_rev_key watch fix
Fix apiserver-managed compact, and enable it

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-09-05 00:04:41 +00:00
Derek Nola 56ef1cd3a2
Update etcd to v3.6.4-k3s3
* Raft is now an independent dependency, with a seperate release version
* errors moved into their own subpackage
* set a default WarningUnaryRequestDuration

Signed-off-by: Derek Nola <derek.nola@suse.com>
Co-authored-by: Michael Fritch <mfritch@suse.com>
2025-09-04 17:33:10 -06:00
Brad Davidson f1c82392d0 Fix etcd join timeout handling
Error is deadline exceeded, not cancelled

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-08-27 13:41:54 -07:00
Brad Davidson a9016f3dcb Add retry on etcd MemberAdd timeout
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-08-26 09:35:48 -07:00
Brad Davidson 5ce3db779d Update kine and use config defaults helper
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-07-11 10:10:13 -07:00
Brad Davidson 5cc51edafa Fix sqlite-etcd migration
Forgot to add new config to temporary kine in #12293

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-06-12 17:17:49 -07:00
Brad Davidson f90334e207 Fix etcd socket option config
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-04-11 13:39:44 -07:00
Brad Davidson 9deef77eef Add ReusePort/ReuseAddr flags to etcd config
Addresses flakes in etcd CI due to the port still being in TIME_WAIT after the server is shut down between tests

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-04-08 15:27:19 -07:00
Brad Davidson d45006be66 Move etcd ready channel into executor
This eliminates the final channel that was being passed around in an internal struct. The ETCD management code passes in a func that can be polled until etcd is ready; the executor is responsible for polling this after etcd is started and closing the etcd ready channel at the correct time.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-03-24 12:42:29 -07:00
Brad Davidson a8bc412422 Move container runtime ready channel into executor
Move the container runtime ready channel into the executor interface, instead of passing it awkwardly between server and agent config structs

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-03-24 12:42:29 -07:00
Brad Davidson bed1f66880 Avoid use of github.com/pkg/errors functions that capture stack
We are not making use of the stack traces that these functions capture, so we should avoid using them as unnecessary overhead.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-03-05 00:41:38 -08:00
Brad Davidson f940368747 Use etcd proxy to bootstrap control-plane-only nodes, if possible
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-02-27 11:19:26 -08:00
Brad Davidson 365372441b Handle cluster join as create if we're the only member
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-01-09 00:51:19 -08:00
Brad Davidson 6381ae93e7 Switch to using kubelet config files instead of CLI args
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-12-20 14:41:40 -08:00
Brad Davidson 095e34d816 Fix issues with defragment and alarm clear on etcd startup
* Use clientv3.NewCtxClient instead of New to avoid automatic retry of all RPCs
* Only timeout status requests; allow defrag and alarm clear requests to run to completion.
* Only clear alarms on the local cluster member, not ALL cluster members

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-10-30 12:18:48 -07:00
Brad Davidson 0942e6a0c5 Fix sqlite endpoint when migrating from sqlite to etcd
Support for 'sqlite' as the endpoint was removed in
https://github.com/k3s-io/kine/pull/320 and the constant removed in
https://github.com/k3s-io/kine/pull/325

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-10-03 10:54:03 -07:00
Will e4f3cc7b54 remove deprecated use of wait functions
Signed-off-by: Will <will7989@hotmail.com>
2024-07-29 16:23:17 -07:00
Brad Davidson c36db53e54 Add etcd s3 config secret implementation
* Move snapshot structs and functions into pkg/etcd/snapshot
* Move s3 client code and functions into pkg/etcd/s3
* Refactor pkg/etcd to track snapshot and s3 moves
* Add support for reading s3 client config from secret
* Add minio client cache, since S3 client configuration can now be
  changed at runtime by modifying the secret, and don't want to have to
  create a new minio client every time we read config.
* Add tests for pkg/etcd/s3

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-07-10 13:13:55 -07:00
Brad Davidson aa4794b372 Replace 1-weight semaphore on snapshots with simple mutex
Fixes an issue where the semaphore wasn't permanently initialized
until a scheduled snapshot was taken, allowing multiple on-demand
snapshots to be taken until the first scheduled snapshot was triggered.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-06-19 09:47:58 -07:00
Brad Davidson f8e0648304 Convert remaining http handlers over to use util.SendError
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-05-28 16:24:57 -07:00
Brad Davidson 3d14092f76 Fix issue with k3s-etcd informers not starting
Start shared informer caches when k3s-etcd controller wins leader election. Previously, these were only started when the main k3s apiserver controller won an election. If the leaders ended up going to different nodes, some informers wouldn't be started

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-05-28 15:48:15 -07:00
Hussein Galal 144f5ad333
Kubernetes V1.30.0-k3s1 (#10063)
* kubernetes 1.30.0-k3s1

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* Update go version to v1.22.2

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* update dynamiclistener and helm-controller

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* update go in go.mod to 1.22.2

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* update go in Dockerfiles

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* update cri-dockerd

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* Add proctitle package with linux and windows constraints

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* go mod tidy

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* Fixing setproctitle function

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* update dynamiclistener to v0.6.0-rc1

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

---------

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
2024-05-06 19:42:27 +03:00
Brad Davidson 94e29e2ef5 Make /db/info available anonymously from localhost
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-04-22 19:34:43 -07:00
Brad Davidson 7d9abc9f07 Improve etcd load-balancer startup behavior
Prefer the address of the etcd member being joined, and seed the full address list immediately on startup.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-04-09 15:36:33 -07:00
Brad Davidson fe465cc832 Move etcd snapshot management CLI to request/response
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-04-09 15:21:26 -07:00
Vitor Savian 5d69d6e782 Add tls for kine
Signed-off-by: Vitor Savian <vitor.savian@suse.com>

Bump kine

Signed-off-by: Vitor Savian <vitor.savian@suse.com>

Add integration tests for kine with tls

Signed-off-by: Vitor Savian <vitor.savian@suse.com>
2024-03-28 11:12:07 -03:00
Brad Davidson d7cdbb7d4d Send error response if member list cannot be retrieved
Prevents joining nodes from being stuck with bad initial member list if there is a transient failure, or if they try to join themselves

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-03-26 15:17:15 -07:00
Brad Davidson 82432a2df7 Fix issue with etcd node name missing hostname
* Set ServerNodeName in snapshot CLI setup
* Raise errer if ServerNodeName ends up empty some other way
* Fix status controller to use etcd node name annotation instead of prefix checking

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-03-01 13:52:53 -08:00
Derek Nola fae41a8b2a Rename AgentReady to ContainerRuntimeReady for better clarity
Signed-off-by: Derek Nola <derek.nola@suse.com>
2024-02-21 12:21:19 -08:00
Brad Davidson de825845b2 Bump kine and set NotifyInterval to what the apiserver expects
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-02-09 14:22:38 -08:00
Vitor Savian f9ee66f4d8 Changed how lastHeartBeatTime works in the etcd condition
Signed-off-by: Vitor Savian <vitor.savian@suse.com>
2024-02-07 15:05:33 -03:00
Brad Davidson 8224a3a7f6 Fix ipv6 endpoint address selection for on-demand snapshots
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-02-06 18:02:36 -08:00
Vitor Savian 9a70021a9e Error getting node in setEtcdStatusCondition
Signed-off-by: Vitor Savian <vitor.savian@suse.com>

Added retry and changed nodes for

Signed-off-by: Vitor Savian <vitor.savian@suse.com>
2024-01-11 22:06:36 -03:00
Vitor Savian 4a92ced8ee Handle etcd status condition when cluster reset and disable etcd
Signed-off-by: Vitor Savian <vitor.savian@suse.com>

Set condition if node is unhealthy

Signed-off-by: Vitor Savian <vitor.savian@suse.com>
2024-01-09 11:20:41 -03:00
Vitor Savian e53c189587
Handle nil pointer when runtime core is not ready in etcd
Signed-off-by: Vitor <vitor.savian@suse.com>
2023-11-16 15:58:42 -08:00
Vitor Savian c5cd7b3d65
Added etcd status condition
Signed-off-by: Vitor <vitor.savian@suse.com>
2023-11-13 06:39:24 -08:00
Brad Davidson 49411e7084 Don't try to read token hash and cluster id during cluster-reset
These fields are only necessary when saving snapshots to S3, and will block restoration if attempted

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2023-10-27 15:06:29 -07:00
Brad Davidson 3db1d33282 Re-enable etcd endpoint auto-sync
Removing this in 002e6c43ee regressed
control-plane-only nodes, as we rely on the etcd client to update its
endpoint list internally so that we can use it to sync the load-balancer
address list.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2023-10-18 08:33:03 -07:00
Brad Davidson 9597ea1183 Start etcd client before ensuring self removal
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2023-10-13 23:24:16 -07:00
Brad Davidson 550ab36ab7 Switch to managing ETCDSnapshotFile resources
Reconcile snapshot CRs instead of ConfigMap; manage ConfigMap downstream from CR list

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2023-10-12 15:04:45 -07:00
Brad Davidson 676b00aa0e Move etcd snapshot code into separate file
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2023-10-12 15:04:45 -07:00
Brad Davidson 002e6c43ee Reorganize Driver interface and etcd driver to avoid passing context and config into most calls
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2023-09-25 11:54:23 -07:00
Brad Davidson 890645924f Don't export functions not needed outside the etcd package
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2023-09-25 11:54:23 -07:00
Brad Davidson 8c73fd670b Disable HTTP on main etcd client port
Fixes performance issue under load, ref: https://github.com/etcd-io/etcd/issues/15402 and https://github.com/kubernetes/kubernetes/pull/118460

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2023-09-25 08:29:57 -07:00
Vitor Savian e83b1ba4aa
Fixed the etcd retention to delete orphaned snapshots based on the date (#8177)
* Fix retention using name instead of date

Signed-off-by: Vitor <vitor.savian@suse.com>
2023-08-14 18:48:59 -03:00
Ian Cardoso e551308db8
fix for etcd-snapshot delete with --etcd-s3 flag (#8110)
k3s etcd-snapshot save --etcd-s3 ... is creating a local snapshot and uploading it to s3 while k3s etcd-snapshot delete --etcd-s3 ... was deleting the snapshot only on s3 buckets, this commit change the behavior of delete to do it locally and on s3

Signed-off-by: Ian Cardoso <osodracnai@gmail.com>
2023-08-04 14:26:32 -03:00
Vitor Savian ca7aeed090
Etcd snapshots retention when node name changes (#8099)
Fixed the etcd retention to delete orphaned snapshots

Signed-off-by: Vitor <vitor.savian@suse.com>
2023-08-03 10:54:40 -03:00
Brad Davidson aa76942d0f Add FilterCN function to prevent SAN Stuffing
Wire up a node watch to collect addresses of server nodes, to prevent adding unauthorized SANs to the dynamiclistener cert.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2023-08-02 11:15:39 -07:00
Brad Davidson e61fde93c1 Fix MemberList error handling and incorrect etcd-arg passthrough
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2023-04-28 22:04:30 -07:00
Brad Davidson 91afb38799 Retry cluster join on "too many learners" error
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2023-04-28 11:28:33 -07:00