Render cri registry mirrors.x.endpoints and configs.x.tls into config_path; keep
using mirrors.x.rewrites and configs.x.auth those do not yet have an
equivalent in the new format.
The new config file format allows disabling containerd's fallback to the
default endpoint when using mirror endpoints; a new CLI flag is added to
control that behavior.
This also re-shares some code that was unnecessarily split into parallel
implementations for linux/windows versions. There is probably more work
to be done on this front but it's a good start.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Remove KubeletCredentialProviders and JobTrackingWithFinalizers feature-gates, both of which are GA and cannot be disabled.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Problem:
Configuring qos-class features in containerd requres a custom containerd configuration template.
Solution:
Look for configuration files in default locations and configure containerd to use them if they exist.
Signed-off-by: Oliver Larsson <larsson.e.oliver@gmail.com>
Enable the feature-gate for both kubelet and cloud-controller-manager. Enabling it on only one side breaks RKE2, where feature-gates are not shared due to running in different processes.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
* initial windows port.
Signed-off-by: Sean Yen <seanyen@microsoft.com>
Signed-off-by: Derek Nola <derek.nola@suse.com>
Co-authored-by: Derek Nola <derek.nola@suse.com>
Co-authored-by: Wei Ran <weiran@microsoft.com>
Write the extra metadata both locally and to S3. These files are placed such that they will not be used by older versions of K3s that do not make use of them.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
* Consolidate NewCertCommands
* Add support for user defined new token
* Add E2E testlets
Signed-off-by: Derek Nola <derek.nola@suse.com>
* Ensure agent token also changes
Signed-off-by: Derek Nola <derek.nola@suse.com>
* Add --image-service-endpoint flag
Problem:
External container runtime can be set but image service endpoint is unchanged
and also is not exposed as a flag. This is useful for using containerd
snapshotters outside of the ones that have built-in support like
stargz-snapshotter.
Solution:
Add a flag --image-service-endpoint and also default image service endpoint to
container runtime endpoint if set.
Signed-off-by: Edgar Lee <edgarhinshunlee@gmail.com>
* Update to v1.28.2
Signed-off-by: Johnatas <johnatasr@hotmail.com>
* Bump containerd and stargz versions
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
* Print message on upgrade fail
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
* Send Bad Gateway instead of Service Unavailable when tunnel dial fails
Works around new handling for Service Unavailable by apiserver aggregation added in kubernetes/kubernetes#119870
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
* Add 60 seconds to server upgrade wait to account for delays in apiserver readiness
Also change cleanup helper to ensure upgrade test doesn't pollute the
images for the rest of the tests.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
---------
Signed-off-by: Johnatas <johnatasr@hotmail.com>
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Co-authored-by: Brad Davidson <brad.davidson@rancher.com>
Only configure enable-aggregator-routing and egress-selector-config-file
if required by egress-selector-mode.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Problem:
When support for etcd was added in 3957142, generation of certificates and keys for etcd was not gated behind use of managed etcd.
Keys are generated and distributed across servers even if managed etcd is not enabled.
Solution:
Allow generation of certificates and keys only if managed etc is enabled. Check config.DisableETCD flag.
Signed-off-by: Bartossh <lenartconsulting@gmail.com>
Allows nodes to join the cluster during a webhook outage. This also
enhances auditability by creating Kubernetes events for the deferred
verification.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Only actual admin actions should use the admin kubeconfig; everything done by the supervisor/deploy/helm controllers will now use a distinct account for audit purposes.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
* Add helper function for multiple arguments in stringslice
Signed-off-by: Derek Nola <derek.nola@suse.com>
* Cleanup server setup with util function
Signed-off-by: Derek Nola <derek.nola@suse.com>
Several places in the code used a 5-second retry loop to wait on
Runtime.Core to be set. This caused a race condition where OnChange
handlers could be added after the Wrangler shared informers were already
started. When this happened, the handlers were never called because the
shared informers they relied upon were not started.
Fix that by requiring anything that waits on Runtime.Core to run from a
cluster controller startup hook that is guaranteed to be called before
the shared informers are started, instead of just firing it off in a
goroutine that retries until it is set.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
We need to send the full chain in order for cross-signing to work
properly during switchover to a new root.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Turns out etcd-only nodes were never running **any** of the controllers,
so allowing multiple controllers didn't really fix things.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Prevents errors when starting with fail-closed webhooks
Also, use panic instead of Fatalf so that the CloudControllerManager rescue can handle the error
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Allow bootstrapping with kubeadm bootstrap token strings or existing
Kubelet certs. This allows agents to join the cluster using kubeadm
bootstrap tokens, as created with the `k3s token create` command.
When the token expires or is deleted, agents can successfully restart by
authenticating with their kubelet certificate via node authentication.
If the token is gone and the node is deleted from the cluster, node auth
will fail and they will be prevented from rejoining the cluster until
provided with a valid token.
Servers still must be bootstrapped with the static cluster token, as
they will need to know it to decrypt the bootstrap data.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
* General cleanup of test-helpers functions to address CI failures
* Install awscli in test image
* Log containerd output to file even when running with --debug
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
* Add EncryptSecrets to Critical Control Args
* use deep comparison to extract differences
Signed-off-by: Derek Nola <derek.nola@suse.com>
Signed-off-by: Derek Nola <derek.nola@suse.com>
Using the node external IP address for all CNI traffic is a breaking change from previous versions; we should make it an opt-in for distributed clusters instead of default behavior.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
We should be reading from the hijacked bufio.ReaderWriter instead of
directly from the net.Conn. There is a race condition where the
underlying http handler may consume bytes from the hijacked request
stream, if it comes in the same packet as the CONNECT header. These
bytes are left in the buffered reader, which we were not using. This was
causing us to occasionally drop a few bytes from the start of the
tunneled connection's client data stream.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
If CCM and ServiceLB are both disabled, don't run the cloud-controller-manager at all;
this should provide the same CLI flag behavior as previous releases, and not create
problems when users disable the CCM but still want ServiceLB.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Requires tweaking existing method signature to allow specifying whether or not IPv6 addresses should be return URL-safe.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
cadvisor still doesn't pull stats via CRI yet, so we have to continue to use the deprecated arg.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Use same kubelet-preferred-address-types setting as RKE2 to improve reliability of the egress selector when using a HTTP proxy. Also, use BindAddressOrLoopback to ensure that the correct supervisor address is used when --bind-address is set.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
* Move startup hooks wg into a runtime pointer, check before notifying systemd
* Switch default systemd notification to server
* Add 1 sec delay to allow etcd to write to disk
Signed-off-by: Derek Nola <derek.nola@suse.com>
The control-plane context handles requests outside the cluster and
should not be sent to the proxy.
In agent mode, we don't watch pods and just direct-dial any request for
a non-node address, which is the original behavior.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Watching pods appears to be the most reliable way to ensure that the
proxy routes and authorizes connections.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
This reverts commit aa9065749c.
Setting dual-stack node-ip does not work when --cloud-provider is set
to anything, including 'external'. Just set node-ip to the first IP, and
let the cloud provider add the other address.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
The cloud-provider arg is deprecated and cannot be set to anything other than external, but must still be used or node addresses are not set properly.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Reduces code complexity a bit and ensures we don't have to handle closed watch channels on our own
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Before this change, kube-router was always assuming that IPv4 is
enabled, which is not the case in IPv6-only clusters. To enable network
policies in IPv6-only, we need to explicitly let kube-router know when
to disable IPv4.
Signed-off-by: Michal Rostecki <vadorovsky@gmail.com>
* Bump etcd to v3.5.4-k3s1
* Fix issue with datastore corruption on cluster-reset
* Disable unnecessary components during cluster reset
Disable control-plane components and the tunnel setup during
cluster-reset, even when not doing a restore. This reduces the amount of
log clutter during cluster reset/restore, making any errors encountered
more obvious.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Problem:
Specifying extra arguments for the API server for example is not supported as
the arguments get stored in a map before being passed to the API server.
Solution:
Updated the GetArgs function to store the arguments in a map that can have
multiple values. Some more logic is added so that repeated extra arguments
retain their order when sorted whilst overall the arguments can still be
sorted for improved readability when logged.
Support has been added for prefixing and suffixing default argument values
by using -= and += when specifying extra arguments.
Signed-off-by: Terry Cain <terry@terrys-home.co.uk>