Merge pull request #45130 from tzifudzi/ops-readiness-blog

blog: Windows Operational Readiness
pull/45722/head^2
Kubernetes Prow Robot 2024-03-28 08:29:56 -07:00 committed by GitHub
commit 68cb308aa2
No known key found for this signature in database
GPG Key ID: B5690EEEBB952194
1 changed files with 156 additions and 0 deletions

View File

@ -0,0 +1,156 @@
---
layout: blog
title: "Introducing the Windows Operational Readiness Specification"
date: 2024-04-03
slug: intro-windows-ops-readiness
---
**Authors:** Jay Vyas (Tesla), Amim Knabben (Broadcom), and Tatenda Zifudzi (AWS)
Since Windows support [graduated to stable](/blog/2019/03/25/kubernetes-1-14-release-announcement/)
with Kubernetes 1.14 in 2019, the capability to run Windows workloads has been much
appreciated by the end user community. The level of and availability of Windows workload
support has consistently been a major differentiator for Kubernetes distributions used by
large enterprises. However, with more Windows workloads being migrated to Kubernetes
and new Windows features being continuously released, it became challenging to test
Windows worker nodes in an effective and standardized way.
The Kubernetes project values the ability to certify conformance without requiring a
closed-source license for a certified distribution or service that has no intention
of offering Windows.
Some notable examples brought to the attention of SIG Windows were:
- An issue with load balancer source address ranges functionality not operating correctly on
Windows nodes, detailed in a GitHub issue:
[kubernetes/kubernetes#120033](https://github.com/kubernetes/kubernetes/issues/120033).
- Reports of functionality issues with Windows features, such as
“[GMSA](https://learn.microsoft.com/en-us/windows-server/security/group-managed-service-accounts/group-managed-service-accounts-overview) not working with containerd,
discussed in [microsoft/Windows-Containers#44](https://github.com/microsoft/Windows-Containers/issues/44).
- Challenges developing networking policy tests that could objectively evaluate
Container Network Interface (CNI) plugins across different operating system configurations,
as discussed in [kubernetes/kubernetes#97751](https://github.com/kubernetes/kubernetes/issues/97751).
SIG Windows therefore recognized the need for a tailored solution to ensure Windows
nodes' operational readiness *before* their deployment into production environments.
Thus, the idea to develop a [Windows Operational Readiness Specification](https://kep.k8s.io/2578)
was born.
## Cant we just run the official Conformance tests?
The Kubernetes project contains a set of [conformance tests](https://www.cncf.io/training/certification/software-conformance/#how),
which are standardized tests designed to ensure that a Kubernetes cluster meets
the required Kubernetes specifications.
However, these tests were originally defined at a time when Linux was the *only*
operating system compatible with Kubernetes, and thus, they were not easily
extendable for use with Windows. Given that Windows workloads, despite their
importance, account for a smaller portion of the Kubernetes community, it was
important to ensure that the primary conformance suite relied upon by many
Kubernetes distributions to certify Linux conformance, didn't become encumbered
with Windows specific features or enhancements such as GMSA or multi-operating
system kube-proxy behavior.
Therefore, since there was a specialized need for Windows conformance testing,
SIG Windows went down the path of offering Windows specific conformance tests
through the Windows Operational Readiness Specification.
## Cant we just run the Kubernetes end-to-end test suite?
In the Linux world, tools such as [Sonobuoy](https://sonobuoy.io/) simplify execution of the
conformance suite, relieving users from needing to be aware of Kubernetes'
compilation paths or the semantics of [Ginkgo](https://onsi.github.io/ginkgo) tags.
Regarding needing to compile the Kubernetes tests, we realized that Windows
users might similarly find the process of compiling and running the Kubernetes
e2e suite from scratch similarly undesirable, hence, there was a clear need to
provide a user-friendly, "push-button" solution that is ready to go. Moreover,
regarding Ginkgo tags, applying conformance tests to Windows nodes through a set
of [Ginkgo](https://onsi.github.io/ginkgo/) tags would also be burdensome for
any user, including Linux enthusiasts or experienced Windows system admins alike.
To bridge the gap and give users a straightforward way to confirm their clusters
support a variety of features, the Kubernetes SIG for Windows found it necessary to
therefore create the Windows Operational Readiness application. This application
written in Go, simplifies the process to run the necessary Windows specific tests
while delivering results in a clear, accessible format.
This initiative has been a collaborative effort, with contributions from different
cloud providers and platforms, including Amazon, Microsoft, SUSE, and Broadcom.
## A closer look at the Windows Operational Readiness Specification {#specification}
The Windows Operational Readiness specification specifically targets and executes
tests found within the Kubernetes repository in a more user-friendly way than
simply targeting [Ginkgo](https://onsi.github.io/ginkgo/) tags. It introduces a
structured test suite that is split into sets of core and extended tests, with
each set of tests containing categories directed at testing a specific area of
testing, such as networking. Core tests target fundamental and critical
functionalities that Windows nodes should support as defined by the Kubernetes
specification. On the other hand, extended tests cover more complex features,
more aligned with diving deeper into Windows-specific capabilities such as
integrations with Active Directory. These goal of these tests is to be extensive,
covering a wide array of Windows-specific capabilities to ensure compatibility
with a diverse set of workloads and configurations, extending beyond basic
requirements. Below is the current list of categories.
| Category Name | Category Description |
|--------------------------|-------------------------------------------------------------------------------------------------------------------------------------|
| `Core.Network` | Tests minimal networking functionality (ability to access pod-by-pod IP.) |
| `Core.Storage` | Tests minimal storage functionality, (ability to mount a hostPath storage volume.) |
| `Core.Scheduling` | Tests minimal scheduling functionality, (ability to schedule a pod with CPU limits.) |
| `Core.Concurrent` | Tests minimal concurrent functionality, (the ability of a node to handle traffic to multiple pods concurrently.) |
| `Extend.HostProcess` | Tests features related to Windows HostProcess pod functionality. |
| `Extend.ActiveDirectory` | Tests features related to Active Directory functionality. |
| `Extend.NetworkPolicy` | Tests features related to Network Policy functionality. |
| `Extend.Network` | Tests advanced networking functionality, (ability to support IPv6) |
| `Extend.Worker` | Tests features related to Windows worker node functionality, (ability for nodes to access TCP and UDP services in the same cluster) |
## How to conduct operational readiness tests for Windows nodes
To run the Windows Operational Readiness test suite, refer to the test suite's
[`README`](https://github.com/kubernetes-sigs/windows-operational-readiness/blob/main/README.md), which explains how to set it up and run it. The test suite offers
flexibility in how you can execute tests, either using a compiled binary or a
Sonobuoy plugin. You also have the choice to run the tests against the entire
test suite or by specifying a list of categories. Cloud providers have the
choice of uploading their conformance results, enhancing transparency and reliability.
Once you have checked out that code, you can run a test. For example, this sample
command runs the tests from the `Core.Concurrent` category:
```shell
./op-readiness --kubeconfig $KUBE_CONFIG --category Core.Concurrent
```
As a contributor to Kubernetes, if you want to test your changes against a specific pull
request using the Windows Operational Readiness Specification, use the following bot
command in the new pull request.
```shell
/test operational-tests-capz-windows-2019
```
## Looking ahead
Were looking to improve our curated list of Windows-specific tests by adding
new tests to the Kubernetes repository and also identifying existing test cases
that can be targetted. The long term goal for the specification is to continually
enhance test coverage for Windows worker nodes and improve the robustness of
Windows support, facilitating a seamless experience across diverse cloud
environments. We also have plans to integrate the Windows Operational Readiness
tests into the official Kubernetes conformance suite.
If you are interested in helping us out, please reach out to us! We welcome help
in any form, from giving once-off feedback to making a code contribution,
to having long-term owners to help us drive changes. The Windows Operational
Readiness specification is owned by the SIG Windows team. You can reach out
to the team on the [Kubernetes Slack workspace](https://slack.k8s.io/) **#sig-windows**
channel. You can also explore the [Windows Operational Readiness test suite](https://github.com/kubernetes-sigs/windows-operational-readiness/#readme)
and make contributions directly to the GitHub repository.
Special thanks to Kulwant Singh (AWS), Pramita Gautam Rana (VMWare), Xinqi Li
(Google) for their help in making notable contributions to the specification. Additionally,
appreciation goes to James Sturtevant (Microsoft), Mark Rossetti (Microsoft),
Claudiu Belu (Cloudbase Solutions) and Aravindh Puthiyaparambil
(Softdrive Technologies Group Inc.) from the SIG Windows team for their guidance and support.