diff --git a/content/en/blog/_posts/2024-04-03-windows-ops-readiness.md b/content/en/blog/_posts/2024-04-03-windows-ops-readiness.md
new file mode 100644
index 0000000000..500cc07746
--- /dev/null
+++ b/content/en/blog/_posts/2024-04-03-windows-ops-readiness.md
@@ -0,0 +1,156 @@
+---
+layout: blog
+title: "Introducing the Windows Operational Readiness Specification"
+date: 2024-04-03
+slug: intro-windows-ops-readiness
+---
+
+**Authors:** Jay Vyas (Tesla), Amim Knabben (Broadcom), and Tatenda Zifudzi (AWS)
+
+
+Since Windows support [graduated to stable](/blog/2019/03/25/kubernetes-1-14-release-announcement/)
+with Kubernetes 1.14 in 2019, the capability to run Windows workloads has been much
+appreciated by the end user community. The level and availability of Windows workload
+support has consistently been a major differentiator for Kubernetes distributions used by
+large enterprises. However, with more Windows workloads being migrated to Kubernetes
+and new Windows features being continuously released, it became challenging to test
+Windows worker nodes in an effective and standardized way.
+
+The Kubernetes project values the ability to certify conformance without requiring a
+closed-source license for a certified distribution or service that has no intention
+of offering Windows.
+
+Some notable examples brought to the attention of SIG Windows were:
+
+- An issue with load balancer source address ranges functionality not operating correctly on
+  Windows nodes, detailed in a GitHub issue:
+  [kubernetes/kubernetes#120033](https://github.com/kubernetes/kubernetes/issues/120033).
+- Reports of functionality issues with Windows features, such as
+  [GMSA](https://learn.microsoft.com/en-us/windows-server/security/group-managed-service-accounts/group-managed-service-accounts-overview) not working with containerd,
+  discussed in [microsoft/Windows-Containers#44](https://github.com/microsoft/Windows-Containers/issues/44).
+- Challenges developing network policy tests that could objectively evaluate
+  Container Network Interface (CNI) plugins across different operating system configurations,
+  as discussed in [kubernetes/kubernetes#97751](https://github.com/kubernetes/kubernetes/issues/97751).
+
+SIG Windows therefore recognized the need for a tailored solution to ensure Windows
+nodes' operational readiness *before* their deployment into production environments.
+Thus, the idea to develop a [Windows Operational Readiness Specification](https://kep.k8s.io/2578)
+was born.
+
+## Can’t we just run the official Conformance tests?
+
+The Kubernetes project contains a set of [conformance tests](https://www.cncf.io/training/certification/software-conformance/#how),
+which are standardized tests designed to ensure that a Kubernetes cluster meets
+the required Kubernetes specifications.
+
+However, these tests were originally defined at a time when Linux was the *only*
+operating system compatible with Kubernetes, and thus they were not easily
+extensible for use with Windows. Given that Windows workloads, despite their
+importance, account for a smaller portion of the Kubernetes community, it was
+important to ensure that the primary conformance suite, relied upon by many
+Kubernetes distributions to certify Linux conformance, didn't become encumbered
+with Windows-specific features or enhancements such as GMSA or
+multi-operating-system kube-proxy behavior.
+
+Therefore, since there was a specialized need for Windows conformance testing,
+SIG Windows went down the path of offering Windows-specific conformance tests
+through the Windows Operational Readiness Specification.
+
+## Can’t we just run the Kubernetes end-to-end test suite?
+
+In the Linux world, tools such as [Sonobuoy](https://sonobuoy.io/) simplify execution of the
+conformance suite, relieving users from needing to be aware of Kubernetes'
+compilation paths or the semantics of [Ginkgo](https://onsi.github.io/ginkgo) tags.
+
+As for compiling the Kubernetes tests, we realized that Windows users, like
+their Linux counterparts, would find the process of compiling and running the
+Kubernetes e2e suite from scratch undesirable. Hence, there was a clear need to
+provide a user-friendly, "push-button" solution that is ready to go. Moreover,
+applying conformance tests to Windows nodes through a set
+of [Ginkgo](https://onsi.github.io/ginkgo/) tags would also be burdensome for
+any user, whether a Linux enthusiast or an experienced Windows system admin.
+
+To bridge the gap and give users a straightforward way to confirm that their
+clusters support a variety of features, SIG Windows created the Windows
+Operational Readiness application. This application, written in Go,
+simplifies running the necessary Windows-specific tests
+while delivering results in a clear, accessible format.
+
+This initiative has been a collaborative effort, with contributions from different
+cloud providers and platforms, including Amazon, Microsoft, SUSE, and Broadcom.
+
+## A closer look at the Windows Operational Readiness Specification {#specification}
+
+The Windows Operational Readiness specification targets and executes
+tests found within the Kubernetes repository in a more user-friendly way than
+simply targeting [Ginkgo](https://onsi.github.io/ginkgo/) tags. It introduces a
+structured test suite that is split into sets of core and extended tests, with
+each set containing categories that target a specific area, such as
+networking. Core tests target fundamental and critical
+functionalities that Windows nodes should support as defined by the Kubernetes
+specification. Extended tests, on the other hand, cover more complex features
+that dig deeper into Windows-specific capabilities, such as
+integrations with Active Directory. The goal of these tests is to be extensive,
+covering a wide array of Windows-specific capabilities to ensure compatibility
+with a diverse set of workloads and configurations, extending beyond basic
+requirements. Below is the current list of categories.
+
+| Category Name            | Category Description |
+|--------------------------|----------------------|
+| `Core.Network`           | Tests minimal networking functionality (the ability to access a pod by its pod IP). |
+| `Core.Storage`           | Tests minimal storage functionality (the ability to mount a hostPath storage volume). |
+| `Core.Scheduling`        | Tests minimal scheduling functionality (the ability to schedule a pod with CPU limits). |
+| `Core.Concurrent`        | Tests minimal concurrent functionality (the ability of a node to handle traffic to multiple pods concurrently). |
+| `Extend.HostProcess`     | Tests features related to Windows HostProcess pod functionality. |
+| `Extend.ActiveDirectory` | Tests features related to Active Directory functionality. |
+| `Extend.NetworkPolicy`   | Tests features related to Network Policy functionality. |
+| `Extend.Network`         | Tests advanced networking functionality (the ability to support IPv6). |
+| `Extend.Worker`          | Tests features related to Windows worker node functionality (the ability of nodes to access TCP and UDP services in the same cluster). |
+
+## How to conduct operational readiness tests for Windows nodes
+
+To run the Windows Operational Readiness test suite, refer to the test suite's
+[`README`](https://github.com/kubernetes-sigs/windows-operational-readiness/blob/main/README.md), which explains how to set it up and run it. The test suite offers
+flexibility in how you can execute tests, either using a compiled binary or a
+Sonobuoy plugin. You can also choose to run the entire test suite or only a
+specified list of categories. Cloud providers have the
+choice of uploading their conformance results, enhancing transparency and reliability.
+
+Once you have checked out the test suite's code, you can run a test. For example, this sample
+command runs the tests from the `Core.Concurrent` category:
+
+```shell
+./op-readiness --kubeconfig $KUBE_CONFIG --category Core.Concurrent
+```
+
+As a contributor to Kubernetes, if you want to test your changes against a specific pull
+request using the Windows Operational Readiness Specification, use the following bot
+command in the new pull request.
+
+```shell
+/test operational-tests-capz-windows-2019
+```
+
+## Looking ahead
+
+We’re looking to improve our curated list of Windows-specific tests by adding
+new tests to the Kubernetes repository and also identifying existing test cases
+that can be targeted. The long-term goal for the specification is to continually
+enhance test coverage for Windows worker nodes and improve the robustness of
+Windows support, facilitating a seamless experience across diverse cloud
+environments. We also have plans to integrate the Windows Operational Readiness
+tests into the official Kubernetes conformance suite.
+
+If you are interested in helping us out, please reach out to us! We welcome help
+in any form, from giving once-off feedback to making a code contribution,
+to having long-term owners to help us drive changes. The Windows Operational
+Readiness specification is owned by the SIG Windows team. You can reach out
+to the team on the [Kubernetes Slack workspace](https://slack.k8s.io/) **#sig-windows**
+channel. You can also explore the [Windows Operational Readiness test suite](https://github.com/kubernetes-sigs/windows-operational-readiness/#readme)
+and make contributions directly to the GitHub repository.
+
+Special thanks to Kulwant Singh (AWS), Pramita Gautam Rana (VMware), and Xinqi Li
+(Google) for their help in making notable contributions to the specification. Additionally,
+appreciation goes to James Sturtevant (Microsoft), Mark Rossetti (Microsoft),
+Claudiu Belu (Cloudbase Solutions), and Aravindh Puthiyaparambil
+(Softdrive Technologies Group Inc.) from the SIG Windows team for their guidance and support.
\ No newline at end of file