---
layout: blog
title: "DIY: Create Your Own Cloud with Kubernetes (Part 2)"
slug: diy-create-your-own-cloud-with-kubernetes-part-2
date: 2024-04-05T07:35:00+00:00
---

**Author**: Andrei Kvapil (Ænix)

Continuing our series of posts on how to build your own cloud using just the Kubernetes ecosystem.
In the [previous article](/blog/2024/04/05/diy-create-your-own-cloud-with-kubernetes-part-1/), we
explained how we prepared a basic Kubernetes distribution based on Talos Linux and Flux CD.
In this article, we'll show you a few different virtualization technologies in Kubernetes and prepare
everything needed to run virtual machines in Kubernetes, primarily storage and networking.

We will talk about technologies such as KubeVirt, LINSTOR, and Kube-OVN.

But first, let's explain what virtual machines are needed for, and why you can't just use Docker
containers to build a cloud.
The reason is that containers do not provide a sufficient level of isolation.
Although the situation improves year by year, we often encounter vulnerabilities that allow
escaping the container sandbox and elevating privileges in the system.

On the other hand, Kubernetes was not originally designed to be a multi-tenant system, meaning
the basic usage pattern involves creating a separate Kubernetes cluster for every independent
project and development team.

Virtual machines are the primary means of isolating tenants from each other in a cloud environment.
In virtual machines, users can execute code and programs with administrative privileges, but this
doesn't affect other tenants or the environment itself. In other words, virtual machines allow you to
achieve [hard multi-tenancy isolation](/docs/concepts/security/multi-tenancy/#isolation) and run
in environments where tenants do not trust each other.

## Virtualization technologies in Kubernetes

There are several different technologies that bring virtualization into the Kubernetes world:
[KubeVirt](https://kubevirt.io/) and [Kata Containers](https://katacontainers.io/)
are the most popular ones. But you should know that they work differently.

**Kata Containers** implements the CRI (Container Runtime Interface) and provides an additional
level of isolation for standard containers by running them in virtual machines.
But the containers still run within the same, single Kubernetes cluster.

{{< figure src="kata-containers.svg" caption="A diagram showing how container isolation is ensured by running containers in virtual machines with Kata Containers" alt="A diagram showing how container isolation is ensured by running containers in virtual machines with Kata Containers" >}}

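For illustration, selecting such a runtime for a particular workload is usually done through a
RuntimeClass. Below is a minimal sketch, assuming the Kata runtime is already installed on your
nodes and registered with your CRI implementation under the handler name `kata` (the actual
handler name depends on your installation):

```yaml
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: kata
# Must match the runtime handler configured in containerd/CRI-O;
# "kata" is only a common convention, not a given.
handler: kata
---
apiVersion: v1
kind: Pod
metadata:
  name: nginx-in-vm
spec:
  # Run this pod's containers inside a lightweight virtual machine
  runtimeClassName: kata
  containers:
    - name: nginx
      image: nginx:1.25
```
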
**KubeVirt** allows running traditional virtual machines using the Kubernetes API. KubeVirt virtual
machines run as regular Linux processes in containers. In other words, in KubeVirt, a container
is used as a sandbox for running virtual machine (QEMU) processes.
This can be clearly seen in the figure below, by looking at how live migration of virtual machines
is implemented in KubeVirt. When migration is needed, the virtual machine moves from one container
to another.

{{< figure src="kubevirt-migration.svg" caption="A diagram showing live migration of a virtual machine from one container to another in KubeVirt" alt="A diagram showing live migration of a virtual machine from one container to another in KubeVirt" >}}

There is also an alternative project - [Virtink](https://github.com/smartxworks/virtink), which
implements lightweight virtualization using
[Cloud-Hypervisor](https://github.com/cloud-hypervisor/cloud-hypervisor) and is initially focused
on running virtual Kubernetes clusters using the Cluster API.

Considering our goals, we decided to use KubeVirt as the most popular project in this area.
Besides, we have extensive expertise with it and have already made a lot of contributions to KubeVirt.

KubeVirt is [easy to install](https://kubevirt.io/user-guide/operations/installation/) and allows
you to run virtual machines out-of-the-box using the
[containerDisk](https://kubevirt.io/user-guide/virtual_machines/disks_and_volumes/#containerdisk)
feature - this allows you to store and distribute VM images directly as OCI images from a container
image registry.
Virtual machines with containerDisk are well suited for creating Kubernetes worker nodes and other
VMs that do not require state persistence.

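As a rough sketch of what this looks like, here is a minimal VirtualMachine that boots from a
containerDisk; the image reference and resource sizes are placeholders:

```yaml
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: worker-vm
spec:
  running: true
  template:
    spec:
      domain:
        resources:
          requests:
            memory: 2Gi
            cpu: "2"
        devices:
          disks:
            - name: system
              disk:
                bus: virtio
      volumes:
        # The VM image is pulled as a regular OCI image from a registry;
        # the image reference below is a placeholder.
        - name: system
          containerDisk:
            image: quay.io/containerdisks/ubuntu:22.04
```
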
For managing persistent data, KubeVirt offers a separate tool, Containerized Data Importer (CDI).
It allows for cloning PVCs and populating them with data from base images. CDI is necessary
if you want to automatically provision persistent volumes for your virtual machines, and it is
also required for the KubeVirt CSI Driver, which is used to handle persistent volume claims
from tenant Kubernetes clusters.

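For example, a DataVolume can populate a new PVC from a cloud image over HTTP; the URL, size, and
storage class below are placeholders:

```yaml
apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  name: ubuntu-root-disk
spec:
  source:
    http:
      # Placeholder URL of a cloud image to import
      url: https://cloud-images.ubuntu.com/jammy/current/jammy-server-cloudimg-amd64.img
  pvc:
    accessModes:
      - ReadWriteMany
    volumeMode: Block
    resources:
      requests:
        storage: 10Gi
    storageClassName: replicated-block   # placeholder name
```
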
But first, you have to decide where and how you will store this data.

## Storage for Kubernetes VMs

With the introduction of the CSI (Container Storage Interface), a wide range of technologies that
integrate with Kubernetes has become available.
In fact, KubeVirt fully utilizes the CSI interface, aligning the choice of storage for
virtualization closely with the choice of storage for Kubernetes itself.
However, there are nuances you need to consider. Unlike containers, which typically use a
standard filesystem, block devices are more efficient for virtual machines.

Although the CSI interface in Kubernetes allows requesting both types of volumes (filesystems
and block devices), it's important to verify that your storage backend supports this.

Using block devices for virtual machines eliminates the need for an additional abstraction layer,
such as a filesystem, which makes them more performant and in most cases enables the use of the
_ReadWriteMany_ mode. This mode allows concurrent access to the volume from multiple nodes, which
is a critical feature for enabling the live migration of virtual machines in KubeVirt.

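In PVC terms, this combination looks like the sketch below; whether it can actually be provisioned
with these settings depends entirely on your CSI driver, and the storage class name is a placeholder:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: vm-disk
spec:
  # Raw block device instead of a filesystem volume
  volumeMode: Block
  # Needed for live migration: the volume must be attachable
  # from two nodes at the same time
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 20Gi
  storageClassName: replicated-block   # placeholder name
```
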
The storage system can be external or internal (in the case of hyper-converged infrastructure).
Using external storage in many cases makes the whole system more stable, as your data is stored
separately from compute nodes.

{{< figure src="storage-external.svg" caption="A diagram showing external data storage communication with the compute nodes" alt="A diagram showing external data storage communication with the compute nodes" >}}

External storage solutions are often popular in enterprise systems because such storage is
frequently provided by an external vendor that takes care of its operations. The integration with
Kubernetes involves only a small component installed in the cluster - the CSI driver. This driver
is responsible for provisioning volumes in this storage and attaching them to pods run by Kubernetes.
However, such storage solutions can also be implemented using purely open-source technologies.
One of the popular solutions is [TrueNAS](https://www.truenas.com/) powered by the
[democratic-csi](https://github.com/democratic-csi/democratic-csi) driver.

{{< figure src="storage-local.svg" caption="A diagram showing local data storage running on the compute nodes" alt="A diagram showing local data storage running on the compute nodes" >}}

On the other hand, hyper-converged systems are often implemented using local storage (when you do
not need replication) or software-defined storage, often installed directly in Kubernetes,
such as [Rook/Ceph](https://rook.io/), [OpenEBS](https://openebs.io/),
[Longhorn](https://longhorn.io/), [LINSTOR](https://linbit.com/linstor/), and others.

{{< figure src="storage-clustered.svg" caption="A diagram showing clustered data storage running on the compute nodes" alt="A diagram showing clustered data storage running on the compute nodes" >}}

A hyper-converged system has its advantages. For example, data locality: when your data is stored
locally, access to that data is faster. But there are disadvantages, as such a system is usually
more difficult to manage and maintain.

At Ænix, we wanted to provide a ready-to-use solution that could be used without the need to
purchase and set up additional external storage, and that was optimal in terms of speed and
resource utilization. LINSTOR became that solution.
Time-tested and industry-popular technologies such as LVM and ZFS as backends give confidence
that data is stored securely. DRBD-based replication is incredibly fast and consumes a small amount
of computing resources.

For installing LINSTOR in Kubernetes, there is the Piraeus project, which already provides
ready-made block storage to use with KubeVirt.

{{< note >}}
In case you are using Talos Linux, as we described in the
[previous article](/blog/2024/04/05/diy-create-your-own-cloud-with-kubernetes-part-1/), you will
need to enable the necessary kernel modules in advance, and configure Piraeus as described in the
[instructions](https://github.com/piraeusdatastore/piraeus-operator/blob/v2/docs/how-to/talos.md).
{{< /note >}}

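Once the operator is running, consuming LINSTOR volumes comes down to defining a StorageClass. Here
is a minimal sketch, assuming a LINSTOR storage pool backed by LVM or ZFS on the nodes; the pool
name and replica count are assumptions, and the parameter names follow the Piraeus documentation,
so verify them against the version you deploy:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: replicated-block
provisioner: linstor.csi.linbit.com
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer
parameters:
  # Assumed LINSTOR storage pool created on top of LVM or ZFS
  linstor.csi.linbit.com/storagePool: "lvm-thin"
  # Number of DRBD replicas to place across the cluster
  linstor.csi.linbit.com/placementCount: "2"
```
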
## Networking for Kubernetes VMs

Despite having a similar interface - CNI - the network architecture in Kubernetes is actually more
complex and typically consists of many independent components that are not directly connected to
each other. In fact, you can split Kubernetes networking into four layers, which are described below.

### Node Network (Data Center Network)

The network through which nodes are interconnected with each other. This network is usually not
managed by Kubernetes, but it is an important one because, without it, nothing would work.
In practice, bare metal infrastructure usually has more than one such network, e.g.
one for node-to-node communication, a second for storage replication, a third for external access, etc.

{{< figure src="net-nodes.svg" caption="A diagram showing the role of the node network (data center network) on the Kubernetes networking scheme" alt="A diagram showing the role of the node network (data center network) on the Kubernetes networking scheme" >}}

Configuring the physical network interaction between nodes goes beyond the scope of this article,
as in most situations, Kubernetes utilizes already existing network infrastructure.

### Pod Network

This is the network provided by your CNI plugin. The task of the CNI plugin is to ensure transparent
connectivity between all containers and nodes in the cluster. Most CNI plugins implement a flat
network from which separate blocks of IP addresses are allocated for use on each node.

{{< figure src="net-pods.svg" caption="A diagram showing the role of the pod network (CNI-plugin) on the Kubernetes network scheme" alt="A diagram showing the role of the pod network (CNI-plugin) on the Kubernetes network scheme" >}}

In practice, your cluster can have several CNI plugins managed by
[Multus](https://github.com/k8snetworkplumbingwg/multus-cni). This approach is often used in
virtualization solutions based on KubeVirt - [Rancher](https://www.rancher.com/) and
[OpenShift](https://www.redhat.com/en/technologies/cloud-computing/openshift/virtualization).
The primary CNI plugin is used for integration with Kubernetes services, while additional CNI
plugins are used to implement private networks (VPC) and integration with the physical networks
of your data center.

The [default CNI plugins](https://github.com/containernetworking/plugins/tree/main/plugins) can
be used to connect bridges or physical interfaces. Additionally, there are specialized plugins
such as [macvtap-cni](https://github.com/kubevirt/macvtap-cni), which are designed to provide
better performance.

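As an illustration, attaching a VM to an existing bridge on a node could be described with a
Multus NetworkAttachmentDefinition using the standard bridge plugin; the bridge name `br-ex` is an
assumption about your node setup, and IPAM is intentionally left empty here, which ties into the
next point:

```yaml
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: external-bridge
spec:
  # Standard bridge plugin config; no IPAM, so addresses come from
  # the underlying network (for example via DHCP inside the VM).
  config: |
    {
      "cniVersion": "0.3.1",
      "type": "bridge",
      "bridge": "br-ex",
      "ipam": {}
    }
```
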
One additional aspect to keep in mind when running virtual machines in Kubernetes is the need for
IPAM (IP Address Management), especially for secondary interfaces provided by Multus. This is
commonly managed by a DHCP server operating within your infrastructure. Additionally, the allocation
of MAC addresses for virtual machines can be managed by
[Kubemacpool](https://github.com/k8snetworkplumbingwg/kubemacpool).

In our platform, however, we decided to go another way and fully rely on
[Kube-OVN](https://www.kube-ovn.io/). This CNI plugin is based on OVN (Open Virtual Network), which
was originally developed for OpenStack. It provides a complete network solution for virtual
machines in Kubernetes, features Custom Resources for managing IPs and MAC addresses, supports
live migration while preserving IP addresses between the nodes, and enables the creation of VPCs
for physical network separation between tenants.

In Kube-OVN, you can assign separate subnets to an entire namespace or connect them as additional
network interfaces using Multus.

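As an illustration, a namespace-bound subnet is described with a Subnet custom resource; the CIDR
and namespace name below are placeholders, and the field names follow the Kube-OVN documentation,
so check them against the version you deploy:

```yaml
apiVersion: kubeovn.io/v1
kind: Subnet
metadata:
  name: tenant-a
spec:
  protocol: IPv4
  cidrBlock: 10.66.0.0/16   # placeholder tenant network
  gateway: 10.66.0.1
  # Pods and VMs in this namespace get addresses from this subnet
  namespaces:
    - tenant-a
```
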
### Services Network

In addition to the CNI plugin, Kubernetes also has a services network, which is primarily needed
for service discovery.
Unlike traditional virtual machines, Kubernetes was originally designed to run pods with
random addresses.
The services network provides a convenient abstraction (stable IP addresses and DNS names)
that will always direct traffic to the correct pod.
The same approach is also commonly used with virtual machines in clouds, despite the fact that
their IPs are usually static.

{{< figure src="net-services.svg" caption="A diagram showing the role of the services network (services network plugin) on the Kubernetes network scheme" alt="A diagram showing the role of the services network (services network plugin) on the Kubernetes network scheme" >}}

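To make this concrete, a KubeVirt virtual machine can be exposed through an ordinary Service,
assuming you added a label such as `app: my-vm` to the VM's template metadata (KubeVirt propagates
template labels to the launcher pod); the names and port below are placeholders:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-vm-ssh
spec:
  # Matches a label you set in the VirtualMachine's template metadata
  selector:
    app: my-vm
  ports:
    - name: ssh
      port: 22
      targetPort: 22
```
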
The implementation of the services network in Kubernetes is handled by the services network plugin.
The standard implementation is called **kube-proxy** and is used in most clusters.
But nowadays, this functionality might be provided as part of the CNI plugin. The most advanced
implementation is offered by the [Cilium](https://cilium.io/) project, which can be run in kube-proxy replacement mode.

Cilium is based on the eBPF technology, which allows for efficient offloading of the Linux
networking stack, thereby improving performance and security compared to traditional methods based
on iptables.

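Since kube-proxy replacement is an installation-time choice, it is typically switched on through
Helm values. The following is only a rough sketch assuming a recent Cilium release; the exact option
names and accepted values have changed between versions, and the API server address is a placeholder:

```yaml
# values.yaml for the Cilium Helm chart (sketch)
kubeProxyReplacement: true    # older releases used the string "strict"
# With kube-proxy gone, Cilium needs a direct path to the API server;
# replace with your control plane endpoint.
k8sServiceHost: 192.168.1.10
k8sServicePort: 6443
```
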
In practice, Cilium and Kube-OVN can be easily
[integrated](https://kube-ovn.readthedocs.io/zh-cn/stable/en/advance/with-cilium/) to provide a
unified solution that offers seamless, multi-tenant networking for virtual machines, as well as
advanced network policies and combined services network functionality.

### External Traffic Load Balancer

At this stage, you already have everything needed to run virtual machines in Kubernetes.
But there is actually one more thing.
You still need to access your services from outside your cluster, and an external load balancer
will help you organize this.

For bare metal Kubernetes clusters, there are several load balancers available:
[MetalLB](https://metallb.universe.tf/), [kube-vip](https://kube-vip.io/), and
[LoxiLB](https://www.loxilb.io/); also, [Cilium](https://docs.cilium.io/en/latest/network/lb-ipam/) and
[Kube-OVN](https://kube-ovn.readthedocs.io/zh-cn/latest/en/guide/loadbalancer-service/)
provide built-in implementations.

The role of an external load balancer is to provide a stable address available externally and direct
external traffic to the services network.
The services network plugin will direct it to your pods and virtual machines as usual.

{{< figure src="net-services.svg" caption="A diagram showing the role of the external load balancer on the Kubernetes network scheme" alt="The role of the external load balancer on the Kubernetes network scheme" >}}

In most cases, setting up a load balancer on bare metal is achieved by creating a floating IP address
on the nodes within the cluster, and announcing it externally using ARP/NDP or BGP protocols.

After exploring various options, we decided that MetalLB is the simplest and most reliable solution,
although we do not strictly enforce the use of it alone.

Another benefit is that in L2 mode, MetalLB speakers continuously check the state of their neighbours
by performing liveness checks using the memberlist protocol.
This enables failover that works independently of the Kubernetes control plane.

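To give an idea of the configuration involved, in L2 mode MetalLB only needs a pool of addresses
and an advertisement resource; the address range below is a placeholder from your external network:

```yaml
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: external-pool
  namespace: metallb-system
spec:
  addresses:
    # Placeholder range reserved for LoadBalancer services
    - 192.168.100.200-192.168.100.250
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: external-l2
  namespace: metallb-system
spec:
  ipAddressPools:
    - external-pool
```
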
## Conclusion

This concludes our overview of virtualization, storage, and networking in Kubernetes.
The technologies mentioned here are available and already pre-configured on the
[Cozystack](https://github.com/aenix-io/cozystack) platform, where you can try them with no limitations.

In the [next article](/blog/2024/04/05/diy-create-your-own-cloud-with-kubernetes-part-3/),
I'll detail how, on top of this, you can implement the provisioning of fully functional Kubernetes
clusters with just the click of a button.