Merge pull request #21498 from nirs/ai-playground-tutorial
docs: Add AI playground tutorial

@@ -16,6 +16,7 @@ minikube quickly sets up a local Kubernetes cluster on macOS, Linux, and Windows

## Highlights

* Supports the latest Kubernetes release (+6 previous minor versions)
* Supports GPUs for AI development ([nvidia]({{< ref "/docs/tutorials/nvidia.md" >}}), [amd]({{< ref "/docs/tutorials/amd.md" >}}), [apple]({{< ref "/docs/tutorials/ai-playground.md" >}}))
* Cross-platform (Linux, macOS, Windows)
* Deploy as a VM, a container, or on bare-metal
* Multiple container runtimes (CRI-O, containerd, docker)

@@ -0,0 +1,467 @@
---
title: "Minikube AI playground on Apple silicon"
linkTitle: "Minikube AI playground on Apple silicon"
weight: 1
date: 2024-10-04
---

This tutorial shows how to create an AI playground with minikube on Apple
silicon devices such as a MacBook Pro. We'll create a cluster that shares your
Mac's GPU using the krunkit driver, deploy two large language models, and
interact with the models using Open WebUI.



## Prerequisites

- Apple silicon Mac
- [krunkit](https://github.com/containers/krunkit) v1.0.0 or later
- [vmnet-helper](https://github.com/nirs/vmnet-helper) v0.6.0 or later
- [generic-device-plugin](https://github.com/squat/generic-device-plugin)
- minikube v1.37.0 or later (required for the krunkit driver)

## Installing krunkit and vmnet-helper

Install the latest krunkit:

```shell
brew tap slp/krunkit
brew install krunkit
krunkit --version
```

Install the latest vmnet-helper:

```shell
curl -fsSL https://github.com/minikube-machine/vmnet-helper/releases/latest/download/install.sh | bash
/opt/vmnet-helper/bin/vmnet-helper --version
```

For more information, see the [krunkit driver](https://minikube.sigs.k8s.io/docs/drivers/krunkit/)
documentation.
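
If you plan to use krunkit for all of your clusters, you can optionally make it the default driver so later `minikube start` invocations pick it up:

```shell
minikube config set driver krunkit
```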

## Download models

Download some models to the local disk. By keeping the models outside of
minikube, you can create and delete clusters quickly without downloading the
models again.

```shell
mkdir ~/models
cd ~/models
curl -LO https://huggingface.co/instructlab/granite-7b-lab-GGUF/resolve/main/granite-7b-lab-Q4_K_M.gguf
curl -LO https://huggingface.co/TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF/resolve/main/tinyllama-1.1b-chat-v1.0.Q8_0.gguf
```

**Important**: The model must be in *GGUF* format.
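
Each file should begin with the `GGUF` magic bytes; an optional, quick way to sanity-check a download:

```shell
head -c 4 ~/models/granite-7b-lab-Q4_K_M.gguf
# prints: GGUF
```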

## Start minikube

Start a minikube cluster with the krunkit driver, mounting the `~/models`
directory at `/mnt/models`:

```shell
minikube start --driver krunkit --mount-string ~/models:/mnt/models
```

Output:
```
😄 minikube v1.37.0 on Darwin 15.6.1 (arm64)
✨ Using the krunkit (experimental) driver based on user configuration
👍 Starting "minikube" primary control-plane node in "minikube" cluster
🔥 Creating krunkit VM (CPUs=2, Memory=6144MB, Disk=20000MB) ...
🐳 Preparing Kubernetes v1.34.0 on Docker 28.4.0 ...
🔗 Configuring bridge CNI (Container Networking Interface) ...
🔎 Verifying Kubernetes components...
    ▪ Using image gcr.io/k8s-minikube/storage-provisioner:v5
🌟 Enabled addons: storage-provisioner, default-storageclass
🏄 Done! kubectl is now configured to use "minikube" cluster and "default" namespace by default
```
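
Before deploying anything, you can optionally confirm that the models directory is visible inside the VM:

```shell
minikube ssh -- ls /mnt/models
```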

### Verifying that the GPU is available

The krunkit driver exposes your host GPU as a virtio-gpu device:

```
% minikube ssh -- tree /dev/dri
/dev/dri
|-- by-path
|   |-- platform-a007000.virtio_mmio-card -> ../card0
|   `-- platform-a007000.virtio_mmio-render -> ../renderD128
|-- card0
`-- renderD128
```

## Deploying the generic-device-plugin

To use the GPU in pods, we need the generic-device-plugin. Deploy it with:

```shell
cat <<'EOF' | kubectl apply -f -
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: generic-device-plugin
  namespace: kube-system
  labels:
    app.kubernetes.io/name: generic-device-plugin
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: generic-device-plugin
  template:
    metadata:
      labels:
        app.kubernetes.io/name: generic-device-plugin
    spec:
      priorityClassName: system-node-critical
      tolerations:
      - operator: "Exists"
        effect: "NoExecute"
      - operator: "Exists"
        effect: "NoSchedule"
      containers:
      - image: squat/generic-device-plugin
        args:
        - --device
        - |
          name: dri
          groups:
            - count: 4
              paths:
                - path: /dev/dri
        name: generic-device-plugin
        resources:
          requests:
            cpu: 50m
            memory: 10Mi
          limits:
            cpu: 50m
            memory: 20Mi
        ports:
        - containerPort: 8080
          name: http
        securityContext:
          privileged: true
        volumeMounts:
        - name: device-plugin
          mountPath: /var/lib/kubelet/device-plugins
        - name: dev
          mountPath: /dev
      volumes:
      - name: device-plugin
        hostPath:
          path: /var/lib/kubelet/device-plugins
      - name: dev
        hostPath:
          path: /dev
  updateStrategy:
    type: RollingUpdate
EOF
```

**Note**: This configuration allows up to 4 pods to use `/dev/dri`. You can
increase `count` to run more pods using the GPU.
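
If you need more GPU slots later, one option is to edit the DaemonSet in place and raise `count` under the `--device` argument:

```shell
kubectl -n kube-system edit daemonset generic-device-plugin
```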

Wait until the generic-device-plugin DaemonSet is available:

```shell
% kubectl get daemonset generic-device-plugin -n kube-system -w
NAME                    DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
generic-device-plugin   1         1         1       1            1           <none>          45s
```
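
Once the plugin is running, the node advertises the devices as the `squat.ai/dri` resource, the same resource name the model pods below request. A quick check:

```shell
kubectl describe node minikube | grep squat.ai/dri
```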

## Deploying the granite model

To play with the granite model you downloaded, start a llama-server pod serving
the model and a service to make the pod available to other pods.

```shell
cat <<'EOF' | kubectl apply -f -
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: granite
spec:
  replicas: 1
  selector:
    matchLabels:
      app: granite
  template:
    metadata:
      labels:
        app: granite
      name: granite
    spec:
      containers:
      - name: llama-server
        image: quay.io/ramalama/ramalama:latest
        command: [
          llama-server,
          --host, "0.0.0.0",
          --port, "8080",
          --model, /mnt/models/granite-7b-lab-Q4_K_M.gguf,
          --alias, "ibm/granite:7b",
          --ctx-size, "2048",
          --temp, "0.8",
          --cache-reuse, "256",
          -ngl, "999",
          --threads, "6",
          --no-warmup,
          --log-colors,
        ]
        resources:
          limits:
            squat.ai/dri: 1
        volumeMounts:
        - name: models
          mountPath: /mnt/models
      volumes:
      - name: models
        hostPath:
          path: /mnt/models
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: granite
  name: granite
spec:
  ports:
  - protocol: TCP
    port: 8080
  selector:
    app: granite
EOF
```

Wait until the deployment is available:

```shell
% kubectl get deploy granite
NAME      READY   UP-TO-DATE   AVAILABLE   AGE
granite   1/1     1            1           8m17s
```

Check the granite service:

```shell
% kubectl get service granite
NAME      TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE
granite   ClusterIP   10.105.145.9   <none>        8080/TCP   28m
```
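
The service is only reachable from inside the cluster. One way to sanity-check the OpenAI-compatible endpoint is from a throwaway pod; this is a sketch, and the curl image and the `/v1/models` path are assumptions, not part of the tutorial:

```shell
kubectl run curl --rm -it --restart=Never --image=curlimages/curl -- \
  curl -s http://granite:8080/v1/models
```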

## Deploying the tinyllama model

To play with the tinyllama model you downloaded, start a llama-server pod
serving the model and a service to make the pod available to other pods.

```shell
cat <<'EOF' | kubectl apply -f -
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tinyllama
spec:
  replicas: 1
  selector:
    matchLabels:
      app: tinyllama
  template:
    metadata:
      labels:
        app: tinyllama
      name: tinyllama
    spec:
      containers:
      - name: llama-server
        image: quay.io/ramalama/ramalama:latest
        command: [
          llama-server,
          --host, "0.0.0.0",
          --port, "8080",
          --model, /mnt/models/tinyllama-1.1b-chat-v1.0.Q8_0.gguf,
          --alias, tinyllama,
          --ctx-size, "2048",
          --temp, "0.8",
          --cache-reuse, "256",
          -ngl, "999",
          --threads, "6",
          --no-warmup,
          --log-colors,
        ]
        resources:
          limits:
            squat.ai/dri: 3
        volumeMounts:
        - name: models
          mountPath: /mnt/models
      volumes:
      - name: models
        hostPath:
          path: /mnt/models
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: tinyllama
  name: tinyllama
spec:
  ports:
  - protocol: TCP
    port: 8080
  selector:
    app: tinyllama
EOF
```

Wait until the deployment is available:

```shell
% kubectl get deploy tinyllama
NAME        READY   UP-TO-DATE   AVAILABLE   AGE
tinyllama   1/1     1            1           9m14s
```

Check the tinyllama service:

```shell
% kubectl get service tinyllama
NAME        TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
tinyllama   ClusterIP   10.98.219.117   <none>        8080/TCP   23m
```
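
If a deployment stays unready, the llama-server logs usually explain why (for example, a wrong model path). This works for either model:

```shell
kubectl logs deploy/tinyllama
```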

## Deploying Open WebUI

The [Open WebUI](https://docs.openwebui.com) project provides an easy-to-use web
interface for interacting with OpenAI-compatible APIs such as our llama-server
pods.

To deploy Open WebUI, run:

```shell
cat <<'EOF' | kubectl apply -f -
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: open-webui
spec:
  replicas: 1
  selector:
    matchLabels:
      app: open-webui
  template:
    metadata:
      labels:
        app: open-webui
    spec:
      containers:
      - name: open-webui
        image: ghcr.io/open-webui/open-webui:dev-slim
        ports:
        - containerPort: 8080
        env:
        # Preconfigure OpenAI-compatible endpoints
        - name: OPENAI_API_BASE_URLS
          value: "http://granite:8080/v1;http://tinyllama:8080/v1"
        volumeMounts:
        - name: open-webui-data
          mountPath: /app/backend/data
      volumes:
      - name: open-webui-data
        persistentVolumeClaim:
          claimName: open-webui-data
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: open-webui-data
spec:
  storageClassName: standard
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: Service
metadata:
  name: open-webui
spec:
  ports:
  - protocol: TCP
    port: 8080
    nodePort: 30080
  selector:
    app: open-webui
  type: NodePort
EOF
```

We configured the llama-server OpenAI-compatible API endpoints using the
`OPENAI_API_BASE_URLS` environment variable.
Check the [Open WebUI documentation](https://docs.openwebui.com) to learn how to
configure it using the Admin Panel.

Wait until the deployment is available:

```shell
% kubectl get deploy open-webui
NAME         READY   UP-TO-DATE   AVAILABLE   AGE
open-webui   1/1     1            1           69s
```

## Interacting with the models

Open a browser with the Open WebUI console:

```shell
open $(minikube service open-webui --url)
```
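
If you prefer not to use the minikube service tunnel, port-forwarding works too (assuming local port 3000 is free):

```shell
kubectl port-forward service/open-webui 3000:8080
```

Then browse to http://localhost:3000 instead.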

Create an Admin account to start using Open WebUI.

### Chatting with the granite model

You can start chatting with the "ibm/granite:7b" model.

Type a prompt:

```
> Write a very technical haiku about playing with large language models with Minikube on Apple silicon

Mighty model, Minikube,
Silicon-powered speed,
Learning's dance, ever-changing.
Through data streams it weaves,
Inference's wisdom, vast and deep,
Apple's heartbeat, in code, resounds.
Exploring AI's vast frontier,
Minikube, language model's playground,
Innovation's rhythm, forever.
```
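
You can also talk to a model directly over its OpenAI-compatible API, without the UI. A minimal sketch, assuming the granite service from earlier and llama-server's standard `/v1/chat/completions` endpoint:

```shell
kubectl port-forward service/granite 8080:8080 &
curl -s http://localhost:8080/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"model": "ibm/granite:7b", "messages": [{"role": "user", "content": "Write a haiku about minikube"}]}'
```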

### Chatting with the tinyllama model

Click the "New Chat" button on the left and select the "tinyllama" model from
the model menu in the top left.

Type a prompt:

```
> How do you feel inside this fancy Minikube cluster?

I do not have a physical body. However, based on the given text material, the
author is describing feeling inside a cluster of Minikube, a type of jellyfish.
The use of the word "fancy" suggests that the author is impressed or appreciates
the intricate design of the cluster, while the adjective "minikube" connotes its
smooth texture, delicate shape, and iridescent colors. The word "cluster"
suggests a group of these jellyfish, while "inside" implies being in the
vicinity or enclosed within.
```