How to Set Up a K3S Cluster in 2025

Feb 11, 2025

Rebuilding my K3s cluster from scratch with Ansible — VM provisioning via Cloud-Init on Proxmox, HA across three nodes, and full automation.

My first Kubernetes clusters are gone. This time I want a proper HA setup — if any machine goes down, the cluster keeps running. The target layout across my hardware:

1× DELL R720 → k3s-master-1 and k3s-worker-1
1× DELL Optiplex Micro 3050 → k3s-master-2 and k3s-worker-2
1× DELL Optiplex Micro 3050 → k3s-master-3 and k3s-worker-3

Six VMs total on a Proxmox cluster: 3 Ubuntu 24.04 master nodes, 3 Ubuntu 24.04 worker nodes.
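Why three masters and not two? Assuming the control plane runs on K3s's embedded etcd (what a multi-server K3s setup like this one normally uses), etcd only accepts writes while a majority of its members, floor(n/2) + 1, can reach each other. Here is a small bash sketch of that arithmetic:

```bash
#!/bin/bash
# Majority quorum for an etcd cluster of n members: floor(n/2) + 1.
quorum() {
  local n=$1
  echo $(( n / 2 + 1 ))
}

# How many members can fail while the rest still form a quorum.
tolerated_failures() {
  local n=$1
  echo $(( n - $(quorum "$n") ))
}

for n in 1 2 3 4 5; do
  echo "$n masters: quorum $(quorum "$n"), survives $(tolerated_failures "$n") failure(s)"
done
```

With three masters spread over three hosts, losing any one host still leaves two of three etcd members, which is a quorum. A fourth master would not raise the failure tolerance, which is why odd member counts are the norm.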

DNS and IP Addressing

Before creating any VMs, get your IP and DNS situation sorted.

For IP assignment, you have two options: assign addresses outside your DHCP range (what I do — network stays stable even if DHCP goes down), or use static MAC→IP mappings in your DHCP server.

I’m using 10.57.57.30/24 through 10.57.57.35/24 for the six VMs, with an A record in Unbound on pfSense for each:

Unbound pfSense DNS Configuration
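In raw Unbound configuration, the same A records are `local-data` lines. A minimal sketch that prints one per VM, assuming masters take .30 to .32 and workers .33 to .35 (the layout the Ansible inventory uses later) and using merox.dev, the deploy script's default search domain, as a placeholder you should swap for your own:

```bash
#!/bin/bash
# Print Unbound local-data A records for the six K3s VMs.
# Assumptions: masters get the first three IPs, workers the next three,
# and DOMAIN is a placeholder for your own search domain.
DOMAIN="${DOMAIN:-merox.dev}"
IP_PREFIX="10.57.57."
IP_START=30

dns_records() {
  local i=0 role n
  for role in master worker; do
    for n in 1 2 3; do
      printf 'local-data: "k3s-%s-%d.%s. A %s%d"\n' \
        "$role" "$n" "$DOMAIN" "$IP_PREFIX" "$((IP_START + i))"
      i=$((i + 1))
    done
  done
}

dns_records
```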

VM Deployment with Cloud-Init

Rather than clicking through the Proxmox UI six times, I wrote a bash script that handles template creation, VM deployment, and teardown. If you’d prefer a Packer/Terraform approach, see Homelab as Code.

Warning

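This script can create or destroy VMs. Option 2 removes any existing VM that already uses one of its target IDs, and option 3 removes VMs by ID range, so keep backups of anything critical before running either.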

Prerequisites: Proxmox up and running, SSH public key at /root/.ssh/id_rsa.pub on the Proxmox host.

The script has three modes:

Option 1 — Create Cloud-Init Template: Downloads the Ubuntu 24.04 cloud image, creates a VM, configures cloud-init, and converts it to a template.

Option 2 — Deploy VMs: Clones the template N times, sets IPs, gateway, DNS, search domain, SSH keys, CPU, RAM, and disk size per VM. Prompts for a name on each one.

Option 3 — Destroy VMs: Stops and removes VMs by ID range.

#!/bin/bash

# Function to get user input with a default value
get_input() {
    local prompt=$1
    local default=$2
    local input
    read -p "$prompt [$default]: " input
    echo "${input:-$default}"
}

# Ask the user whether they want to create a template, deploy or destroy VMs
echo "Select an option:"
echo "1) Create Cloud-Init Template"
echo "2) Deploy VMs"
echo "3) Destroy VMs"
read -p "Enter your choice (1, 2, or 3): " ACTION

if [[ "$ACTION" != "1" && "$ACTION" != "2" && "$ACTION" != "3" ]]; then
    echo "❌ Invalid choice. Please run the script again and select 1, 2, or 3."
    exit 1
fi

# === OPTION 1: CREATE CLOUD-INIT TEMPLATE ===
if [[ "$ACTION" == "1" ]]; then
    TEMPLATE_ID=$(get_input "Enter the template VM ID" "300")
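    # Must be a storage that accepts VM disk images (e.g. local-lvm, or local-zfs on ZFS installs);
    # the default "local" directory storage does not, and its volume names differ
    STORAGE=$(get_input "Enter the storage name" "local-lvm")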
    TEMPLATE_NAME=$(get_input "Enter the template name" "ubuntu-cloud")
    IMG_URL="https://cloud-images.ubuntu.com/noble/current/noble-server-cloudimg-amd64.img"
    IMG_FILE="/root/noble-server-cloudimg-amd64.img"

    echo "📥 Downloading Ubuntu Cloud image for cloud-init setup..."
    cd /root
    wget -O $IMG_FILE $IMG_URL || { echo "❌ Failed to download the image"; exit 1; }

    echo "🖥️ Creating VM $TEMPLATE_ID..."
    qm create $TEMPLATE_ID --memory 2048 --cores 2 --name $TEMPLATE_NAME --net0 virtio,bridge=vmbr0

    echo "💾 Importing disk to storage ($STORAGE)..."
    qm disk import $TEMPLATE_ID $IMG_FILE $STORAGE || { echo "❌ Failed to import disk"; exit 1; }

    echo "🔗 Attaching disk..."
    qm set $TEMPLATE_ID --scsihw virtio-scsi-pci --scsi0 $STORAGE:vm-$TEMPLATE_ID-disk-0

    echo "☁️ Adding Cloud-Init drive..."
    qm set $TEMPLATE_ID --ide2 $STORAGE:cloudinit

    echo "🛠️ Configuring boot settings..."
    qm set $TEMPLATE_ID --boot c --bootdisk scsi0

    echo "🖧 Adding serial console..."
    qm set $TEMPLATE_ID --serial0 socket --vga serial0

    echo "📌 Converting VM to template..."
    qm template $TEMPLATE_ID

    echo "✅ Cloud-Init Template created successfully!"
    exit 0
fi

# === OPTION 2: DEPLOY VMs ===
if [[ "$ACTION" == "2" ]]; then
    TEMPLATE_ID=$(get_input "Enter the template VM ID" "300")
    START_ID=$(get_input "Enter the starting VM ID" "301")
    NUM_VMS=$(get_input "Enter the number of VMs to deploy" "6")
    STORAGE=$(get_input "Enter the storage name" "dataz2")
    IP_PREFIX=$(get_input "Enter the IP prefix (e.g., 10.57.57.)" "10.57.57.")
    IP_START=$(get_input "Enter the starting IP last octet" "30")
    GATEWAY=$(get_input "Enter the gateway IP" "10.57.57.1")
    DNS_SERVERS=$(get_input "Enter the DNS servers (space-separated)" "8.8.8.8 1.1.1.1")
    DOMAIN_SEARCH=$(get_input "Enter the search domain" "merox.dev")
    DISK_SIZE=$(get_input "Enter the disk size (e.g., 100G)" "100G")
    RAM_SIZE=$(get_input "Enter the RAM size in MB" "16384")
    CPU_CORES=$(get_input "Enter the number of CPU cores" "4")
    # vCPUs = cores x sockets; 4 cores x 4 sockets would give each VM 16 vCPUs
    CPU_SOCKETS=$(get_input "Enter the number of CPU sockets" "1")
    SSH_KEY_PATH=$(get_input "Enter the SSH public key file path" "/root/.ssh/id_rsa.pub")

    if [[ ! -f "$SSH_KEY_PATH" ]]; then
        echo "❌ Error: SSH key file not found at $SSH_KEY_PATH"
        exit 1
    fi

    for i in $(seq 0 $((NUM_VMS - 1))); do
        VM_ID=$((START_ID + i))
        IP="$IP_PREFIX$((IP_START + i))/24"
        VM_NAME=$(get_input "Enter the name for VM $VM_ID" "ubuntu-vm-$((i+1))")

        echo "🔹 Creating VM: $VM_ID (Name: $VM_NAME, IP: $IP)"

        if qm status $VM_ID &>/dev/null; then
            echo "⚠️ VM $VM_ID already exists, removing..."
            qm stop $VM_ID &>/dev/null
            qm destroy $VM_ID
        fi

        if ! qm clone $TEMPLATE_ID $VM_ID --full --name $VM_NAME --storage $STORAGE; then
            echo "❌ Failed to clone VM $VM_ID, skipping..."
            continue
        fi

        qm set $VM_ID --memory $RAM_SIZE \
                      --cores $CPU_CORES \
                      --sockets $CPU_SOCKETS \
                      --cpu host \
                      --serial0 socket \
                      --vga serial0 \
                      --ipconfig0 ip=$IP,gw=$GATEWAY \
                      --nameserver "$DNS_SERVERS" \
                      --searchdomain "$DOMAIN_SEARCH" \
                      --sshkeys "$SSH_KEY_PATH"

        qm set $VM_ID --delete ide2 || true
        qm set $VM_ID --ide2 $STORAGE:cloudinit,media=cdrom
        qm cloudinit update $VM_ID

        echo "🔄 Resizing disk to $DISK_SIZE..."
        # No "+" prefix: DISK_SIZE is the final disk size, not an increment
        qm resize $VM_ID scsi0 $DISK_SIZE

        qm start $VM_ID
        echo "✅ VM $VM_ID ($VM_NAME) created and started!"
    done
    exit 0
fi

# === OPTION 3: DESTROY VMs ===
if [[ "$ACTION" == "3" ]]; then
    START_ID=$(get_input "Enter the starting VM ID to delete" "301")
    NUM_VMS=$(get_input "Enter the number of VMs to delete" "6")

    echo "⚠️ Destroying VMs from $START_ID to $((START_ID + NUM_VMS - 1))..."
    for i in $(seq 0 $((NUM_VMS - 1))); do
        VM_ID=$((START_ID + i))

        if qm status $VM_ID &>/dev/null; then
            echo "🛑 Stopping and destroying VM $VM_ID..."
            qm stop $VM_ID &>/dev/null
            qm destroy $VM_ID
        else
            echo "ℹ️ VM $VM_ID does not exist. Skipping..."
        fi
    done
    echo "✅ Specified VMs have been destroyed."
    exit 0
fi
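Option 2 builds each address as IP_PREFIX plus IP_START + i without checking that the last octet stays valid, so a start of 250 with six VMs would end at .255, the broadcast address of a /24. A small guard could run right after the prompts (the function name is hypothetical, not part of the script above):

```bash
#!/bin/bash
# Hypothetical guard for option 2: make sure the generated last octets
# stay within the usable host range of a /24 (1-254).
check_ip_range() {
  local start=$1 count=$2
  local last=$(( start + count - 1 ))
  if (( start < 1 || count < 1 || last > 254 )); then
    echo "Octets $start..$last fall outside 1-254" >&2
    return 1
  fi
  return 0
}

# Example: the defaults (start at .30, six VMs) pass.
check_ip_range 30 6 && echo "10.57.57.30 - 10.57.57.35 look fine"
```

Inside option 2 it would be called as `check_ip_range "$IP_START" "$NUM_VMS" || exit 1`, before the deploy loop.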

After running option 2, verify the VMs appear in Proxmox and SSH in:

ssh ubuntu@k3s-master-1

Guide Structure

Follow these in order:

K3s Installation with Ansible — automated cluster deployment
Traefik Setup and SSL — ingress controller with Let’s Encrypt
Cluster Management — Rancher for management, Longhorn for storage
Advanced Resources — monitoring, NFS, ArgoCD, upgrades

Installing K3s with Ansible

merox

#kubernetes #k3s #ansible #automation

Automated K3s cluster deployment using Ansible playbooks with HA configuration across multiple nodes

This covers deploying K3s across your VMs using Ansible. We’re using a fork of TechnoTim’s k3s-ansible repo.

Prerequisites

Install Ansible on your management machine:

Debian/Ubuntu:

sudo apt update && sudo apt install -y ansible

macOS:

brew install ansible

Clone the repo:

git clone https://github.com/meroxdotdev/k3s-ansible

Configuration

cd k3s-ansible
cp ansible.example.cfg ansible.cfg
ansible-galaxy install -r ./collections/requirements.yml
cp -R inventory/sample inventory/my-cluster

inventory/my-cluster/hosts.ini — set your node IPs:

[master]
10.57.57.30
10.57.57.31
10.57.57.32

[node]
10.57.57.33
10.57.57.34
10.57.57.35

[k3s_cluster:children]
master
node
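If you renumber the VMs later, regenerating the inventory from two lists keeps it in sync. A sketch, assuming the same master/worker split as above:

```bash
#!/bin/bash
# Sketch: write inventory/my-cluster/hosts.ini from two lists of IPs.
MASTERS=(10.57.57.30 10.57.57.31 10.57.57.32)
WORKERS=(10.57.57.33 10.57.57.34 10.57.57.35)

render_inventory() {
  echo "[master]"
  printf '%s\n' "${MASTERS[@]}"
  echo
  echo "[node]"
  printf '%s\n' "${WORKERS[@]}"
  echo
  echo "[k3s_cluster:children]"
  echo "master"
  echo "node"
}

# Usage (from the k3s-ansible checkout):
#   render_inventory > inventory/my-cluster/hosts.ini
render_inventory
```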

inventory/my-cluster/group_vars/all.yml — key fields to edit:

ansible_user: default VM user is ubuntu
system_timezone: set to your timezone, e.g. Europe/Bucharest
Networking: comment out flannel_iface: eth0 and uncomment calico_iface: "eth0" to use Calico instead of Flannel; Calico gives you richer network policy support. Flannel works too if you want something simpler.
apiserver_endpoint: 10.57.57.100 — an unused IP on your LAN, acts as the VIP for the K3s control plane
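k3s_token: a long, random alphanumeric string. Every node uses it as the shared secret to join the cluster, so treat it like a password.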
metal_lb_ip_range: 10.57.57.80-10.57.57.90 — a range on your LAN, outside your DHCP pool, not used by anything else. This is how K3s services get exposed to your network.
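Two of these values are easy to get wrong: the MetalLB pool must not contain the VIP or any node address, and the token should be random. A sketch of both checks, using the example values from this guide (function names are made up for illustration):

```bash
#!/bin/bash
# Sanity checks for group_vars/all.yml (a sketch; values mirror this guide).

# Convert a dotted IPv4 address to a 32-bit integer.
ip_to_int() {
  local IFS=.
  local a b c d
  read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Succeed if $1 lies within the inclusive range $2-$3.
ip_in_range() {
  local ip lo hi
  ip=$(ip_to_int "$1"); lo=$(ip_to_int "$2"); hi=$(ip_to_int "$3")
  (( ip >= lo && ip <= hi ))
}

# Generate a random 32-character alphanumeric k3s_token.
gen_token() {
  LC_ALL=C tr -dc 'A-Za-z0-9' < /dev/urandom | head -c 32
  echo
}

METALLB_START=10.57.57.80
METALLB_END=10.57.57.90
VIP=10.57.57.100
NODES=(10.57.57.30 10.57.57.31 10.57.57.32 10.57.57.33 10.57.57.34 10.57.57.35)

for addr in "$VIP" "${NODES[@]}"; do
  if ip_in_range "$addr" "$METALLB_START" "$METALLB_END"; then
    echo "conflict: $addr is inside the MetalLB pool" >&2
  fi
done
echo "k3s_token: $(gen_token)"
```

This cannot see your DHCP pool, so checking that the MetalLB range sits outside it is still a manual step.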

Note

Make sure SSH key authentication is working between your management machine and all VMs before running the playbook.

Deploy

ansible-playbook ./site.yml -i ./inventory/my-cluster/hosts.ini

Once done, pull the kubeconfig from the first master and verify (kubectl must be installed on your management machine):

mkdir -p ~/.kube
scp ubuntu@10.57.57.30:~/.kube/config ~/.kube/config
kubectl get nodes
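If your management machine already has a ~/.kube/config for another cluster, copying over it replaces that file. A hypothetical helper that backs up the old file first and restricts permissions:

```bash
#!/bin/bash
# Sketch: install a kubeconfig without silently clobbering an existing one.
# Any previous file is copied to <dest>.bak.<timestamp> first.
install_kubeconfig() {
  local src=$1
  local dest=${2:-$HOME/.kube/config}
  mkdir -p "$(dirname "$dest")"
  if [[ -f "$dest" ]]; then
    cp "$dest" "$dest.bak.$(date +%Y%m%d%H%M%S)"
  fi
  # The file holds cluster-admin credentials, so keep it owner-only.
  install -m 600 "$src" "$dest"
}

# Usage:
#   scp ubuntu@10.57.57.30:~/.kube/config /tmp/k3s-config
#   install_kubeconfig /tmp/k3s-config
#   kubectl get nodes
```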

Next Steps

Proceed to the Traefik Setup guide.

Traefik Setup and SSL Configuration

#kubernetes #traefik #ssl #cert-manager

Deploy Traefik ingress controller with Let's Encrypt SSL certificates using Cloudflare DNS challenge

This covers deploying Traefik as the ingress controller and wiring up cert-manager with Let’s Encrypt via Cloudflare.

Deploying Traefik

Install Helm:

curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3
chmod 700 get_helm.sh
./get_helm.sh

kubectl create namespace traefik
helm repo add traefik https://traefik.github.io/charts
helm repo update
git clone https://github.com/techno-tim/launchpad

In launchpad/kubernetes/traefik-cert-manager/, open values.yaml and set the LoadBalancer IP to something from your MetalLB range, then install:

helm install --namespace=traefik traefik traefik/traefik --values=values.yaml

Verify:

kubectl get svc --all-namespaces -o wide

Expected output:

NAMESPACE          NAME                              TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)                                    AGE     SELECTOR
calico-system      calico-typha                      ClusterIP      10.43.80.131    <none>        5473/TCP                                   2d20h   k8s-app=calico-typha
traefik            traefik                           LoadBalancer   10.43.185.67    10.57.57.80   80:32195/TCP,443:31598/TCP,443:31598/UDP   53s     app.kubernetes.io/instance=traefik,app.kubernetes.io/name=traefik

Apply middleware:

kubectl apply -f default-headers.yaml
kubectl get middleware

Expected output:

NAME              AGE
default-headers   4s

Traefik Dashboard

Generate a base64-encoded credential (replace merox and password with your own username and password):

sudo apt-get install -y apache2-utils
htpasswd -nb merox password | openssl base64 -A
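If you'd rather not install apache2-utils, OpenSSL can produce the same APR1 (MD5) hash format that htpasswd uses by default. A sketch (the function name is hypothetical):

```bash
#!/bin/bash
# Sketch: build the base64 "users" value for secret-dashboard.yaml
# with OpenSSL only.
make_users_value() {
  local user=$1 pass=$2 hash
  hash=$(openssl passwd -apr1 "$pass") || return 1
  # -A keeps the base64 output on one line so it pastes cleanly into YAML.
  printf '%s:%s\n' "$user" "$hash" | openssl base64 -A
  echo
}

# Replace with your own credentials; the password ends up in shell history.
make_users_value merox password
```

Without `-A`, `openssl base64` wraps its output every 64 characters, which would split a long value across two lines in the Secret.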

Paste the output into dashboard/secret-dashboard.yaml:

---
apiVersion: v1
kind: Secret
metadata:
  name: traefik-dashboard-auth
  namespace: traefik
type: Opaque
data:
  users: abc123==

In your DNS server, add a record that points your Traefik hostname (e.g. traefik.k3s.your.domain) to the MetalLB IP you set in values.yaml.

Set your domain in dashboard/ingress.yaml:

routes:
  - match: Host(`traefik.k3s.your.domain`)

Apply everything from the traefik/dashboard folder:

kubectl apply -f secret-dashboard.yaml
kubectl get secrets --namespace traefik
kubectl apply -f middleware.yaml
kubectl apply -f ingress.yaml

The dashboard will be up but using a self-signed cert. The next section fixes that.

Cert-Manager

From traefik-cert-manager/cert-manager:

helm repo add jetstack https://charts.jetstack.io
helm repo update
kubectl create namespace cert-manager

Note

Check the releases page and use the latest version of cert-manager.

kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.17.0/cert-manager.crds.yaml
helm install cert-manager jetstack/cert-manager --namespace cert-manager --values=values.yaml --version v1.17.0
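The CRD URL and the --version flag carry the same version string, and the two should match. A sketch that keeps them in one variable (CM_VERSION and DRY_RUN are names made up for this example; with DRY_RUN set, the commands are printed instead of run):

```bash
#!/bin/bash
# Sketch: keep the cert-manager CRDs and Helm chart on the same version.
CM_VERSION="${CM_VERSION:-v1.17.0}"

install_cert_manager() {
  local run=""
  if [[ -n "${DRY_RUN:-}" ]]; then run="echo"; fi
  $run kubectl apply -f "https://github.com/cert-manager/cert-manager/releases/download/${CM_VERSION}/cert-manager.crds.yaml"
  $run helm install cert-manager jetstack/cert-manager \
    --namespace cert-manager --values=values.yaml --version "$CM_VERSION"
}

# Preview the commands; drop DRY_RUN=1 to actually install.
DRY_RUN=1 install_cert_manager
```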

Apply your Cloudflare API secret (use an API Token, not a global key):

kubectl apply -f issuers/secret-cf-token.yaml

Before applying the remaining files, edit:

issuers/letsencrypt-production.yaml: email, dnsZones
certificates/production/your-domain-com.yaml: name, secretName, commonName, dnsNames

kubectl apply -f issuers/letsencrypt-production.yaml
kubectl apply -f certificates/production/your-domain-com.yaml

Monitor progress:

kubectl logs -n cert-manager -f deploy/cert-manager
kubectl get challenges -A

Traefik K3S Dashboard

Next Steps

Proceed to the Cluster Management guide.

Cluster Management with Rancher and Longhorn

#kubernetes #rancher #longhorn #storage

Deploy Rancher for Kubernetes cluster management and Longhorn for distributed block storage

This covers deploying Rancher for cluster management and Longhorn for distributed persistent storage.

Rancher

helm repo add rancher-stable https://releases.rancher.com/server-charts/stable
helm repo update
kubectl create namespace cattle-system

Traefik terminates TLS in front of Rancher, so install it with tls=external:

helm install rancher rancher-stable/rancher \
  --namespace cattle-system \
  --set hostname=rancher.k3s.your.domain \
  --set tls=external \
  --set replicas=3

Create ingress.yml:

apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: rancher
  namespace: cattle-system
spec:
  entryPoints:
    - websecure
  routes:
    - match: Host(`rancher.k3s.your.domain`)
      kind: Rule
      services:
        - name: rancher
          port: 443
      middlewares:
        - name: default-headers
  tls:
    secretName: k3s-your-domain-tls

kubectl apply -f ingress.yml

Rancher Dashboard

Longhorn

Install prerequisites on the nodes you want to use for storage:

sudo apt update && sudo apt install -y open-iscsi nfs-common
sudo systemctl enable iscsid
sudo systemctl start iscsid

Label your three worker nodes for HA:

kubectl label node k3s-worker-1 storage.longhorn.io/node=true
kubectl label node k3s-worker-2 storage.longhorn.io/node=true
kubectl label node k3s-worker-3 storage.longhorn.io/node=true
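The prerequisite install and the labels can be done in one pass from the management machine. A sketch, assuming the worker hostnames and IPs used in this guide and passwordless SSH plus sudo for the ubuntu user (the cloud-image default); with DRY_RUN set it only prints the commands:

```bash
#!/bin/bash
# Sketch: prepare and label the three Longhorn storage nodes.
NODES=(
  "k3s-worker-1 10.57.57.33"
  "k3s-worker-2 10.57.57.34"
  "k3s-worker-3 10.57.57.35"
)

prep_longhorn_nodes() {
  local run="" entry name ip
  if [[ -n "${DRY_RUN:-}" ]]; then run="echo"; fi
  for entry in "${NODES[@]}"; do
    read -r name ip <<< "$entry"
    # enable --now both enables iscsid at boot and starts it immediately
    $run ssh "ubuntu@$ip" 'sudo apt update && sudo apt install -y open-iscsi nfs-common && sudo systemctl enable --now iscsid'
    $run kubectl label node "$name" storage.longhorn.io/node=true
  done
}

# Preview first; drop DRY_RUN=1 to run for real.
DRY_RUN=1 prep_longhorn_nodes
```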

Deploy (this manifest is patched to use the storage.longhorn.io/node=true label):

kubectl apply -f https://raw.githubusercontent.com/meroxdotdev/merox.docs/refs/heads/master/K3S/cluster-deployment/longhorn.yaml

Verify:

kubectl get pods --namespace longhorn-system --watch
kubectl get nodes
kubectl get svc -n longhorn-system

Exposing Longhorn via Traefik

Create middleware.yml:

apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: longhorn-headers
  namespace: longhorn-system
spec:
  headers:
    customRequestHeaders:
      X-Forwarded-Proto: "https"

Create ingress.yml:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: longhorn-ingress
  namespace: longhorn-system
  annotations:
    traefik.ingress.kubernetes.io/router.entrypoints: websecure
    traefik.ingress.kubernetes.io/router.tls: "true"
    traefik.ingress.kubernetes.io/router.middlewares: longhorn-system-longhorn-headers@kubernetescrd
spec:
  rules:
  - host: storage.k3s.your.domain
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: longhorn-frontend
            port:
              number: 80
  tls:
  - hosts:
    - storage.k3s.your.domain
    secretName: k3s-your-domain-tls
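The router.middlewares annotation value follows a fixed convention: a Middleware defined as a Kubernetes CRD is referenced as namespace-name@kubernetescrd, which is why longhorn-headers in longhorn-system becomes longhorn-system-longhorn-headers@kubernetescrd. A tiny hypothetical helper that makes the convention explicit:

```bash
#!/bin/bash
# Build the reference Traefik expects in the
# traefik.ingress.kubernetes.io/router.middlewares annotation
# for a Middleware CRD: <namespace>-<name>@kubernetescrd.
middleware_ref() {
  local namespace=$1 name=$2
  printf '%s-%s@kubernetescrd\n' "$namespace" "$name"
}

middleware_ref longhorn-system longhorn-headers
```

To chain several middlewares, the annotation takes a comma-separated list of such references.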

Longhorn Storage Dashboard

Next Steps

Proceed to the Advanced Resources guide.

Advanced Resources and Best Practices

#kubernetes #monitoring #argocd #nfs

Additional tools and resources for monitoring, storage, continuous deployment, and cluster maintenance

NFS Storage

Merox Docs — NFS Storage Guide

Monitoring

Netdata is my go-to for cluster monitoring. You can also deploy Prometheus and Grafana through Rancher, but watch the resource usage: without tuning, Prometheus can consume a lot of memory and CPU as the number of scraped metrics grows.

ArgoCD

ArgoCD docs on merox.dev

Cluster Upgrades

How to Upgrade K3s

Final Thoughts

When I first set up K3s about a year ago, I couldn’t find a single source that covered everything needed for even a basic homelab cluster. That’s why I wrote this. If something’s missing or unclear, leave a comment and I’ll update it.
