Steps to Set Up a Hetzner Kubernetes Cluster:
This guide sets up a resilient, production-grade Kubernetes cluster with HA, autoscaling, automatic upgrades and an ingress stack.
Create a Read-Write Hetzner API Token
- Go to your Hetzner account and create a token with Read-Write access to manage resources.
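If you would rather not paste the token into kube.tf (the locals block in the example below leaves it empty), Terraform can read it from the environment instead; a minimal sketch, relying on the hcloud_token variable defined in that example:

```bash
# Terraform maps TF_VAR_<name> environment variables onto input variables,
# so this populates var.hcloud_token without writing the token to disk.
export TF_VAR_hcloud_token="<your-read-write-token>"
```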
Generate an SSH Key Pair
- Run the following command to create a new SSH key pair:

```bash
ssh-keygen -t ed25519
```

- Follow the prompts to save the key.
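Since the example kube.tf below reads the key pair from the Terraform working directory via file("id_ed25519") and file("id_ed25519.pub"), you may prefer to generate the files there directly; a variant sketch (the file name, comment, and empty passphrase are choices, not requirements):

```bash
# Write the key pair into the current (Terraform) directory so the file(...)
# references in kube.tf resolve; -N "" skips the passphrase prompt.
ssh-keygen -t ed25519 -f ./id_ed25519 -N "" -C "kube-hetzner"
```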
Bootstrap the kube.tf File
- Use this script to create and configure the kube.tf Terraform file:

```bash
tmp_script=$(mktemp) && curl -sSL -o "${tmp_script}" https://raw.githubusercontent.com/kube-hetzner/terraform-hcloud-kube-hetzner/master/scripts/create.sh && chmod +x "${tmp_script}" && "${tmp_script}" && rm "${tmp_script}"
```
Modify the kube.tf File
- Open the generated kube.tf and adjust the configuration to your project's needs (e.g., node sizes, region). Below is an example configuration for a highly available (HA) Kubernetes setup.
Example kube.tf for HA Setup
This configuration defines an HA cluster on Hetzner, using multiple control planes in different data centers for redundancy.
```hcl
locals {
  hcloud_token = ""
}

module "kube-hetzner" {
  providers = {
    hcloud = hcloud
  }
  hcloud_token = var.hcloud_token != "" ? var.hcloud_token : local.hcloud_token
  source       = "kube-hetzner/kube-hetzner/hcloud"

  ssh_public_key  = file("id_ed25519.pub")
  ssh_private_key = file("id_ed25519")

  network_region = "eu-central"

  control_plane_nodepools = [
    {
      name        = "control-plane-fsn1",
      server_type = "cx32",
      location    = "fsn1",
      labels      = [],
      taints      = [],
      count       = 1
    },
    {
      name        = "control-plane-nbg1",
      server_type = "cx32",
      location    = "nbg1",
      labels      = [],
      taints      = [],
      count       = 1
    },
    {
      name        = "control-plane-hel1",
      server_type = "cx32",
      location    = "hel1",
      labels      = [],
      taints      = [],
      count       = 1
    }
  ]

  agent_nodepools = [
    {
      name        = "agent-fsn1",
      server_type = "cx32",
      location    = "fsn1",
      labels      = [],
      taints      = [],
      count       = 1
    },
  ]

  autoscaler_nodepools = [
    {
      name        = "autoscaled-cx32",
      server_type = "cx32",
      location    = "nbg1",
      min_nodes   = 1,
      max_nodes   = 3
    }
  ]

  load_balancer_type     = "lb11"
  load_balancer_location = "fsn1"

  control_plane_lb_type = "lb11"
  use_control_plane_lb  = true

  system_upgrade_use_drain = true

  cluster_name = "emporium"

  cni_plugin            = "cilium"
  cilium_version        = "1.15.1"
  cilium_routing_mode   = "native"
  cilium_hubble_enabled = true
  disable_kube_proxy    = true

  ingress_controller  = "nginx"
  enable_cert_manager = true
  cert_manager_values = <<EOT
installCRDs: true
nameOverride: cert-manager
fullnameOverride: cert-manager
ingressShim:
  defaultIssuerKind: ClusterIssuer
  defaultIssuerName: letsencrypt-prod
EOT

  automatically_upgrade_k3s = true
  automatically_upgrade_os  = true

  dns_servers = [
    "1.1.1.1",
    "8.8.8.8",
    "2606:4700:4700::1111",
  ]
}

provider "hcloud" {
  token = var.hcloud_token != "" ? var.hcloud_token : local.hcloud_token
}

terraform {
  required_version = ">= 1.5.0"
  required_providers {
    hcloud = {
      source  = "hetznercloud/hcloud"
      version = ">= 1.43.0"
    }
  }
}

output "kubeconfig" {
  value     = module.kube-hetzner.kubeconfig
  sensitive = true
}

variable "hcloud_token" {
  sensitive = true
  default   = ""
}
```

Explanation:
Control Plane Setup:
The control_plane_nodepools block defines three control-plane nodes in different Hetzner locations (fsn1, nbg1, hel1), providing redundancy and high availability. If one data center goes down, the remaining control planes keep the cluster running.

Autoscaler:
The autoscaler_nodepools block lets worker nodes scale automatically based on demand, from a minimum of 1 node to a maximum of 3 nodes in the nbg1 location.

Load Balancer:
Two lb11-type Hetzner load balancers are deployed in the fsn1 location:
- One distributes traffic across the control-plane nodes.
- The other handles traffic for the ingress nodes.

CNI and Ingress Controller:
The setup uses Cilium as the CNI (Container Network Interface) plugin, a powerful and scalable networking solution. The nginx ingress controller is enabled to handle external traffic, and cert-manager is set up to manage TLS certificates for secure communication.

Automatic Upgrades:
Both the Kubernetes components (K3s) and the underlying OS are upgraded automatically to keep the cluster secure and stable.

DNS Servers:
The cluster is configured to use Cloudflare's and Google's public DNS servers for reliable name resolution.
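Once the cluster is up (after the apply step below), a quick way to confirm these components are running; the grep pattern is just a convenience and namespaces are deliberately not hard-coded, since they depend on how each component was deployed:

```bash
# All control-plane and agent nodes should report Ready.
kubectl get nodes -o wide

# Cilium, the nginx ingress controller, cert-manager, and the autoscaler run as
# ordinary workloads; scan every namespace rather than guessing where they live.
kubectl get pods -A | grep -Ei 'cilium|nginx|ingress|cert-manager|autoscaler'
```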
Initialize and Apply Terraform
- Initialize your Terraform workspace and apply the changes:

```bash
terraform init --upgrade
terraform validate
terraform apply -auto-approve
```
Wait for Cluster Bootstrap
- After a few minutes, all components should be fully deployed in your Hetzner project. The kubeconfig file will be generated in the same directory as kube.tf.
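To use the new cluster, point kubectl at that file. The exact file name depends on your settings (with the example above it is derived from the cluster name), so treat the path below as a placeholder:

```bash
# Point kubectl at the generated kubeconfig; substitute the file name that
# actually appeared next to kube.tf.
export KUBECONFIG="$(pwd)/emporium_kubeconfig.yaml"
kubectl get nodes
```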
External DNS
To enable automatic DNS resolution when creating an Ingress in your Kubernetes cluster, you need to install External-DNS and configure Cert-Manager's ClusterIssuer. This guide walks you through setting up External-DNS with Azure DNS and configuring Cert-Manager to use Let's Encrypt for SSL certificates.
Create a DNS Zone in Azure
Replace mydomain.com with your actual domain.
```bash
AZURE_DNS_ZONE_RESOURCE_GROUP="dns-zones"
AZURE_DNS_ZONE="mydomain.com"

az network dns zone create --resource-group $AZURE_DNS_ZONE_RESOURCE_GROUP --name $AZURE_DNS_ZONE
```
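For records in this zone to resolve publicly, your registrar must delegate the domain to the Azure name servers; a small sketch to list them, reusing the variables above:

```bash
# Print the name servers Azure assigned to the zone; set these as NS records
# at your domain registrar so the zone becomes authoritative.
az network dns zone show \
  --resource-group $AZURE_DNS_ZONE_RESOURCE_GROUP \
  --name $AZURE_DNS_ZONE \
  --query nameServers --output tsv
```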
Create a Service Principal (SP) with DNS Zone Contributor Role

```bash
AZURE_CERT_MANAGER_NEW_SP_NAME="dns-zones-manager"
AZURE_DNS_ZONE_RESOURCE_GROUP="dns-zones"
AZURE_DNS_ZONE="mydomain.com"

DNS_SP=$(az ad sp create-for-rbac --name $AZURE_CERT_MANAGER_NEW_SP_NAME --output json)
AZURE_CERT_MANAGER_SP_APP_ID=$(echo $DNS_SP | jq -r '.appId')
AZURE_CERT_MANAGER_SP_PASSWORD=$(echo $DNS_SP | jq -r '.password')
AZURE_TENANT_ID=$(echo $DNS_SP | jq -r '.tenant')
AZURE_SUBSCRIPTION_ID=$(az account show --output json | jq -r '.id')

DNS_ID=$(az network dns zone show --name "mydomain.com" --resource-group "dns-zones" --query "id" --output tsv)

az role assignment create --assignee ${AZURE_CERT_MANAGER_SP_APP_ID} --role "DNS Zone Contributor" --scope $DNS_ID
```
Create Kubernetes Secret for Azure DNS

```bash
kubectl create secret generic azuredns-config --from-literal=client-secret=${AZURE_CERT_MANAGER_SP_PASSWORD}
```
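One thing worth double-checking: a ClusterIssuer resolves clientSecretSecretRef in cert-manager's cluster resource namespace, which by default is the namespace cert-manager runs in. If certificate issuance later fails because the secret cannot be found, recreating it there usually helps (the cert-manager namespace below is an assumption; use wherever cert-manager is installed in your cluster):

```bash
# Create the secret where cert-manager looks up ClusterIssuer secrets
# (namespace name is an assumption; adjust to your installation).
kubectl -n cert-manager create secret generic azuredns-config \
  --from-literal=client-secret=${AZURE_CERT_MANAGER_SP_PASSWORD}
```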
Create a values.yaml File

```yaml
nameOverride: external-dns
fullnameOverride: external-dns
crd:
  create: true
policy: sync
sources:
  - crd
  - service
  - ingress
resources:
  requests:
    cpu: 50m
    memory: 64Mi
extraArgs:
  txt-wildcard-replacement: wildcard
service:
  enabled: false
networkPolicy:
  enabled: false
domainFilters:
  - mydomain.com
provider: azure
azure:
  tenantId: "<AZURE_TENANT_ID>"
  subscriptionId: "<AZURE_SUBSCRIPTION_ID>"
  resourceGroup: "dns-zones"
  aadClientId: "<AZURE_CERT_MANAGER_SP_APP_ID>"
  aadClientSecret: "<AZURE_CERT_MANAGER_SP_PASSWORD>"
```
Install External-DNS

```bash
# Add the Bitnami chart repository if it is not already configured.
helm repo add bitnami https://charts.bitnami.com/bitnami

helm install external-dns bitnami/external-dns \
  --namespace external-dns \
  --create-namespace \
  -f values.yaml
```
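Before moving on, it is worth confirming that external-dns can reach the zone; its logs report every record it plans to create as well as credential errors. The deployment name below follows from the fullnameOverride in values.yaml, but verify it with kubectl get deploy -n external-dns:

```bash
# external-dns logs planned record changes and any Azure credential errors.
kubectl -n external-dns logs deploy/external-dns --tail=50
```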
Create ClusterIssuer

Save the following manifest as clusterissuer.yaml:

```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    email: your-email@example.com # Replace with your email
    server: https://acme-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      name: letsencrypt-prod
    solvers:
      - dns01:
          azureDNS:
            tenantID: "<AZURE_TENANT_ID>"
            subscriptionID: "<AZURE_SUBSCRIPTION_ID>"
            resourceGroupName: "dns-zones"
            hostedZoneName: "mydomain.com"
            clientID: "<AZURE_CERT_MANAGER_SP_APP_ID>"
            clientSecretSecretRef:
              name: azuredns-config
              key: client-secret
            environment: AzurePublicCloud
        selector:
          dnsZones:
- "mydomain.com"kubectl apply -f clusterissuer.yamlCaveats
Caveats

Because of a limitation in upstream Kubernetes, pods cannot reach other pods through the IP address of an external load balancer created for a LoadBalancer-type Service: Kubernetes short-circuits that traffic so the load balancer is bypassed, which can break workflows that expect TLS termination or PROXY protocol handling to be applied consistently.
To resolve this issue, add the following annotation to the Ingress NGINX service:
```yaml
load-balancer.hetzner.cloud/hostname: lb.mydomain.com
```

Ensure you manually create an A record for the lb entry within your DNS zone mydomain.com. Failing to do so may cause the external-dns and ingress integration to malfunction.
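A minimal sketch of that A record creation with the az CLI, assuming the zone and resource group created earlier; the ingress Service's namespace and name depend on how the controller was installed, so they are left as placeholders:

```bash
# Look up the public IP Hetzner assigned to the ingress load balancer
# (replace the placeholders with your ingress controller's Service).
# If the hostname annotation is already applied, read the IP from the
# Hetzner console or the load balancer itself instead.
LB_IP=$(kubectl -n <ingress-namespace> get svc <ingress-nginx-controller-service> \
  -o jsonpath='{.status.loadBalancer.ingress[0].ip}')

# Point lb.mydomain.com at that IP in the Azure DNS zone created earlier.
az network dns record-set a add-record \
  --resource-group dns-zones \
  --zone-name mydomain.com \
  --record-set-name lb \
  --ipv4-address "$LB_IP"
```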