Traefik High-Availability HTTPS wildcard certificate with Cert-Manager

May 15, 2022

You have probably deployed Traefik as your Ingress Controller, and it serves HTTPS, but you still have to hardcode certificates? And whenever the Traefik pod restarts, all your apps become unreachable? Then this post is for you :-)

I will use GitOps principles and FluxCD to deploy the infrastructure components, but feel free to adapt this to your needs.

Traefik

Let's deploy a simple Traefik setup with 2 replicas. First, the Helm repository:

apiVersion: source.toolkit.fluxcd.io/v1beta1
kind: HelmRepository
metadata:
  name: traefik
  namespace: flux-system
spec:
  interval: 15m
  url: https://helm.traefik.io/traefik

And second, the HelmRelease for the chart:

---
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: traefik
  namespace: flux-system
spec:
  chart:
    spec:
      chart: traefik
      sourceRef:
        kind: HelmRepository
        name: traefik
      version: 10.19.5
  interval: 15m
  releaseName: traefik
  targetNamespace: default
  # https://github.com/traefik/traefik-helm-chart/blob/master/traefik/values.yaml
  values:
    ports:
      web:
        # (optional) Permanent Redirect to HTTPS
        redirectTo: websecure
      websecure:
        tls:
          enabled: true
    
    # # only for debugging
    # logs:
    #   general:
    #     level: DEBUG

    # Using affinity from Traefik default values example
    # This pod anti-affinity forces the scheduler to put traefik pods
    # on nodes where no other traefik pods are scheduled.
    # It should be used when hostNetwork: true to prevent port conflicts
    # Note: the topology key must be the node hostname to spread per node,
    # and the selector must match the label the chart sets on its pods.
    affinity:
      podAntiAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
        - topologyKey: kubernetes.io/hostname
          labelSelector:
            matchExpressions:
            - key: app.kubernetes.io/name
              operator: In
              values:
              - traefik
    
    ingressRoute:
      dashboard:
        enabled: false
    
    ### Disable as cert-manager handles certs
    persistence:
      enabled: false

    deployment:
      replicas: 2
    podDisruptionBudget:
      enabled: true
      minAvailable: 1

    # Set Traefik as your default Ingress Controller, following the IngressClass changes in Kubernetes 1.19+
    ingressClass:
      enabled: true
      isDefaultClass: true

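This should give you 2 Traefik replicas, but nothing manages certificates yet: until a real certificate is configured, Traefik serves its self-signed default certificate, and there is no wildcard.

Now let's take care of TLS.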

Cert-manager

This tool is gold: https://cert-manager.io/

It is a certificate management tool for Kubernetes. With several Traefik replicas, each instance would try to request and store certificates on its own, which creates conflicts. That is why Traefik's built-in Let's Encrypt (ACME) feature is not supported with more than one replica (outside of the Enterprise edition $$$). Cert-manager will manage everything for us instead. Let's install it with FluxCD.

First the repository:

apiVersion: source.toolkit.fluxcd.io/v1beta1
kind: HelmRepository
metadata:
  name: jetstack
  namespace: flux-system
spec:
  interval: 1h
  url: https://charts.jetstack.io

And then the HelmRelease for the chart:

---
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: cert-manager
  namespace: flux-system
spec:
  chart:
    spec:
      chart: cert-manager
      sourceRef:
        kind: HelmRepository
        name: jetstack
      version: v1.8.0
  interval: 15m
  releaseName: cert-manager
  targetNamespace: cert-manager
  install:
    createNamespace: true
  values:
    installCRDs: true

Now we need to deal with TLS and the wildcard. Let's Encrypt only issues wildcard certificates through the DNS-01 challenge, so cert-manager needs API credentials for your DNS provider, stored in a secret. Either you create that secret manually and apply it, or you follow GitOps principles, encrypt it with SOPS or Sealed Secrets, and store it in Git (spoiler: that's what I do).

My cloud provider is DigitalOcean, so I create a digitalocean-dns secret in the cert-manager namespace, holding the API token under the access-token key. Then I reference this secret in my ClusterIssuer:

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
  # no namespace here: a ClusterIssuer is cluster-scoped
spec:
  acme:
    email: tech@remazing.eu
    server: https://acme-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      name: letsencrypt-prod
    solvers:
    - dns01:
        digitalocean:
          tokenSecretRef:
            name: digitalocean-dns
            key: access-token
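
For reference, the digitalocean-dns secret referenced above could look like the sketch below. The token value is a placeholder, not a real credential, and this plain form is only meant to show the expected name, namespace and key. If you store it in Git, encrypt it first with SOPS or Sealed Secrets.

```yaml
---
apiVersion: v1
kind: Secret
metadata:
  # Name and key must match tokenSecretRef in the ClusterIssuer
  name: digitalocean-dns
  # For a ClusterIssuer, cert-manager looks up the secret in its own
  # namespace by default (the "cluster resource namespace")
  namespace: cert-manager
type: Opaque
stringData:
  # Placeholder: replace with your DigitalOcean API token
  access-token: REPLACE_WITH_DIGITALOCEAN_API_TOKEN
```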

While you are testing, you may prefer the ACME staging server, which has much higher rate limits but issues certificates that browsers do not trust:

---
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-staging
  # no namespace here: a ClusterIssuer is cluster-scoped
spec:
  acme:
    email: tech@remazing.eu
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      name: issuer-letsencrypt-staging
    solvers:
    - dns01:
        digitalocean:
          tokenSecretRef:
            name: digitalocean-dns
            key: access-token
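
One thing to watch with FluxCD: the ClusterIssuer kind only exists once cert-manager and its CRDs are installed, so applying everything in one go can fail on the first reconciliation. A possible way around it, sketched below, is to split the manifests into two Flux Kustomizations and order them with dependsOn. The Kustomization names, the paths and the GitRepository name flux-system are assumptions for illustration, not something taken from my repository.

```yaml
---
apiVersion: kustomize.toolkit.fluxcd.io/v1beta2
kind: Kustomization
metadata:
  name: infrastructure          # hypothetical name
  namespace: flux-system
spec:
  interval: 15m
  path: ./infrastructure        # hypothetical path holding the HelmReleases
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system           # assumed default name from flux bootstrap
  # Only report ready once the cert-manager release is installed
  healthChecks:
    - apiVersion: helm.toolkit.fluxcd.io/v2beta1
      kind: HelmRelease
      name: cert-manager
      namespace: flux-system
---
apiVersion: kustomize.toolkit.fluxcd.io/v1beta2
kind: Kustomization
metadata:
  name: certificates            # hypothetical name
  namespace: flux-system
spec:
  # Wait for the Kustomization above before applying issuers and certificates
  dependsOn:
    - name: infrastructure
  interval: 15m
  path: ./certificates          # hypothetical path holding ClusterIssuer, Certificate, TLSStore
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system
```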

Now that we have our ClusterIssuer, we can finish with a wildcard certificate covering every host name directly under our domain:

---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: wildcard-cert
  namespace: default
spec:
  secretName: wildcard-yourdomain-you
  issuerRef:
    kind: ClusterIssuer
    name: letsencrypt-prod
  commonName: '*.yourdomain.you'
  dnsNames:
    - '*.yourdomain.you'
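
Keep in mind that a wildcard such as *.yourdomain.you does not cover the bare domain yourdomain.you itself. If you also serve something there, a variant of the Certificate above (assuming the same names) simply lists both entries:

```yaml
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: wildcard-cert
  namespace: default
spec:
  secretName: wildcard-yourdomain-you
  issuerRef:
    kind: ClusterIssuer
    name: letsencrypt-prod
  commonName: '*.yourdomain.you'
  dnsNames:
    - '*.yourdomain.you'
    # The bare domain is not matched by the wildcard, so add it explicitly
    - 'yourdomain.you'
```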

Bingo! Within a few minutes we obtain our wildcard certificate. The last step is to tell Traefik to use it as the default certificate for any HTTPS host name that matches. There is a CRD for that, called TLSStore, which points to the certificate secret. Traefik only takes the TLSStore named default into account, and it must live in the same namespace as the secret:

apiVersion: traefik.containo.us/v1alpha1
kind: TLSStore
metadata:
  name: default
  namespace: default
spec:
  defaultCertificate:
    secretName: wildcard-yourdomain-you

Aaand voilà! Now your applications should be reachable under *.yourdomain.you as soon as you create an IngressRoute, without having to specify any TLS secret.
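
As an illustration, here is a minimal IngressRoute that relies on the default certificate. The whoami service, its port and the host name are hypothetical, so swap in your own application. No tls section is needed because TLS is already enabled on the websecure entry point in the Helm values, and the TLSStore provides the certificate.

```yaml
---
apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
  name: whoami                  # hypothetical application
  namespace: default
spec:
  entryPoints:
    - websecure
  routes:
  - match: Host(`whoami.yourdomain.you`)
    kind: Rule
    services:
      - name: whoami            # hypothetical Service in the same namespace
        port: 80
  # No tls.secretName: the wildcard from the default TLSStore is served
```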

Bonus

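Now that you have a basic wildcard certificate, you can easily customise individual TLS endpoints by creating a Certificate alongside your IngressRoute. This is needed, for instance, for deeper subdomains, since *.yourdomain.you only covers one level. Example: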

---
apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
  name: loki-ingress
spec:
  entryPoints:
    - websecure
  routes:
  - match: Host(`loki.monitoring.yourdomain.you`)
    kind: Rule
    services:
      - name: default-loki-stack
        port: 3100
  tls:
    secretName: certificate-loki-secret
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: loki.monitoring.yourdomain.you
spec:
  secretName: certificate-loki-secret
  dnsNames:
  - loki.monitoring.yourdomain.you
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer

Notice how the secret created by the Certificate custom resource is the same one we reference in the tls section of our IngressRoute.