Kubernetes Control Plane Certificate Expiry: A Runbook

In Kubernetes, certificate expiry is rarely “gradual degradation”; usually it’s an incident where everyone hits the wall at once: kubectl stops working, controllers can’t talk to the API, nodes can’t update status, and the system looks like it has “completely collapsed.”

This post is a practical runbook for the certificate-expiry scenario specifically on kubeadm-based (self-managed) clusters. The approach is different for managed Kubernetes (EKS/AKS/GKE).

Symptom set (quick triage)

The most frequent errors:

x509: certificate has expired or is not yet valid
Unable to connect to the server: x509: certificate signed by unknown authority
Unauthorized (the certificate was renewed but the kubeconfig is stale)

During incident triage, answer these two questions fast:

Is the error on the client side (kubeconfig / local) or on the cluster side?
Does the error affect a single component or the control plane in general?
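One way to answer the first question is to look at the two certificates directly. The sketch below is a minimal example, not a definitive tool: the function names are made up for illustration, it assumes the kubeconfig embeds its certificate inline (as the kubeadm-generated admin.conf does), and it assumes the API server listens on the kubeadm default port 6443. If the kubeconfig certificate is expired, the problem is on the client side; if the serving certificate is expired, it is on the cluster side.

```shell
# Client side: decode the client certificate embedded in a kubeconfig.
# Assumes an inline client-certificate-data field; a token-based or
# file-referencing kubeconfig will print nothing.
kubeconfig_client_cert() {
  awk '$1 == "client-certificate-data:" { print $2; exit }' \
    "${1:-$HOME/.kube/config}" | base64 -d
}

# Cluster side: fetch what the API server presents on its secure port.
# The port defaults to 6443 (kubeadm default); pass another as $2 if needed.
apiserver_serving_cert() {
  openssl s_client -connect "$1:${2:-6443}" </dev/null 2>/dev/null
}

# Usage (replace <control-plane-host> with your own endpoint):
#   kubeconfig_client_cert ~/.kube/config | openssl x509 -noout -subject -enddate
#   apiserver_serving_cert <control-plane-host> | openssl x509 -noout -subject -enddate
```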

Scope: Which setups is this runbook for?

Clusters built with kubeadm
You have SSH access to the control-plane nodes
etcd runs on the control-plane nodes (stacked), or on separate hosts that you can still manage

If you’re on managed Kubernetes, the provider generally handles control-plane certificate renewal; only the kubeconfig / client-certificate parts of this runbook apply.

Step 0 — Change management (even in incident mode)

Two panic-driven mistakes are very expensive in this incident:

“I tried a few things” and ended up running different operations on different nodes (inconsistency)
Renewing certificates on multiple nodes simultaneously (amplifies split-brain risk)

So:

A single Incident Commander + a single operator
Every command run gets a note in the decision log
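A decision log is easier to keep when the shell writes it for you. The wrapper below is one possible sketch: run_logged and the DECISION_LOG path are hypothetical names chosen for this example, not part of any Kubernetes tooling.

```shell
# Hypothetical log location; override DECISION_LOG to suit your incident process.
DECISION_LOG="${DECISION_LOG:-$HOME/incident-decision-log.txt}"

# Record who ran what, where and when (UTC), then run it and record the exit code.
run_logged() {
  local rc
  printf '%s %s@%s $ %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
    "$(id -un)" "$(uname -n)" "$*" >> "$DECISION_LOG"
  "$@"
  rc=$?
  printf '%s exit=%s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$rc" >> "$DECISION_LOG"
  return "$rc"
}

# Usage:
#   run_logged sudo kubeadm certs check-expiration
```

The wrapper records only the command and its exit code; the reasoning behind each decision still has to be written down by the operator.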

Step 1 — Check the certificate state (kubeadm)

On the control-plane node:

sudo kubeadm certs check-expiration

If the EXPIRES field in the output is in the past, that’s most likely your problem. Even if only certain certificates have expired, do the renewal in a controlled way rather than “piece by piece.”
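If you want a second opinion that does not depend on kubeadm, the certificate files can be read directly with openssl. This is a rough cross-check under assumptions: list_cert_expiry is an illustrative name, the default path is the standard kubeadm PKI directory, and it only covers .crt files on disk, not the certificates embedded in kubeconfig files.

```shell
# Print "<notAfter>  <file>" for every .crt file under a directory.
# The default path is where kubeadm keeps its PKI; adjust it if yours differs.
list_cert_expiry() {
  find "${1:-/etc/kubernetes/pki}" -type f -name '*.crt' | sort |
    while IFS= read -r crt; do
      printf '%s  %s\n' \
        "$(openssl x509 -in "$crt" -noout -enddate | cut -d= -f2)" "$crt"
    done
}

# Usage:
#   list_cert_expiry /etc/kubernetes/pki
```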

Step 2 — Renew the certificates

The most practical path on kubeadm:

sudo kubeadm certs renew all

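This also renews the client certificates embedded in the kubeconfig files under /etc/kubernetes (admin.conf, controller-manager.conf, scheduler.conf), but not any copy you made earlier, such as $HOME/.kube/config. On clusters with several control-plane nodes, run the renewal on each node, one at a time. Either way, validate the kubeconfig in Step 4.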
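Before running the renewal, it can be worth taking a copy of the current state so there is something to compare against or roll back to. The function below is a sketch under assumptions: backup_k8s_config is a made-up name, /etc/kubernetes is the kubeadm default location, and /root is just a convenient destination.

```shell
# Archive a config directory before changing it; prints the archive path.
# Both defaults are assumptions: /etc/kubernetes is the kubeadm default,
# and /root is only a convenient destination.
backup_k8s_config() {
  local src dest
  src="${1:-/etc/kubernetes}"
  dest="${2:-/root}/kubernetes-backup-$(date -u +%Y%m%dT%H%M%SZ).tar.gz"
  # The archive contains private keys, so create it readable by the owner only.
  ( umask 077 && tar -czf "$dest" -C "$(dirname "$src")" "$(basename "$src")" ) &&
    printf '%s\n' "$dest"
}

# Usage (from a root shell, before `kubeadm certs renew all`):
#   backup_k8s_config
```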

Step 3 — Restart the control-plane components safely

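In most kubeadm setups, control-plane components run as static pods. Renewing certificates does not change the manifests, so the kubelet will not recreate those pods on its own, and the running components keep using the old certificates until they restart. The method the kubeadm documentation describes is to move a component's manifest out of /etc/kubernetes/manifests, wait about 20 seconds, and move it back. A kubelet restart is a cheap first step, although on its own it may leave already-running containers untouched: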

sudo systemctl restart kubelet

Then:

sudo crictl ps | grep -E "kube-apiserver|kube-controller-manager|kube-scheduler"
sudo crictl logs "$(sudo crictl ps -q --name kube-apiserver)" 2>&1 | tail -n 40

The goal: confirm the API server has come back healthy.
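If a component is still using the old certificate after the kubelet restart, the kubeadm documentation describes restarting a static pod by temporarily moving its manifest out of the manifest directory. A minimal sketch of that procedure follows; bounce_static_pod is an illustrative name, and the defaults assume the standard kubeadm manifest path and the kubelet's default 20-second file check interval.

```shell
# Restart one static pod by moving its manifest out of the manifest directory
# and back. Defaults assume a standard kubeadm layout and the kubelet's
# default 20-second file check interval.
bounce_static_pod() {
  local name manifests wait_s hold
  name="$1"
  manifests="${2:-/etc/kubernetes/manifests}"
  wait_s="${3:-20}"
  hold="$(mktemp -d)" || return 1   # holding area outside the manifest dir
  if ! mv "$manifests/$name.yaml" "$hold/"; then
    rmdir "$hold"
    return 1
  fi
  sleep "$wait_s"   # give the kubelet time to notice and stop the pod
  mv "$hold/$name.yaml" "$manifests/" && rmdir "$hold"
}

# Usage (from a root shell, one component at a time):
#   bounce_static_pod kube-apiserver
```

Doing one component at a time, and confirming it is back in crictl ps before moving on, keeps this consistent with the single-operator rule from Step 0.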

Step 4 — Validate the kubeconfig (client side)

If you’re still seeing x509 from kubectl:

Check whether the kubeconfig you’re using is up to date
Re-fetch / copy the admin config

The typical path on a control-plane node:

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
kubectl get nodes
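To check whether a kubeconfig is stale without overwriting it first, you can compare its embedded client certificate with the one in the freshly renewed admin.conf. This is a sketch, assuming both files carry an inline client-certificate-data field; same_client_cert is a name invented for the example.

```shell
# Succeeds only if both kubeconfig files embed the same client certificate.
# Assumes inline client-certificate-data fields, as in kubeadm's admin.conf.
same_client_cert() {
  local a b
  a="$(awk '$1 == "client-certificate-data:" { print $2; exit }' "$1")"
  b="$(awk '$1 == "client-certificate-data:" { print $2; exit }' "$2")"
  [ -n "$a" ] && [ "$a" = "$b" ]
}

# Usage (admin.conf is normally readable by root only, so pipe it in via sudo):
#   sudo cat /etc/kubernetes/admin.conf |
#     same_client_cert "$HOME/.kube/config" /dev/stdin && echo current || echo stale
```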

Step 5 — Confirm etcd and the controllers have recovered

Don’t close out this incident just because “kubectl works.” Verify:

Are the nodes Ready?
Is anything in kube-system in crashloop?
Are controllers emitting new events?

kubectl get nodes -o wide
kubectl -n kube-system get pods
kubectl get events -A --sort-by=.lastTimestamp | tail -n 30
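Scanning the pod list by eye is error-prone under incident pressure. The filter below is a small sketch that prints only the pods that look unhealthy; unhealthy_pods is an illustrative name, and it assumes the default column order of kubectl get pods (NAME, READY, STATUS, ...).

```shell
# Reads `kubectl get pods --no-headers` output on stdin (NAME READY STATUS ...)
# and prints pods that are not Running/Completed, or Running but not fully ready.
unhealthy_pods() {
  awk '{
    split($2, ready, "/")
    if (($3 != "Running" && $3 != "Completed") ||
        ($3 == "Running" && ready[1] != ready[2]))
      print $1, $2, $3
  }'
}

# Usage:
#   kubectl -n kube-system get pods --no-headers | unhealthy_pods
```

Empty output means every pod in the namespace is either Completed or Running with all containers ready.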

Preventive controls (don’t let this incident repeat)

The most effective prevention is producing an alert before the certificate expires.

Collect the kubeadm certs check-expiration output via a daily job
Less than 30 days until EXPIRES -> warning; less than 7 days -> critical
Treat time synchronization (NTP) as a “critical service” for the control plane
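The thresholds above can be sketched as a small check suitable for a daily job. Note the substitution: instead of parsing kubeadm certs check-expiration output, this example reads the .crt files directly with openssl, so it does not cover certificates embedded in kubeconfig files. The function names are illustrative, the PKI path is the kubeadm default, and days_until relies on GNU date.

```shell
# Whole days from now until a date string (relies on GNU date's -d option).
days_until() {
  local end now
  end="$(date -u -d "$1" +%s)" || return 1
  now="$(date -u +%s)"
  echo $(( (end - now) / 86400 ))
}

# Map remaining days to the thresholds: <7 critical, <30 warning, else ok.
expiry_severity() {
  if [ "$1" -lt 7 ]; then echo critical
  elif [ "$1" -lt 30 ]; then echo warning
  else echo ok
  fi
}

# Print "<severity> <days> <file>" for each .crt under a PKI directory.
check_pki_dir() {
  find "${1:-/etc/kubernetes/pki}" -type f -name '*.crt' | sort |
    while IFS= read -r crt; do
      days="$(days_until "$(openssl x509 -in "$crt" -noout -enddate | cut -d= -f2)")"
      [ -n "$days" ] || continue   # skip files whose date could not be parsed
      printf '%s %s %s\n' "$(expiry_severity "$days")" "$days" "$crt"
    done
}

# Usage, e.g. from a daily cron job or systemd timer:
#   check_pki_dir /etc/kubernetes/pki | grep -v '^ok '
```

Any line that survives the grep is something to forward to your alerting system; how that forwarding happens depends on your monitoring stack.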

Final word

When a control-plane certificate expires, the goal isn’t to “find the kubeadm command”; it’s to run a controlled, single-operator, validation-driven recovery. With a good runbook plus early warning, this incident can be managed as a planned 15-30 minute maintenance window rather than an outage stretching into hours.
