Mustafa Erbay
Technology · kubernetes-uretim-guvenlik · 13 min read

Kubernetes Etcd Encryption at Rest + KMS Design

Protecting Secrets with real cryptography rather than just base64: encryption configuration, KMS integration, and an operational rotation model.

One of the most misunderstood concepts on the Kubernetes side is the Secret object. A Secret is not an encrypted vault; in most installations it’s just base64-encoded data. If that data sits in ETCD essentially in plaintext (other than the encoding wrapper), then a compromised control plane or a leaked ETCD snapshot becomes a much bigger problem than it needs to be.
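
To make the "just base64" point concrete, here is a minimal Python sketch. The password value is made up for illustration; the only point is that decoding requires no key at all.

```python
import base64

# Hypothetical value, shaped the way a Secret's `data` field stores it.
stored = base64.b64encode(b"s3cr3t-password").decode("ascii")

# No key is involved: anyone holding the stored string can reverse it.
recovered = base64.b64decode(stored)

print(stored)
print(recovered.decode("ascii"))
```

Base64 is an encoding for transporting bytes safely, not a confidentiality mechanism, which is why the rest of this article is about real encryption.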

In this article I have two goals:

  1. Use Encryption at Rest to actually encrypt sensitive data sitting in ETCD
  2. Make that encryption operationally viable through KMS, instead of dropping a key file on disk

The problem: where do Secrets live, and who can read them?

The risk surface usually opens up through three channels:

  • ETCD access (disk, snapshot, backup)
  • Control-plane node compromise (encryption config plus access to certs/keys)
  • Backup chain (S3 buckets, repositories, backup agents)

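Encryption at rest doesn’t make these risks vanish, and it does little once a control-plane node holding the encryption config and keys is compromised; what it does is take the ETCD data itself, including the snapshots and backups derived from it, out of the “directly readable” category.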

Kubernetes Encryption at Rest: the logic, briefly

Kubernetes can encrypt certain resource types (such as secrets and configmaps) before writing them to ETCD. The core building blocks are:

  • The EncryptionConfiguration file
  • The API Server applying encryption on the write path based on that file
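
As a rough illustration of the first building block, the sketch below generates an EncryptionConfiguration for Secrets with a KMS v2 provider. The plugin name, socket path, and timeout are assumptions chosen for the example, not values from a real cluster. The output is JSON, which is also valid YAML; the file is the one handed to the API Server through `--encryption-provider-config`.

```python
import json


def build_encryption_config(plugin_name: str, socket_path: str) -> dict:
    """Return an EncryptionConfiguration that encrypts Secrets via a KMS v2 plugin."""
    return {
        "apiVersion": "apiserver.config.k8s.io/v1",
        "kind": "EncryptionConfiguration",
        "resources": [
            {
                "resources": ["secrets"],
                "providers": [
                    # First provider in the list: used for every new write.
                    {
                        "kms": {
                            "apiVersion": "v2",
                            "name": plugin_name,
                            "endpoint": f"unix://{socket_path}",
                            "timeout": "3s",  # assumed value, tune per environment
                        }
                    },
                    # Read-only fallback: lets the API Server still read
                    # objects stored before encryption was enabled.
                    {"identity": {}},
                ],
            }
        ],
    }


# Hypothetical plugin name and socket path.
config = build_encryption_config("example-kms", "/var/run/kms/plugin.sock")
print(json.dumps(config, indent=2))
```

Provider order matters: the first entry encrypts new writes, while the remaining entries are only tried when reading data that is already stored.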

A critical fact in this model:

  • Encryption happens at write time.
  • Objects that were already written can stay in the old format until a separate process re-writes them.
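
One way to see which objects are still in the old format is to look at the raw value in ETCD. Values written through an encryption provider carry a `k8s:enc:<provider>:...` prefix, while older objects do not. The helper below is a small sketch of that check; the sample byte strings are invented, and obtaining the raw value (for example with `etcdctl get`) is left out.

```python
def storage_format(raw: bytes) -> str:
    """Classify an ETCD value by the prefix the API Server puts on encrypted data."""
    prefix = b"k8s:enc:"
    if not raw.startswith(prefix):
        return "unencrypted"
    # After the prefix comes the provider name, e.g. "aescbc" or "kms".
    provider = raw[len(prefix):].split(b":", 1)[0]
    return "encrypted:" + provider.decode("ascii", "replace")


# Invented sample values: one KMS-encrypted, one stored before encryption.
samples = {
    "new-secret": b"k8s:enc:kms:v2:example-kms:\x0a\x10\x9f",
    "old-secret": b"k8s\x00\n\x0c\n\x02v1\x12\x06Secret",
}
for name, raw in samples.items():
    print(name, storage_format(raw))
```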

Why KMS is non-negotiable

Static keys (an AES key sitting in a file) are quick to set up but weak for enterprise operations:

  • Key rotation is painful
  • Access control and audit are weak
  • “Who used the key, and when?” has no clean answer

The goal of bringing in KMS:

  • Centrally manage the key lifecycle
  • Audit key usage
  • Bring rotation into a regular “planned maintenance” rhythm

Design: how do you make the KMS integration “operational”?

1) Highly available KMS endpoint

If KMS is unreachable, the API Server’s write path takes the hit. So:

  • Plan for at least two endpoints (or an HA service)
  • Measure timeout and retry behavior
  • Tie KMS maintenance windows to the platform’s maintenance calendar
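
To measure timeout and retry behavior before production depends on it, a small probe loop is often enough. The sketch below is an assumption-heavy illustration: the endpoint names are hypothetical, and `probe` is a stand-in for whatever health check the KMS actually offers.

```python
import time
from typing import Callable, Optional, Sequence


def first_healthy(
    endpoints: Sequence[str],
    probe: Callable[[str], bool],
    retries: int = 2,
    backoff_s: float = 0.0,
) -> Optional[str]:
    """Return the first endpoint whose probe succeeds, trying each up to `retries` times."""
    for endpoint in endpoints:
        for attempt in range(retries):
            started = time.monotonic()
            try:
                ok = probe(endpoint)
            except OSError:
                # Network-level failures count as an unhealthy probe.
                ok = False
            elapsed_ms = (time.monotonic() - started) * 1000
            print(f"{endpoint} attempt={attempt + 1} ok={ok} took={elapsed_ms:.1f}ms")
            if ok:
                return endpoint
            time.sleep(backoff_s)
    return None


# Hypothetical endpoints and a fake probe standing in for a real health check.
endpoints = ["kms-a.internal:443", "kms-b.internal:443"]
healthy = first_healthy(endpoints, probe=lambda e: e.startswith("kms-b"))
print(healthy)
```

Logging the elapsed time per attempt is the useful part: it shows how long a failed endpoint delays the fallback, which is the number the API Server's write path will feel.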

2) Decide on the failure mode up front

There are typically three approaches:

  • Fail-closed: no KMS, no writes (high security, high operational risk)
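  • Fail-open: write unencrypted when KMS is gone (easy operations, high risk). The API Server does not do this on its own: only the first provider in the list is used for writes, so fail-open means a deliberate config change.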
  • Controlled degradation: fail-closed only on selected resources

The most realistic model for enterprise practice:

  • Fail-closed for secrets (HA and a runbook are mandatory)
  • Controlled tolerance for lower-risk resources
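
It helps to write the decision down as a table before the incident, not during it. The sketch below models the intended outcome per resource; it is a design aid with a hypothetical policy table, not a description of what the API Server does automatically. Realizing the tolerant path in a cluster still requires an explicit change to the encryption config.

```python
from enum import Enum


class Mode(Enum):
    FAIL_CLOSED = "fail-closed"
    FAIL_OPEN = "fail-open"


# Hypothetical policy table: resource name -> failure mode.
POLICY = {
    "secrets": Mode.FAIL_CLOSED,
    "configmaps": Mode.FAIL_OPEN,
}


def write_outcome(resource: str, kms_healthy: bool) -> str:
    """Model what should happen to a write under the chosen policy."""
    if kms_healthy:
        return "encrypt"
    # Resources missing from the table default to the safer choice.
    mode = POLICY.get(resource, Mode.FAIL_CLOSED)
    return "reject" if mode is Mode.FAIL_CLOSED else "store-plaintext"


for resource in ("secrets", "configmaps", "leases"):
    print(resource, write_outcome(resource, kms_healthy=False))
```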

3) Key rotation: “planned and measured”

Rotation goals:

  • New writes are encrypted with the new key
  • Older data is safely re-encrypted over time

Operational suggestions:

  1. Before rotation: check the trend of API latency and error budget
  2. Canary: change the key order on a single cluster or segment first
  3. Spread: gradual restart, controlled rollout
  4. Post-rotation: handle re-encryption “in installments”
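
The "in installments" step can be sketched as a batch planner. The namespace names and batch size below are made up; the printed command follows the usual read-and-replace pattern for forcing a re-write, scoped to one namespace at a time so that API and KMS load stays bounded.

```python
import shlex
from typing import Iterator, List, Sequence


def batches(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `size` entries each."""
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def rewrite_command(namespace: str) -> str:
    """Build the shell command that re-writes every Secret in one namespace."""
    ns = shlex.quote(namespace)
    return f"kubectl get secrets --namespace {ns} -o json | kubectl replace -f -"


# Hypothetical namespaces; in practice this list would come from the cluster.
namespaces = ["payments", "identity", "batch", "monitoring", "sandbox"]
for number, batch in enumerate(batches(namespaces, size=2), start=1):
    print(f"# installment {number}")
    for ns in batch:
        print(rewrite_command(ns))
```

The script only prints the plan. Running each installment by hand, and checking latency between installments, keeps the rotation "planned and measured".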

Minimum viable runbook: when KMS misbehaves

Symptoms:

  • API Server 5xx / write timeouts
  • secrets create/update problems

Triage questions:

  1. Is the KMS endpoint reachable? (network, DNS, mTLS)
  2. Has KMS latency spiked? (throttling, quota)
  3. Is there a KMS plugin error in the API Server logs?
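
The first triage question can be partly automated. The sketch below separates a DNS failure from a TCP connect failure for a given host and port; it does not cover mTLS, throttling, or the plugin's own socket, and the demo uses a local listener as a stand-in for a real KMS endpoint.

```python
import socket


def check_endpoint(host: str, port: int, timeout_s: float = 2.0) -> str:
    """Return 'dns-failure', 'connect-failure', or 'reachable' for host:port."""
    try:
        socket.getaddrinfo(host, port)
    except socket.gaierror:
        return "dns-failure"
    try:
        # Opens and immediately closes a TCP connection.
        with socket.create_connection((host, port), timeout=timeout_s):
            return "reachable"
    except OSError:
        # Covers refused connections and timeouts alike.
        return "connect-failure"


# Demo against a throwaway local listener instead of a real KMS endpoint.
listener = socket.socket()
listener.bind(("127.0.0.1", 0))
listener.listen(1)
port = listener.getsockname()[1]
status = check_endpoint("127.0.0.1", port)
print(status)
listener.close()
```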

Initial response:

  • Bring the KMS endpoint back to health (the cleanest answer)
  • If the incident is escalating and the risk is acceptable: follow the pre-written break-glass plan to temporarily simplify the encryption config

Observation: don’t go blind just because there’s encryption

Signals to watch for this design to work properly:

  • API server latency (especially the write path)
  • KMS request rate, latency, and error ratio
  • A recurring “is the data actually encrypted?” check along the ETCD backup/snapshot pipeline
  • Key rotation dates and access audit records
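
For the KMS signals in particular, the alerting logic can be as small as the sketch below. The thresholds and sample numbers are assumptions for illustration; real values would come from the monitoring stack and from the platform's own error budget.

```python
import math
from typing import List, Sequence


def error_ratio(total: int, failed: int) -> float:
    """Share of failed requests; 0.0 when there was no traffic."""
    return failed / total if total else 0.0


def percentile(samples: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile of a non-empty sample list."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def kms_alerts(
    latencies_ms: Sequence[float],
    total: int,
    failed: int,
    max_p99_ms: float = 250.0,       # assumed threshold
    max_error_ratio: float = 0.01,   # assumed threshold
) -> List[str]:
    """Return the names of the alerts that should fire for this window."""
    alerts = []
    if latencies_ms and percentile(latencies_ms, 99) > max_p99_ms:
        alerts.append("kms-latency")
    if error_ratio(total, failed) > max_error_ratio:
        alerts.append("kms-errors")
    return alerts


# Invented sample window: mostly fast calls with a slow tail, few errors.
window = [10.0] * 98 + [300.0, 400.0]
alerts = kms_alerts(window, total=1000, failed=5)
print(alerts)
```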

Conclusion

Encryption at rest in Kubernetes is not a “compliance checkbox”; it’s a serious architectural decision that reduces control-plane risk. But it’s not enough by itself: without KMS availability, a rotation rhythm, failover, and a break-glass runbook, that security gain turns into operational fragility. Done properly, the design makes security part of the platform’s behavior, with no surprises during an incident.

Mustafa Erbay

Systems Architecture · Network Specialist · Infrastructure, Security, and Software

Since 2006 I have worked on systems architecture, networking, server infrastructure, large-scale deployments, software, and system security. On this blog I share technical experience that has proven itself in the field.
