Mustafa Erbay
Career · 7 min read

Mapping Risk with Pre-mortems Before a Change

Living through the failure in your head before going to production: pre-mortem cadence, a template, decision points, and operational leadership in practice.

No matter how good a change plan looks on paper, things rarely go “as planned” in production. The real problem isn’t the failures that show up after a change; it’s how fast and how correctly the team responds when one does. That’s why on critical changes I value the “pre-mortem” discipline just as much as the “postmortem.”

The aim of a pre-mortem is simple: assume “the change failed,” live through the failure and its impact in your head ahead of time, and make the risks visible.

When should you run a pre-mortem?

Holding a meeting for every deploy isn’t sustainable. Use pre-mortems for changes like these:

  • Wide blast radius: shared layers such as platform, network, identity, logging, DNS
  • Hard to roll back: data schema, stateful service, security policy
  • First-time work: a new technology/vendor/operating model
  • Access changes: anything that touches production access or permissions
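
The criteria above can be sketched as a small check. This is a minimal illustration under assumptions, not the author's tooling: the `Change` fields and the `premortem_reasons` / `needs_premortem` functions are hypothetical names invented for this example.

```python
from dataclasses import dataclass


@dataclass
class Change:
    """Hypothetical description of a planned change (field names are assumptions)."""
    name: str
    touches_shared_layer: bool = False   # platform, network, identity, logging, DNS
    hard_to_roll_back: bool = False      # data schema, stateful service, security policy
    first_time_work: bool = False        # new technology, vendor, or operating model
    touches_prod_access: bool = False    # production access or permission changes


def premortem_reasons(change: Change) -> list[str]:
    """Return the criteria from the list above that this change matches."""
    checks = [
        (change.touches_shared_layer, "touches a shared layer"),
        (change.hard_to_roll_back, "hard to roll back"),
        (change.first_time_work, "first-time work"),
        (change.touches_prod_access, "production access or permissions"),
    ]
    return [reason for matched, reason in checks if matched]


def needs_premortem(change: Change) -> bool:
    """A change qualifies as soon as any single criterion matches."""
    return bool(premortem_reasons(change))
```

Returning the matched reasons, rather than only a yes/no, gives the meeting invite a ready-made explanation of why this change was picked.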

A practical 30-minute flow

I try to fit a pre-mortem into 30 minutes:

  1. Goal (3 min): What business outcome is this change trying to deliver?
  2. “It failed” scenario (10 min): Pick the 3 worst-case ways this ends badly.
  3. Early signals (7 min): Which metric/log/alert catches it first?
  4. Rollback (7 min): What are the options? 1) automatic, 2) manual, 3) “stop and isolate”
  5. Decision points (3 min): At what threshold is rollback mandatory?
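
As a quick sanity check on the time boxes above, the agenda can be written down as data and turned into minute offsets. This is only a sketch; the `AGENDA` constant and the `schedule` helper are hypothetical names, and the durations are the ones listed in the flow.

```python
# Agenda items and time boxes taken from the flow above (minutes).
AGENDA = [
    ("Goal", 3),
    ('"It failed" scenario', 10),
    ("Early signals", 7),
    ("Rollback", 7),
    ("Decision points", 3),
]


def schedule(agenda, total_minutes=30):
    """Return (start, end, title) offsets in minutes; raise if the boxes overrun."""
    slots, cursor = [], 0
    for title, minutes in agenda:
        slots.append((cursor, cursor + minutes, title))
        cursor += minutes
    if cursor > total_minutes:
        raise ValueError(f"agenda needs {cursor} min, only {total_minutes} available")
    return slots


for start, end, title in schedule(AGENDA):
    print(f"{start:02d}-{end:02d} min  {title}")
```

If a team adds a sixth item without shrinking another, `schedule` raises instead of silently letting the meeting run long.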

Template: the same questions for every change

I keep these questions fixed:

  • Blast radius: What’s the worst-case impact area?
  • Dependencies: Which service/layer gets quietly affected?
  • Observability: Which signal shows success and which shows breakage?
  • Authority: Who can roll back? Is there a break-glass?
  • Data: Is there a data consistency risk on rollback?
  • Timing: Are clock skew / TTL / cache effects in play during the change?
  • Communication: Is it clear who gets notified on which channel?

These questions exist to speed up decisions, not to “produce documents.”
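
One lightweight way to keep the questions fixed is to store them next to the change record and flag the ones nobody answered. The sketch below assumes answers are collected as a plain dictionary keyed by topic; `TEMPLATE` and `unanswered` are hypothetical names for this example.

```python
# The seven fixed questions from the template above, keyed by topic.
TEMPLATE = {
    "Blast radius": "What's the worst-case impact area?",
    "Dependencies": "Which service/layer gets quietly affected?",
    "Observability": "Which signal shows success and which shows breakage?",
    "Authority": "Who can roll back? Is there a break-glass?",
    "Data": "Is there a data consistency risk on rollback?",
    "Timing": "Are clock skew / TTL / cache effects in play during the change?",
    "Communication": "Is it clear who gets notified on which channel?",
}


def unanswered(answers: dict[str, str]) -> list[str]:
    """Return template topics that have no answer or only whitespace."""
    return [topic for topic in TEMPLATE if not answers.get(topic, "").strip()]
```

An empty result means every question has at least a one-line answer, which is all the template asks for.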

How should you use the pre-mortem output?

The best outputs:

  • A “risk and rollback” section added to the change RFC
  • Decision points added to the runbook (threshold + action)
  • Observability/alarm gaps closed before the deploy

The worst output: holding the meeting and never writing anything down.
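
A runbook decision point (threshold + action) could be recorded in a form like the following. This is a sketch under assumptions: the `DecisionPoint` type, the `required_actions` helper, the metric names, and the threshold values are all hypothetical and only illustrate the shape of the entry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecisionPoint:
    """One runbook entry: a signal, the threshold that trips it, and the action."""
    signal: str        # metric name (hypothetical)
    threshold: float   # value at or above which the action is mandatory
    action: str        # e.g. "rollback" or "stop and isolate"


def required_actions(points, observed):
    """Return the actions whose thresholds are met by the observed metric values."""
    actions = []
    for point in points:
        value = observed.get(point.signal)
        if value is not None and value >= point.threshold:
            actions.append(point.action)
    return actions


# Example values are illustrative only.
RUNBOOK = [
    DecisionPoint("http_5xx_rate", 0.05, "rollback"),
    DecisionPoint("replication_lag_seconds", 30.0, "stop and isolate"),
]
```

For example, `required_actions(RUNBOOK, {"http_5xx_rate": 0.08})` returns `["rollback"]`. Writing the threshold down before the deploy means nobody has to negotiate it during the incident.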

Leadership angle: a pre-mortem is a trust-building exercise

A pre-mortem isn’t an “I don’t trust the team” message; on the contrary, it’s how you produce safe speed without piling pressure on the team. A good leader doesn’t hide risks; they make them visible, because real speed in production is the ability to catch failure early and roll back correctly.

Closing

There’s no perfect plan in production, only a well-prepared rollback. A pre-mortem is a small investment before the change and a big time saver after it. Done with discipline, the team isn’t “brave”; it’s in control.

Frequently Asked Questions

Common questions readers have about this article.

How do I decide which changes require a pre-mortem analysis?

I use a simple set of criteria to determine which changes require a pre-mortem analysis. If a change has a wide blast surface, is hard to roll back, involves first-time work, or touches production access or permission changes, I consider it a good candidate for a pre-mortem. This helps me focus on the most critical changes that could have a significant impact on our systems and users.

What is the ideal duration for a pre-mortem meeting, and how can I structure it?

In my experience, a 30-minute pre-mortem meeting is sufficient to cover the essential topics. I allocate 3 minutes to discuss the goal of the change, 10 minutes to brainstorm the worst-case scenarios, 7 minutes to identify early signals of failure, 7 minutes to discuss rollback strategies, and 3 minutes to determine decision points for mandatory rollback. This structure helps me stay focused and ensure that all critical aspects are covered.

What are some common pitfalls to avoid when conducting a pre-mortem analysis?

One common pitfall I've noticed is that teams tend to dodge edge cases or uncomfortable possibilities. To avoid this, I make sure to write down 'things that won't happen' too, as this helps to put all possibilities on the table. Another pitfall is not having a clear template or set of questions to guide the discussion. I use a fixed set of questions, such as 'What's the worst-case impact area?' and 'What are the dependencies?', to ensure that we cover all critical aspects.

How many attempts or iterations are typically needed to get a pre-mortem process up and running effectively?

In my experience, it takes around 2-3 attempts to get a pre-mortem process up and running effectively. The first attempt often helps to identify the key areas of focus, while the second attempt refines the process and ensures that all critical aspects are covered. By the third attempt, the team is usually able to conduct a pre-mortem analysis efficiently and effectively, and we start to see the benefits of this discipline in our change management process.

Mustafa Erbay

Systems Architecture · Network Specialist · Infrastructure, Security, and Software

Since 2006 I have been working across systems architecture, networking, server infrastructure, large-scale deployments, software, and system security. On this blog I share technical experience that holds up in the field.
