Mapping Risk with Pre-mortems Before a Change

No matter how good a change plan looks on paper, things rarely go “as planned” in production. The real problem isn’t that failures show up after a change; it’s how quickly and how correctly the team responds when they do. That’s why, for critical changes, I value the “pre-mortem” discipline just as much as the “postmortem.”

The aim of a pre-mortem is simple: assume “the change failed,” live through the failure and its impact in your head ahead of time, and make the risks visible.

When should you run a pre-mortem?

Holding a meeting for every deploy isn’t sustainable. Use pre-mortems for changes like these:

Wide blast surface: shared layers like platform, network, identity, logging, DNS
Hard to roll back: data schema, stateful service, security policy
First-time work: a new technology/vendor/operating model
Access-sensitive: changes to production access or permissions
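
The four criteria above can be turned into a small gate that flags which changes deserve a pre-mortem. This is a minimal sketch in Python, not an existing tool: the `Change` fields and the `premortem_reasons` and `needs_premortem` functions are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Change:
    """Hypothetical description of a planned change (field names are illustrative)."""
    title: str
    touches_shared_layer: bool = False   # platform, network, identity, logging, DNS
    hard_to_roll_back: bool = False      # data schema, stateful service, security policy
    first_time_work: bool = False        # new technology, vendor, or operating model
    changes_prod_access: bool = False    # production access or permission changes


def premortem_reasons(change: Change) -> List[str]:
    """Return the criteria this change meets, in the order the article lists them."""
    checks = [
        (change.touches_shared_layer, "wide blast surface"),
        (change.hard_to_roll_back, "hard to roll back"),
        (change.first_time_work, "first-time work"),
        (change.changes_prod_access, "production access or permissions"),
    ]
    return [label for flag, label in checks if flag]


def needs_premortem(change: Change) -> bool:
    """A change qualifies as soon as it meets at least one criterion."""
    return bool(premortem_reasons(change))
```

Returning the list of reasons rather than a bare boolean keeps the decision explainable: the reasons can go straight into the change RFC.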

A practical 30-minute flow

I try to fit a pre-mortem into 30 minutes:

Goal (3 min): What business outcome is this change trying to deliver?
“It failed” scenario (10 min): Pick the 3 worst-case ways this ends badly.
Early signals (7 min): Which metric/log/alert catches it first?
Rollback (7 min): Which path applies: 1) automatic, 2) manual, or 3) “stop and isolate”?
Decision points (3 min): At what threshold is rollback mandatory?
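
To keep the meeting honest about time, the agenda above can be written down as data and turned into a timeline. The sketch below is only an illustration; `AGENDA`, `schedule`, and `total_minutes` are hypothetical names, and the durations are the ones listed above.

```python
from typing import List, Tuple

# (step, minutes) pairs taken from the 30-minute flow above.
AGENDA: List[Tuple[str, int]] = [
    ("Goal", 3),
    ('"It failed" scenario', 10),
    ("Early signals", 7),
    ("Rollback", 7),
    ("Decision points", 3),
]


def total_minutes(agenda: List[Tuple[str, int]]) -> int:
    """Sum the time boxes so the facilitator can check the meeting length."""
    return sum(minutes for _, minutes in agenda)


def schedule(agenda: List[Tuple[str, int]], start_minute: int = 0) -> List[Tuple[str, int, int]]:
    """Return (step, start, end) offsets in minutes from the start of the meeting."""
    slots = []
    cursor = start_minute
    for step, minutes in agenda:
        slots.append((step, cursor, cursor + minutes))
        cursor += minutes
    return slots


if __name__ == "__main__":
    for step, start, end in schedule(AGENDA):
        print(f"{start:02d}-{end:02d} min  {step}")
```

With these time boxes, the rollback discussion starts at minute 20 and the decision points get the final three minutes.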

Template: the same questions for every change

I keep these questions fixed:

Blast radius: What’s the worst-case impact area?
Dependencies: Which service/layer gets quietly affected?
Observability: Which signal shows success and which shows breakage?
Authority: Who can roll back? Is there a break-glass?
Data: Is there a data consistency risk on rollback?
Timing: Are clock skew / TTL / cache effects in play during the change?
Communication: Is it clear who gets notified on which channel?
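
Because the questions stay fixed, they can live as a reusable checklist that reports what is still unanswered before the change goes ahead. This is a minimal sketch under that assumption; `QUESTIONS` mirrors the list above, while the `unanswered` helper and the sample answers in the usage note are hypothetical.

```python
from typing import Dict, List

# Topic -> question, copied from the fixed template above.
QUESTIONS: Dict[str, str] = {
    "Blast radius": "What's the worst-case impact area?",
    "Dependencies": "Which service/layer gets quietly affected?",
    "Observability": "Which signal shows success and which shows breakage?",
    "Authority": "Who can roll back? Is there a break-glass?",
    "Data": "Is there a data consistency risk on rollback?",
    "Timing": "Are clock skew / TTL / cache effects in play during the change?",
    "Communication": "Is it clear who gets notified on which channel?",
}


def unanswered(answers: Dict[str, str]) -> List[str]:
    """Return template topics with a missing or blank answer, in template order."""
    return [topic for topic in QUESTIONS if not answers.get(topic, "").strip()]
```

For example, `unanswered({"Blast radius": "All services behind the shared gateway", "Data": "  "})` returns every topic except "Blast radius", since a whitespace-only answer counts as blank.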

These questions exist to speed up decisions, not to “produce documents.”

How should you use the pre-mortem output?

The best outputs:

A “risk and rollback” section added to the change RFC
Decision points added to the runbook (threshold + action)
Closing observability/alarm gaps (before deploy)
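
A decision point in the runbook is just a threshold paired with an action, so it can be captured in a form that leaves no room for debate during the change. The sketch below assumes that a higher metric value is worse; the `DecisionPoint` class, the `triggered_actions` function, and the metric names in the usage note are hypothetical, not taken from any monitoring product.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class DecisionPoint:
    """One runbook entry: when `metric` reaches `threshold`, take `action`."""
    metric: str
    threshold: float
    action: str


def triggered_actions(points: List[DecisionPoint], observed: Dict[str, float]) -> List[str]:
    """Return the actions whose metric is at or above its threshold.

    Metrics missing from `observed` are skipped here; a real runbook may
    prefer to treat a missing signal as a failure in its own right.
    """
    actions = []
    for point in points:
        value = observed.get(point.metric)
        if value is not None and value >= point.threshold:
            actions.append(point.action)
    return actions
```

As a usage example, with `DecisionPoint("error_rate", 0.05, "roll back")` and `DecisionPoint("p99_latency_ms", 800, "stop and isolate")`, an observation of `{"error_rate": 0.07, "p99_latency_ms": 420}` triggers only "roll back".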

The worst output: holding the meeting and never writing anything down.

Leadership angle: a pre-mortem is a trust-building exercise

A pre-mortem isn’t an “I don’t trust the team” message; on the contrary, it’s how you produce safe speed without piling pressure on the team. A good leader doesn’t hide risks; they make them visible, because real speed in production is the ability to catch failure early and roll back correctly.

Closing

There’s no perfect plan in production; only a well-prepared rollback. A pre-mortem is a small investment before the change and a big time saver after it. Done with discipline, the team isn’t “brave” — it’s in control.

Frequently Asked Questions

How do I decide which changes require a pre-mortem analysis?

I use a simple set of criteria to determine which changes require a pre-mortem analysis. If a change has a wide blast surface, is hard to roll back, involves first-time work, or touches production access or permission changes, I consider it a good candidate for a pre-mortem. This helps me focus on the most critical changes that could have a significant impact on our systems and users.

What is the ideal duration for a pre-mortem meeting, and how can I structure it?

In my experience, a 30-minute pre-mortem meeting is sufficient to cover the essential topics. I allocate 3 minutes to discuss the goal of the change, 10 minutes to brainstorm the worst-case scenarios, 7 minutes to identify early signals of failure, 7 minutes to discuss rollback strategies, and 3 minutes to determine decision points for mandatory rollback. This structure helps me stay focused and ensure that all critical aspects are covered.

What are some common pitfalls to avoid when conducting a pre-mortem analysis?

One common pitfall I've noticed is that teams tend to dodge edge cases or uncomfortable possibilities. To avoid this, I make sure to write down 'things that won't happen' too, as this helps to put all possibilities on the table. Another pitfall is not having a clear template or set of questions to guide the discussion. I use a fixed set of questions, such as 'What's the worst-case impact area?' and 'What are the dependencies?', to ensure that we cover all critical aspects.

How many attempts or iterations are typically needed to get a pre-mortem process up and running effectively?

In my experience, it takes around 2-3 attempts to get a pre-mortem process up and running effectively. The first attempt often helps to identify the key areas of focus, while the second attempt refines the process and ensures that all critical aspects are covered. By the third attempt, the team is usually able to conduct a pre-mortem analysis efficiently and effectively, and we start to see the benefits of this discipline in our change management process.
