İçeriğe Atla
Mustafa Erbay
Technology veritabani-derinlemesine · 13 min read · görüntülenme Türkçe oku
100%

Online Schema Migration: Expand/Contract, Backfill, and Dual Write

An expand/contract approach for schema changes without downtime, plus backfill strategy, dual-write risks, and a rollback plan.

Online Schema Migration: Expand/Contract, Backfill, and Dual Write — cover image

Database schema changes (adding/removing a column, indexes, constraints, type conversions) are often viewed by teams as “run the DDL and move on.” At enterprise scale, however, a schema change turns into an incident the moment it isn’t considered together with deploy, traffic, replication, lock behavior, and rollback.

This post describes a framework that has worked many times in production for teams that say “I don’t want downtime but I want to make a safe change”: expand/contract, backfill, and where needed, dual write.

The goal: not “zero downtime” but “controlled risk”

My realistic objective is this:

  • No user-visible outage (or at most, a very brief and planned one)
  • The change is reversible
  • Signals and guardrails are defined

What is expand/contract?

The expand/contract approach makes a schema change in two (sometimes three) phases instead of a single step.

  1. Expand: add the new field, don’t break the old one
  2. Migrate: move the data, switch the application to the new field
  3. Contract: remove the old field / tighten the constraint

The strength of this approach is here:

  • The application and the DB don’t make a “sharp pivot” at the same time.
  • Rollback becomes possible: you can move the application back to the old field.

1) Expand phase: add the new field, don’t break

Typical steps:

  • Add the new column (one that accepts NULL)
  • Create the new index using a “non-blocking” method (depends on the DB engine)
  • Add the new table/relation (without breaking the old path)

The rule of this phase: the previous application version must keep working.

Risk points:

  • DDLs like “DEFAULT + NOT NULL” can take long locks on some databases
  • Creating an index on a large table can push replication behind

2) Backfill: moving data isn’t an ETL

Backfill is often the real cost. I’ve seen these mistakes a lot in the field:

  • Locking the table with a “single-shot update”
  • Blowing up replication lag during backfill
  • Not monitoring the backfill (how much is left, what rate, what error rate?)

My practical recommendation:

  • Do backfill in batches (e.g. 1k–10k rows)
  • Short pause between batches (let the DB breathe)
  • Persist progress idempotently (checkpoint)
  • Be able to resume on failure (resume)

Sample “batch backfill” logic (pseudocode):

-- son_islenen_id'yi bir yerde tut
UPDATE my_table
SET new_col = transform(old_col)
WHERE id > :last_id
ORDER BY id
LIMIT :batch_size;

3) Dual write: a last resort, but sometimes necessary

Dual write is when the application writes both to the old and new field for a period. It’s powerful but risky:

  • Writing to two places = two different failure modes
  • Partial write (wrote to one, couldn’t write to the other)
  • Inconsistencies via ordering and retry

If you’re going to use dual write:

  • Define the “source of truth” (which one is authoritative?)
  • Produce an inconsistency-detection signal (reconciliation job)
  • Keep the dual-write window short (it shouldn’t last weeks)

Contract phase: produce “evidence” before removing the old

Before moving to the contract step, I look for these pieces of evidence:

  • Usage rate of the new column/table is 99%+ (read path)
  • The old field is no longer read (verified with logs/metrics)
  • The reconciliation job reports 0 mismatches
  • The rollback plan was reviewed one last time

Examples of contract:

  • Drop the old column
  • Add the NOT NULL constraint
  • Tighten the unique constraint

Observability: what to monitor during the migration

My minimum monitoring list:

  • DB: CPU, I/O, lock waits, slow queries
  • Replication lag (if applicable)
  • Application: error rate, latency (especially the write path)
  • Backfill job: batch/s, remaining rows, failures

Alerting principle:

  • Temporarily set alert thresholds for metrics that are “normally quiet” during the migration
  • Define a trigger threshold for rollback

Rollback plan: not on paper, step by step

A rollback plan should include:

  • Application: returning to the previous version (feature flag / config)
  • DB: the new column can stay (rollback in the expand phase usually isn’t “drop the schema”)
  • Data: will the backfill be reverted, or left in place?

The approach I use often:

  • In the expand phase, rollback = revert the application, leave the schema
  • Before the contract phase, rollback = don’t do the contract, keep living with the debt

Conclusion

Online schema migration isn’t the sum of DDL commands; it’s an operational process. Reduce risk with expand/contract, manage backfill with a capacity budget, use dual write briefly and under control, and close the debt with contract. Most importantly: measure every phase, prove it, and make rollback “actually executable.”

Paylaş:

Bu yazı faydalı oldu mu?

Yükleniyor...

Bu yazı nasıldı?

ME

Mustafa Erbay

Sistem Mimarisi · Network Uzmanı · Altyapı, Güvenlik ve Yazılım

2006'dan bu yana sistem mimarisi, network, sunucu altyapıları, büyük yapıların kurulumu, yazılım ve sistem güvenliği ekseninde çalışıyorum. Bu blogda sahada karşılığı olan teknik deneyimlerimi paylaşıyorum.

Kişisel Notlar

Bu notlar sadece sizde saklanır. Tarayıcınızda yerel olarak tutulur.

Hazır 0 karakter

Comments

Server-side AI Moderation

Comments are AI-moderated server-side and stored permanently.

?
0/2000

Server-side AI moderation

✉️ Free · No spam · Unsubscribe anytime

Curated digest, hand-picked by me — not the AI

Once a week: the most important post of the week, behind-the-scenes notes, and a "what I actually used this week" section. Less noise, more signal.

  • 📌
    Best of the week Single most-worth-reading post
  • 🔧
    Toolbox notes Real tools I used this week
  • 🧠
    Behind-the-scenes Notes that don't make it to blog

We don't spam. Unsubscribe anytime. · Tracked only by Umami (self-hosted, no Google).

Your Reading Stats

0

Posts Read

0m

Reading Time

0

Day Streak

-

Favorite Category

Related Posts