Mustafa Erbay
Tutorials · 7 min read

Cgroup v2 Memory Pressure Runbook with systemd-oomd

PSI, systemd-oomd policy, testing, and recovery steps to catch a node OOM crisis early and evict workloads in a controlled way.


In production, “OOM happened, the process died” describes the outcome; the real problem is usually the memory pressure that built up before the OOM. With the wrong reflex (e.g. “enable swap, run drop_caches”) you get through the day, but the same incident comes back in a different form.

This article presents a field-oriented runbook that uses the cgroup v2 + PSI (Pressure Stall Information) + systemd-oomd trio on Linux to catch memory pressure early and evict workloads in a controlled way.

What are we trying to solve?

The goal is to reduce these three problems at the same time:

  1. Randomness of the kernel OOM killer: it acts only once memory is exhausted, and its heuristic can pick the most critical process at the worst possible moment.
  2. Cascading collapse: memory pressure → latency increase → retry storm → even more memory use.
  3. Operational blindness: answering “why did it happen?” with intuition rather than evidence.

Prerequisites (checklist)

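  • The system must be running the unified cgroup v2 hierarchy.
  • systemd must include systemd-oomd (introduced in systemd 247; most modern distributions ship it).
  • PSI metrics must be readable (kernel 4.20 or newer; some distribution kernels need psi=1 on the boot command line).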

Quick verification:

# Is this cgroup v2?
stat -fc %T /sys/fs/cgroup

# Do the PSI files exist?
ls /proc/pressure/

# Is oomd running?
systemctl status systemd-oomd

Expected:

  • cgroup2fs
  • /proc/pressure/memory exists
  • systemd-oomd is active

What does PSI (Pressure) tell you?

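PSI measures how much wall-clock time tasks spend stalled waiting for memory (for example in direct reclaim or while refaulting recently evicted pages). Because these stalls build up while the kernel is still reclaiming, the signal often appears minutes before the OOM killer fires.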

Example reading:

cat /proc/pressure/memory
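Each line of the file starts with some or full, followed by avg10, avg60 and avg300 (percentage of time stalled over the last 10, 60 and 300 seconds) and total (cumulative stall time in microseconds). For scripts and alerts, a minimal POSIX shell sketch that extracts one field could look like this; the function name psi_get is hypothetical:

```shell
# Print one field (avg10, avg60, avg300 or total) of the "some" or "full" line.
# Usage: psi_get some avg10 [/proc/pressure/memory]
# Returns non-zero if the line or field is not found.
psi_get() {
    local kind=$1 field=$2 file=${3:-/proc/pressure/memory}
    awk -v kind="$kind" -v field="$field" '
        $1 == kind {
            for (i = 2; i <= NF; i++) {
                split($i, kv, "=")          # "avg10=1.50" -> kv[1]="avg10", kv[2]="1.50"
                if (kv[1] == field) { print kv[2]; found = 1 }
            }
        }
        END { exit !found }
    ' "$file"
}
```

For example, psi_get full avg10 prints the share of the last 10 seconds in which all non-idle tasks were stalled on memory.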

Field interpretation:

  • If some is rising: at least one task is stalled on memory → latency starts to rise.
  • If full is rising: all non-idle tasks are stalled at the same time → throughput is lost and the incident is now visible.
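The two signals can be turned into a simple state for an alert or a dashboard. The sketch below rests on assumptions: the function name psi_state and the 10% (some) / 5% (full) thresholds are hypothetical starting points to tune against your own baseline, not values defined by systemd or the kernel. It takes the avg10 values read from /proc/pressure/memory as arguments.

```shell
# Classify memory pressure from the avg10 values of the "some" and "full" lines.
# Thresholds are illustrative; override them with PSI_SOME_WARN / PSI_FULL_CRIT.
psi_state() {
    local some=$1 full=$2
    local sw=${PSI_SOME_WARN:-10} fc=${PSI_FULL_CRIT:-5}
    awk -v some="$some" -v full="$full" -v sw="$sw" -v fc="$fc" 'BEGIN {
        if (full + 0 >= fc + 0)      print "critical"  # all tasks stalling: throughput lost
        else if (some + 0 >= sw + 0) print "warning"   # some tasks stalling: latency rising
        else                         print "ok"
    }'
}
```

Checking full before some mirrors the interpretation above: full pressure is the more severe signal, so it wins when both are high.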

Design: first ask “which service goes?”

Which process is killed under OOM should not be left to chance at incident time; the answer is an architectural decision made in advance.

A practical classification:

  • Tier-0 (critical): control plane, identity, data layer (must not be killed if at all possible)
  • Tier-1: API/application workers (may be killed, but must come back automatically)
  • Tier-2: batch, reports, cache warmer (the first to go)
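One way to carry these tiers into systemd is ManagedOOMPreference=, which recent systemd versions (248 and later) accept so that oomd avoids or omits a unit when choosing a victim; systemd.resource-control(5) lists the conditions under which it is honoured. The sketch below prints a drop-in body per tier; the tier names and the mapping (tier-0 → avoid, the others → none) are assumptions:

```shell
# Print a drop-in body that maps a workload tier to ManagedOOMPreference=.
# Tier names and the mapping are illustrative assumptions.
oomd_preference_dropin() {
    local unit=$1 tier=$2 section= pref=
    # Resource-control settings go in the section named after the unit type.
    case $unit in
        *.service) section=Service ;;
        *.slice)   section=Slice ;;
        *.scope)   section=Scope ;;
        *) echo "unsupported unit type: $unit" >&2; return 1 ;;
    esac
    case $tier in
        tier0)       pref=avoid ;;  # oomd picks it only if no other candidate is left
        tier1|tier2) pref=none ;;   # ordinary candidate
        *) echo "unknown tier: $tier" >&2; return 1 ;;
    esac
    printf '[%s]\nManagedOOMPreference=%s\n' "$section" "$pref"
}

# Example (postgresql.service is a hypothetical tier-0 unit):
#   sudo mkdir -p /etc/systemd/system/postgresql.service.d
#   oomd_preference_dropin postgresql.service tier0 \
#       | sudo tee /etc/systemd/system/postgresql.service.d/60-oomd.conf
```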

Implementation: controlled eviction with systemd-oomd

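Because systemd runs services under slices, you can give oomd a per-slice policy along the lines of “if memory pressure in this group stays above the limit, kill”; oomd then kills a cgroup inside that slice instead of leaving the choice to the kernel OOM killer.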

Example approach (policy managed per slice rather than per service):

  1. Group application workloads under a separate slice (e.g. apps.slice)
  2. Place batch jobs into a separate, lower-priority slice (e.g. batch.slice)
  3. Enable the oomd kill policy on each slice
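For a long-running service, the slice is set with Slice= in a drop-in and takes effect after a daemon-reload and a restart of the service; ad-hoc jobs can instead be started in a slice with systemd-run --slice=. A minimal sketch, where the function name, the 10-slice.conf file name and the example report-builder.service are hypothetical:

```shell
# Write a drop-in that moves a service into a slice (e.g. batch.slice).
# ROOT defaults to /etc/systemd/system; prints the path of the file written.
assign_slice() {
    local service=$1 slice=$2 root=${3:-/etc/systemd/system}
    local dir="$root/$service.d"
    mkdir -p "$dir" || return 1
    printf '[Service]\nSlice=%s\n' "$slice" > "$dir/10-slice.conf" || return 1
    echo "$dir/10-slice.conf"
}

# Example:
#   sudo sh -c '. ./assign_slice.sh; assign_slice report-builder.service batch.slice'
#   sudo systemctl daemon-reload && sudo systemctl restart report-builder.service
```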

Example override for a slice (systemctl edit opens an editor; add the [Slice] section there and save):

sudo systemctl edit apps.slice

[Slice]
ManagedOOMMemoryPressure=kill
ManagedOOMMemoryPressureLimit=60%

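A limit of 60% is also oomd's built-in default (DefaultMemoryPressureLimit= in oomd.conf). Similarly, you can set a lower, more aggressive limit on batch.slice so that oomd acts there earlier.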
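systemctl edit is interactive; in automation the same policy can be written as a drop-in file followed by systemctl daemon-reload. This sketch writes 50-oomd.conf (a hypothetical file name, chosen so it does not overwrite an override.conf created by systemctl edit); the 40% used for batch.slice in the example is only an illustration of a lower limit than apps.slice:

```shell
# Non-interactive alternative to "systemctl edit <unit>" for an oomd pressure policy.
# Usage: write_oomd_dropin UNIT LIMIT [ROOT]; prints the path of the file written.
write_oomd_dropin() {
    local unit=$1 limit=$2 root=${3:-/etc/systemd/system} section=
    case $unit in
        *.slice)   section=Slice ;;
        *.service) section=Service ;;
        *.scope)   section=Scope ;;
        *) echo "unsupported unit: $unit" >&2; return 1 ;;
    esac
    mkdir -p "$root/$unit.d" || return 1
    printf '[%s]\nManagedOOMMemoryPressure=kill\nManagedOOMMemoryPressureLimit=%s\n' \
        "$section" "$limit" > "$root/$unit.d/50-oomd.conf" || return 1
    echo "$root/$unit.d/50-oomd.conf"
}

# Example (as root): write_oomd_dropin batch.slice 40% && systemctl daemon-reload
```

Afterwards, systemctl cat batch.slice shows the unit together with its drop-ins, and oomctl dump lists the cgroups oomd is currently monitoring.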

Runbook: step by step during an incident

1) Triage (5 minutes)

uptime
free -m
vmstat 1 5
cat /proc/pressure/memory
journalctl -u systemd-oomd --since "-30m" --no-pager
dmesg -T | tail -n 80
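To keep the triage output for the postmortem, the same commands can be captured into one timestamped file. A sketch; the function name and the default directory /var/tmp/oom-triage are assumptions. A command that fails (for example dmesg without root) is recorded with its exit status instead of aborting the capture:

```shell
# Run the triage commands and save their output to one timestamped file.
# Usage: triage_snapshot [DIR]; prints the path of the file written.
triage_snapshot() {
    local dir=${1:-/var/tmp/oom-triage} out= cmd=
    mkdir -p "$dir" || return 1
    out="$dir/triage-$(date +%Y%m%dT%H%M%S).txt"
    {
        for cmd in "uptime" "free -m" "vmstat 1 5" "cat /proc/pressure/memory" \
                   "journalctl -u systemd-oomd --since -30m --no-pager" \
                   "dmesg -T | tail -n 80"; do
            echo "### $cmd"
            sh -c "$cmd" 2>&1 || echo "(exit status $?)"
            echo
        done
    } > "$out"
    echo "$out"
}
```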

How to read it:

  • Rising PSI together with heavy reclaim → the system is under memory pressure
  • OOM kill entries in the oomd journal or dmesg → it is already too late; move on to containment and root cause
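“Heavy reclaim” can be quantified from /proc/vmstat: pgscan_kswapd counts pages scanned by background reclaim, and pgscan_direct counts pages scanned by tasks reclaiming memory themselves, which is where allocation stalls come from. A sketch that diffs two snapshots; the function name is hypothetical and counter names can differ on older kernels:

```shell
# Print how many pages kswapd and direct reclaim scanned between two /proc/vmstat snapshots.
# Usage: reclaim_delta BEFORE AFTER
reclaim_delta() {
    awk 'FNR == NR { before[$1] = $2; next }            # first file: remember old values
         $1 == "pgscan_kswapd" || $1 == "pgscan_direct" { d[$1] = $2 - before[$1] }
         END { printf "kswapd=%.0f direct=%.0f\n", d["pgscan_kswapd"], d["pgscan_direct"] }' \
        "$1" "$2"
}
```

Usage: cp /proc/vmstat /tmp/vmstat.before; sleep 10; reclaim_delta /tmp/vmstat.before /proc/vmstat. A steadily non-zero direct count is the one to worry about.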

2) Containment (free memory in a controlled way)

Safe first moves:

  • Stop/scale down the batch jobs that consume the most memory
  • Cut “nice-to-have” processes like cache warmup/reports
  • Reduce app workers in a controlled way (watch how traffic and retries react)
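On systemd hosts the batch tier can be throttled or stopped as a unit. The sketch uses systemctl set-property --runtime with MemoryHigh= (the kernel reclaims and throttles the slice above that ceiling instead of OOM-killing it, and --runtime keeps the change out of persistent configuration) or systemctl stop on the slice, which also stops the units inside it. The function name, batch.slice and the 2G default are assumptions; DRY_RUN=1 only prints the commands:

```shell
# Contain a low-priority slice: "throttle" caps it with MemoryHigh=, "stop" stops it.
# Usage: contain_slice throttle|stop SLICE [MEMORY_HIGH]
contain_slice() {
    local mode=$1 slice=$2 high=${3:-2G} run=
    [ "${DRY_RUN:-0}" = 1 ] && run=echo   # dry run: print instead of executing
    case $mode in
        throttle) $run systemctl set-property --runtime "$slice" "MemoryHigh=$high" ;;
        stop)     $run systemctl stop "$slice" ;;
        *) echo "usage: contain_slice throttle|stop SLICE [MEMORY_HIGH]" >&2; return 2 ;;
    esac
}
```

Throttling first keeps the jobs alive but slower; stopping is the fallback when PSI does not respond.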

Quick visibility:

ps -eo pid,ppid,rss,cmd --sort=-rss | head -n 20   # rss is in KiB
systemd-cgtop -m
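systemd-cgtop gives the live view; for a one-shot list that can be pasted into the incident notes, the memory.current file of each child cgroup holds its current usage in bytes. A sketch; the function name and the default root /sys/fs/cgroup/system.slice are assumptions:

```shell
# List the child cgroups of ROOT with the highest memory.current, in MiB.
# Usage: cgroup_mem_top [ROOT] [N]
cgroup_mem_top() {
    local root=${1:-/sys/fs/cgroup/system.slice} n=${2:-10} f=
    for f in "$root"/*/memory.current; do
        [ -r "$f" ] || continue                      # skips the literal glob if nothing matched
        printf '%s %s\n' "$(cat "$f")" "$(basename "$(dirname "$f")")"
    done | sort -rn | head -n "$n" | awk '{ printf "%8.1f MiB  %s\n", $1 / 1048576, $2 }'
}
```

Pointing it at a slice such as /sys/fs/cgroup/apps.slice drills down one level further.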

3) Verification (10 minutes)

Is PSI dropping?

watch -n 2 'cat /proc/pressure/memory; echo; free -m'
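In a script or chat-ops step, the same check can wait until pressure falls below a threshold instead of being eyeballed. A sketch; the function name, the 5% threshold, the 600-second timeout and the 2-second poll interval are hypothetical defaults:

```shell
# Poll PSI until the "some" avg10 value drops below THRESHOLD (percent).
# Usage: wait_psi_below [THRESHOLD] [TIMEOUT_SECONDS] [PSI_FILE]
# Returns 0 when pressure has dropped, 1 on timeout, 2 if the file cannot be read.
wait_psi_below() {
    local threshold=${1:-5} timeout=${2:-600} file=${3:-/proc/pressure/memory}
    local waited=0 v=
    while [ "$waited" -le "$timeout" ]; do
        # avg10 is the second field of the "some" line: some avg10=1.23 avg60=...
        v=$(awk '$1 == "some" { sub(/^avg10=/, "", $2); print $2 }' "$file" 2>/dev/null)
        if [ -z "$v" ]; then
            echo "cannot read PSI from $file" >&2
            return 2
        fi
        if awk -v v="$v" -v t="$threshold" 'BEGIN { exit !(v + 0 < t + 0) }'; then
            echo "some avg10=$v is below $threshold after ${waited}s"
            return 0
        fi
        sleep 2
        waited=$((waited + 2))
    done
    echo "timed out with some avg10=$v" >&2
    return 1
}
```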

If PSI is not dropping and memory use keeps rising, suspect:

  • A memory leak
  • A retry storm (missing queue/backpressure)
  • Kernel slab or page-cache pressure
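/proc/meminfo helps separate these cases: steadily growing AnonPages points at application memory (a leak, or a retry storm holding requests in flight), growing SUnreclaim points at kernel slab, and Cached shows how much sits in the page cache. A sketch that prints only those fields in MiB; the function name is hypothetical:

```shell
# Print the /proc/meminfo fields that distinguish leaks, slab growth and page-cache pressure.
# Usage: meminfo_breakdown [FILE]   (values in the file are in kB)
meminfo_breakdown() {
    awk '$1 ~ /^(MemAvailable|AnonPages|Cached|Slab|SReclaimable|SUnreclaim):$/ {
        sub(":", "", $1)
        printf "%-13s %8d MiB\n", $1, $2 / 1024
    }' "${1:-/proc/meminfo}"
}
```

Running it every few minutes during verification shows which of the three figures is actually growing.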

4) Recovery standard

After things stabilize:

  • Roll back the temporary scale-downs
  • Add the systemd-oomd kill logs to the incident evidence set
  • Correlate metrics, traces and logs to answer “why did it happen?”
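Both systemd-oomd and the kernel log their kills, so the relevant lines can be filtered out of the journal into the evidence set. The sketch assumes oomd kill messages contain “Killed” and kernel ones “Out of memory” or “oom-kill”; the function name and the output path in the example are hypothetical:

```shell
# Keep only kill-related lines from a log stream (stdin) or a file.
# Usage: extract_kills [FILE]
extract_kills() {
    grep -iE 'killed|out of memory|oom-kill' "${1:--}"   # "-" makes grep read stdin
}

# Example:
#   { journalctl -u systemd-oomd --since "-2h" --no-pager
#     journalctl -k --since "-2h" --no-pager; } | extract_kills > evidence/oom-kills.txt
```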

Testing (before going to production)

A simple pressure test in a lab or staging environment:

sudo apt-get install -y stress-ng || true
stress-ng --vm 2 --vm-bytes 80% --timeout 60s

Expected:

  • PSI rises
  • systemd-oomd kills workloads inside the target slice, provided the stress load actually runs in that slice
  • Critical services (tier-0) are protected
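stress-ng started from an interactive shell runs in your login session's cgroup (under user.slice), not in apps.slice or batch.slice, so a policy set on those slices would not see it. Running it as a transient unit in the target slice avoids that. A sketch; the function name, the unit name oomd-stress-test and the default batch.slice are assumptions, and DRY_RUN=1 only prints the command:

```shell
# Run the stress test as a transient unit inside the slice whose oomd policy is under test.
# Usage: stress_in_slice [SLICE]   (run as root)
stress_in_slice() {
    local slice=${1:-batch.slice} run=
    [ "${DRY_RUN:-0}" = 1 ] && run=echo   # dry run: print instead of executing
    # --wait blocks until the unit finishes and reports how it ended.
    $run systemd-run --slice="$slice" --unit=oomd-stress-test --wait \
        stress-ng --vm 2 --vm-bytes 80% --timeout 60s
}
```

While it runs, follow journalctl -u systemd-oomd -f and the PSI watch in other terminals; if oomd kills the unit, systemd-run should report a failed result instead of a clean exit.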

Postmortem: a permanent improvement list

  • Limits: per-service memory limit/requests, cache size
  • Observation: PSI alarms, reclaim/pgfault indicators, oomd decision logs
  • Resilience: queue/backpressure, retry budget, circuit breaker
  • Operations: a written standard for the decision “which service goes first?”

Conclusion

systemd-oomd reduces the randomness of OOM and turns memory pressure into controlled eviction. The value comes less from installing the tool and more from the discipline of combining service priorities, cgroup limits and PSI-based early warning.


Mustafa Erbay

System Architecture · Network Specialist · Infrastructure, Security and Software

Since 2006 I have worked on system architecture, networking, server infrastructure, building large-scale environments, software, and system security. On this blog I share technical experience that holds up in the field.
