Cgroup v2 Memory Pressure Runbook with systemd-oomd

In production, “OOM happened, the process died” describes the outcome; the real problem is usually the memory pressure that built up before the OOM. With the wrong reflex (e.g. “enable swap, run drop_caches”) you may get through the day, but the same incident comes back in a different form.

This article presents a field-focused runbook that catches memory pressure early and evicts in a controlled way using the Cgroup v2 + PSI (Pressure Stall Information) + systemd-oomd trio on Linux.

What are we trying to solve?

The goal is to reduce these three problems at the same time:

Unpredictability of the kernel OOM killer: it acts only once memory is exhausted, and it can pick the most critical process at the worst possible moment.
Cascading collapse: memory pressure → latency increase → retry storm → even more memory usage.
Operational blindness: answering “why did it happen?” with intuition rather than evidence.

Prerequisites (checklist)

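The system must be running the unified cgroup v2 hierarchy.
systemd must include systemd-oomd (v247 or later; most modern distributions ship it, sometimes as a separate package).
PSI metrics must be readable (kernel 4.20 or later built with CONFIG_PSI; some distributions also require psi=1 on the kernel command line).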

Quick verification:

# is this cgroup v2?
stat -fc %T /sys/fs/cgroup

# do the PSI files exist?
ls /proc/pressure/

# is oomd running?
systemctl status systemd-oomd
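The three checks can be folded into a single pass/fail report, which is easier to run across a fleet. A minimal sketch, assuming bash; the function name and output format are illustrative, and the extra cgroup.controllers test (is the memory controller available?) goes slightly beyond the checklist above.

```bash
#!/usr/bin/env bash
# Sketch: combine the prerequisite checks into one report.
# Usage: check_prereqs [cgroup_mount] [psi_memory_file]
# Returns 0 only if every check passes.
check_prereqs() {
  local cg_root="${1:-/sys/fs/cgroup}"
  local psi_file="${2:-/proc/pressure/memory}"
  local failed=0

  # 1) The unified hierarchy is mounted as cgroup2fs.
  if [[ "$(stat -fc %T "$cg_root" 2>/dev/null)" == "cgroup2fs" ]]; then
    echo "OK   cgroup v2 at $cg_root"
  else
    echo "FAIL $cg_root is not cgroup2fs"; failed=1
  fi

  # 2) The memory controller is listed at the root of the hierarchy.
  if grep -qw memory "$cg_root/cgroup.controllers" 2>/dev/null; then
    echo "OK   memory controller available"
  else
    echo "FAIL memory controller missing"; failed=1
  fi

  # 3) PSI is readable (CONFIG_PSI built in and not disabled at boot).
  if [[ -r "$psi_file" ]]; then
    echo "OK   PSI readable at $psi_file"
  else
    echo "FAIL PSI not readable at $psi_file"; failed=1
  fi

  # 4) systemd-oomd is running (also FAILs if systemctl is absent).
  if systemctl is-active --quiet systemd-oomd 2>/dev/null; then
    echo "OK   systemd-oomd active"
  else
    echo "FAIL systemd-oomd not active"; failed=1
  fi

  return "$failed"
}
```

On a host, `check_prereqs || echo "prerequisites missing"` is enough; the arguments exist only so the checks can be pointed at other paths.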

Expected:

cgroup2fs
/proc/pressure/memory exists
systemd-oomd is active

What does PSI (Pressure) tell you?

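PSI measures the share of wall-clock time during which tasks were stalled waiting for memory (for example in direct reclaim or while re-reading recently evicted pages). It typically rises well before the OOM killer fires, which makes it an early-warning signal.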

Example reading:

cat /proc/pressure/memory

Field interpretation:

If some is rising: at least one task is stalled on memory → latency starts to rise.
If full is rising: all non-idle tasks are stalled at the same time → throughput collapses and the incident is now visible.
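Each line of the PSI file has the form `some avg10=0.00 avg60=0.00 avg300=0.00 total=0`: the avg fields are the percentage of time stalled over the last 10, 60 and 300 seconds, and total is the cumulative stall time in microseconds. The hypothetical helper below extracts a single value, which is handy in scripts and alerts.

```bash
#!/usr/bin/env bash
# Sketch: read one value from a PSI file.
# Usage: psi_field <some|full> <avg10|avg60|avg300|total> [file]
# Exits non-zero if the requested line/field is not present.
psi_field() {
  local kind="$1" field="$2" file="${3:-/proc/pressure/memory}"
  awk -v kind="$kind" -v field="$field" '
    $1 == kind {
      # Fields 2..NF are key=value pairs.
      for (i = 2; i <= NF; i++) {
        split($i, kv, "=")
        if (kv[1] == field) { print kv[2]; found = 1 }
      }
    }
    END { exit found ? 0 : 1 }
  ' "$file"
}
```

For example, `psi_field full avg10` prints the share of the last 10 seconds during which all non-idle tasks were stalled on memory.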

Design: first ask “which service goes?”

“Which process should be killed?” should not be decided at OOM time; it is an architectural decision made in advance.

A practical classification:

Tier-0 (critical): control plane, identity, data layer (must survive if at all possible)
Tier-1: API/application workers (may be killed, but must come back automatically)
Tier-2: batch, reports, cache warmer (the first to go)
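One way to make the tiers concrete is a per-service drop-in for each tier. A sketch under assumptions: apps.slice and batch.slice follow the naming used later in this article; ManagedOOMPreference=, MemoryLow=, OOMScoreAdjust=, Slice= and Restart= are standard systemd directives, but the values shown (512M, -500) are hypothetical placeholders, and whether oomd honors ManagedOOMPreference= depends on the conditions described in systemd.resource-control(5).

```bash
#!/usr/bin/env bash
# Sketch: print a [Service] drop-in that implements one tier.
# Slice names follow this article; 512M and -500 are hypothetical values.
# Usage: tier_dropin <0|1|2>
tier_dropin() {
  case "$1" in
    0) # Tier-0: ask oomd to avoid it, protect some memory from reclaim,
       # and make the kernel OOM killer less likely to choose it.
       printf '%s\n' '[Service]' 'ManagedOOMPreference=avoid' \
         'MemoryLow=512M' 'OOMScoreAdjust=-500' ;;
    1) # Tier-1: may be killed, but restarts on its own.
       printf '%s\n' '[Service]' 'Slice=apps.slice' 'Restart=on-failure' ;;
    2) # Tier-2: runs in the slice that gets the most aggressive policy.
       printf '%s\n' '[Service]' 'Slice=batch.slice' ;;
    *) echo "unknown tier: $1" >&2
       return 1 ;;
  esac
}
```

The output can be pasted into `sudo systemctl edit <service>`. The main protection for tier-0 remains keeping it out of slices that carry a kill policy; the directives above are a second line of defence.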

Implementation: controlled eviction with systemd-oomd

Because systemd places every service in a slice, you can give oomd a policy per slice, along the lines of “if memory pressure in this group stays above the limit, kill a cgroup inside it”.

Example approach (managed per slice rather than per service):

Group application workloads under a separate slice (e.g. apps.slice)
Place batch jobs into a separate, lower-priority slice (e.g. batch.slice)
Enable the OOM policy on the slice

Example override for a slice:

sudo systemctl edit apps.slice   # opens an editor for a drop-in; add the lines below

[Slice]
ManagedOOMMemoryPressure=kill
ManagedOOMMemoryPressureLimit=60%

Similarly, you can give batch.slice a more aggressive (lower) limit so that batch jobs are killed first.
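A sketch of a stricter batch policy written as a drop-in file rather than through the interactive editor, assuming the standard /etc/systemd/system unit directory. The 50-oomd.conf file name and the 40% limit in the usage line are illustrative choices, not recommendations.

```bash
#!/usr/bin/env bash
# Sketch: write an oomd pressure policy drop-in for a slice.
# Usage: write_slice_policy <slice> <limit%> [unit_dir]
# Prints the path of the file it wrote.
write_slice_policy() {
  local slice="$1" limit="$2" unit_dir="${3:-/etc/systemd/system}"
  # Accept whole percentages from 0% to 100% (10# avoids octal parsing).
  if ! [[ "$limit" =~ ^[0-9]{1,3}%$ ]] || (( 10#${limit%\%} > 100 )); then
    echo "invalid limit: $limit" >&2
    return 1
  fi
  local dir="$unit_dir/$slice.d"
  mkdir -p "$dir"
  printf '%s\n' '[Slice]' 'ManagedOOMMemoryPressure=kill' \
    "ManagedOOMMemoryPressureLimit=$limit" > "$dir/50-oomd.conf"
  echo "$dir/50-oomd.conf"
}
```

From a root shell: `write_slice_policy batch.slice 40%`, then `systemctl daemon-reload`. `oomctl dump` should then list the slice among the cgroups oomd monitors, and ad hoc batch jobs can be started inside it with `systemd-run --slice=batch.slice <command>`.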

Runbook: step by step during an incident

1) Triage (5 minutes)

uptime
free -m
vmstat 1 5
cat /proc/pressure/memory
journalctl -u systemd-oomd --since "-30m" --no-pager
dmesg -T | tail -n 80

How to read it:

PSI rising + heavy reclaim → memory pressure is building; act before the OOM killer does
OOM kill logs present → it is already too late; move on to containment and root-cause analysis
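The reading rule can be scripted so that triage starts from a number rather than a feeling. A minimal sketch: the thresholds (10% for some, 5% for full, both on avg10) are hypothetical starting points to tune against your own baseline, and the check does not replace looking at dmesg for OOM kills.

```bash
#!/usr/bin/env bash
# Sketch: classify the current state from PSI avg10 values.
# Usage: psi_state [file] [some_warn_pct] [full_crit_pct]
# Exit codes: 0 = ok, 1 = pressure, 2 = stall (usable as an alarm).
psi_state() {
  local file="${1:-/proc/pressure/memory}" some_warn="${2:-10}" full_crit="${3:-5}"
  awk -v sw="$some_warn" -v fc="$full_crit" '
    # Remember avg10 for the "some" and "full" lines.
    { for (i = 2; i <= NF; i++) { split($i, kv, "="); if (kv[1] == "avg10") v[$1] = kv[2] + 0 } }
    END {
      if (v["full"] >= fc) { printf "stall some=%.2f full=%.2f\n", v["some"], v["full"]; exit 2 }
      if (v["some"] >= sw) { printf "pressure some=%.2f full=%.2f\n", v["some"], v["full"]; exit 1 }
      printf "ok some=%.2f full=%.2f\n", v["some"], v["full"]; exit 0
    }' "$file"
}
```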

2) Containment (clear space in a controlled way)

Safe first moves:

Stop/scale down the batch jobs that consume the most memory
Cut “nice-to-have” processes like cache warmup/reports
Reduce app workers in a controlled way (watch traffic and the retry effect)

Quick visibility:

ps -eo pid,ppid,rss,cmd --sort=-rss | head -n 20   # rss is in KiB; cmd last so it gets the remaining width
systemd-cgtop -m
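`systemd-cgtop -m` shows memory per cgroup; to see each cgroup's own memory pressure next to its usage, the cgroup v2 files can be read directly. A sketch assuming the standard mount at /sys/fs/cgroup, where each cgroup directory has memory.current (bytes) and, with PSI enabled, memory.pressure in the same format as /proc/pressure/memory. It only looks two levels deep (slice and service); the function name is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: largest cgroups by memory.current, with their own "some avg10".
# Usage: top_cgroups [cgroup_root] [count]
top_cgroups() {
  local root="${1:-/sys/fs/cgroup}" count="${2:-10}"
  local f dir bytes some
  find "$root" -mindepth 2 -maxdepth 3 -name memory.current 2>/dev/null |
    while read -r f; do
      dir="${f%/memory.current}"
      # Cgroups can vanish between find and read; skip those.
      bytes=$(cat "$f" 2>/dev/null) || continue
      # Per-cgroup PSI; "-" when memory.pressure is missing.
      some=$(awk '$1 == "some" { split($2, kv, "="); print kv[2] }' \
        "$dir/memory.pressure" 2>/dev/null) || some=""
      printf '%8d MiB  some_avg10=%-6s %s\n' \
        $(( bytes / 1048576 )) "${some:--}" "${dir#"$root"}"
    done | sort -rn | head -n "$count"
}
```

A cgroup that is both large and showing high some_avg10 is usually the first candidate for the containment steps above.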

3) Verification (10 minutes)

Is PSI dropping?

watch -n 2 'cat /proc/pressure/memory; echo; free -m'
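avg10 is a smoothed average, so it lags a little after containment. To measure stall time over exactly the window since you acted, you can diff the cumulative total counter (microseconds). A sketch; the function names are hypothetical, and the measured window is slightly longer than the sleep because of process startup.

```bash
#!/usr/bin/env bash
# Sketch: stall percentage over a chosen window from the "total" counter.

# psi_total <some|full> [file]: cumulative stall time in microseconds.
psi_total() {
  awk -v k="$1" '$1 == k { sub(/.*total=/, ""); print }' "${2:-/proc/pressure/memory}"
}

# stall_pct <total_before_us> <total_after_us> <interval_s>
stall_pct() {
  awk -v a="$1" -v b="$2" -v s="$3" \
    'BEGIN { if (s <= 0) exit 1; printf "%.2f\n", (b - a) / (s * 1000000) * 100 }'
}

# sample_stall [some|full] [interval_s] [file]: read twice, print stall %.
sample_stall() {
  local kind="${1:-some}" interval="${2:-10}" file="${3:-/proc/pressure/memory}"
  local t1 t2
  t1=$(psi_total "$kind" "$file")
  sleep "$interval"
  t2=$(psi_total "$kind" "$file")
  stall_pct "$t1" "$t2" "$interval"
}
```

For example, `sample_stall some 30` prints the share of the next 30 seconds during which at least one task was stalled on memory; if repeated samples keep falling, containment is working.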

If PSI is not dropping and memory keeps rising, suspect:

Possible memory leak
Retry storm (missing queue/backpressure)
Kernel slab / page cache pressure
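/proc/meminfo gives a rough first split between these hypotheses: steadily growing anonymous memory points at a leak or at in-flight work piling up, growing SUnreclaim points at kernel slab, and a large Cached value under pressure suggests page cache churn. This mapping is a heuristic (telling a leak from a retry storm still needs request-rate and retry metrics), and the helper below is a sketch that reads the standard field names, whose values are in kB.

```bash
#!/usr/bin/env bash
# Sketch: print the meminfo buckets relevant to the hypotheses, in MiB.
# Usage: mem_breakdown [meminfo_file]
mem_breakdown() {
  local file="${1:-/proc/meminfo}"
  awk '
    # Keys keep their trailing colon, e.g. "AnonPages:".
    { kb[$1] = $2 }
    END {
      printf "anon        %8d MiB\n", kb["AnonPages:"] / 1024
      printf "page_cache  %8d MiB\n", kb["Cached:"] / 1024
      printf "slab_recl   %8d MiB\n", kb["SReclaimable:"] / 1024
      printf "slab_unrecl %8d MiB\n", kb["SUnreclaim:"] / 1024
      printf "available   %8d MiB\n", kb["MemAvailable:"] / 1024
    }' "$file"
}
```

Running it every few minutes during the incident (or from the watch loop) shows which bucket is actually growing.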

4) Recovery standard

After things stabilize:

Roll back the temporary scale-downs
Add the systemd-oomd kill logs to the incident evidence set
Build a metric/trace/log correlation for “why did it happen?”
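Collecting the evidence right away avoids losing it to log rotation or a reboot. A sketch; the directory layout and file names are illustrative, each source is captured even if it fails (the error text lands in the file), and dmesg may need root where kernel.dmesg_restrict=1.

```bash
#!/usr/bin/env bash
# Sketch: snapshot incident evidence into one directory.
# Usage: collect_evidence [since] [out_dir]   (since defaults to -2h)
# Prints the output directory.
collect_evidence() {
  local since="${1:--2h}" out="${2:-./incident-$(date +%Y%m%dT%H%M%S)}"
  mkdir -p "$out"
  cat /proc/pressure/memory > "$out/psi-memory.txt" 2>&1 || true
  cat /proc/meminfo         > "$out/meminfo.txt"    2>&1 || true
  # systemd-oomd decisions, including which cgroup it killed and why.
  journalctl -u systemd-oomd --since "$since" --no-pager > "$out/oomd.log" 2>&1 || true
  # Kernel OOM killer messages, if any.
  dmesg -T > "$out/dmesg.txt" 2>&1 || true
  echo "$out"
}
```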

Testing (before going to production)

A simple pressure test on lab/stage:

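sudo apt-get install -y stress-ng   # Debian/Ubuntu; use your distribution's package manager elsewhere
# Run the load inside the target slice; started from a login shell it would land in user.slice instead
sudo systemd-run --slice=batch.slice --unit=oomd-test stress-ng --vm 2 --vm-bytes 80% --timeout 60s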

Expected:

PSI rises
systemd-oomd applies a controlled kill inside the target slice (visible in journalctl -u systemd-oomd)
Critical services (tier-0) are protected
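To check the first expectation with data rather than by watching a terminal, memory PSI can be recorded as CSV for the duration of the test. A sketch; the function name and CSV columns are illustrative.

```bash
#!/usr/bin/env bash
# Sketch: record memory PSI avg10 values as CSV.
# Usage: psi_sample <count> <interval_s> [file] > psi.csv
psi_sample() {
  local count="$1" interval="$2" file="${3:-/proc/pressure/memory}" i
  echo "epoch,some_avg10,full_avg10"
  for (( i = 0; i < count; i++ )); do
    # Keep avg10 (field 2) of the "some" and "full" lines.
    awk -v t="$(date +%s)" '
      { split($2, kv, "="); v[$1] = kv[2] }
      END { printf "%s,%s,%s\n", t, v["some"], v["full"] }' "$file"
    if (( i + 1 < count )); then
      sleep "$interval"
    fi
  done
}
```

Start `psi_sample 90 1 > psi.csv &` just before the stress run, then line up the rows with the oomd entries in `journalctl -u systemd-oomd` to see when pressure crossed the limit and when the kill happened.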

Postmortem: a permanent improvement list

Limits: per-service memory limit/requests, cache size
Observation: PSI alarms, reclaim/pgfault indicators, oomd decision logs
Resilience: queue/backpressure, retry budget, circuit breaker
Operations: a written standard for the decision “which service goes first?”

Conclusion

systemd-oomd reduces the unpredictability of OOM kills and turns memory pressure into controlled eviction. The value comes less from installing the tool and more from service priorities, cgroup limits and PSI-based early warning working together.
