Core dumps are among the most useful, and most risky, debug artifacts in production. A core dump captures a process’s memory, after all. That memory can be worth its weight in gold for root-cause work, but it can just as easily contain secrets, tokens, and PII.
The goal of this runbook is to strike the right balance:
- Enough data to debug rare crashes
- Disk and performance under control
- A clear privacy and access model
- A safe analysis flow during an incident
1) The decision: core dumps on or off?
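The first question is operational, not technical. Core dumps pay off most when crashes are rare, hard to reproduce, and not explainable from other telemetry; the more sensitive the data a service handles, the stricter the controls need to be. Ask: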
- Are crashes rare? Are they hard to reproduce?
- Does the service in question carry PII or secrets?
- Can you make the same diagnosis with logs, metrics, or traces?
2) Core control points
The main components that govern core dump behavior on Linux:
- ulimit -c (core dump size limit)
- /proc/sys/kernel/core_pattern (where they get written)
- The behavior of systemd’s systemd-coredump (coredump.conf)
Quick status check
ulimit -c
sysctl kernel.core_pattern
systemctl status systemd-coredump.socket
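The core_pattern value is the least self-explanatory of these three outputs, so a small helper for reading it may be useful. This is a minimal sketch: the function name classify_core_pattern and its output labels are hypothetical, and it relies only on the kernel convention that a pattern starting with "|" is piped to a handler program.

```shell
# Classify a kernel.core_pattern value (hypothetical helper, labels are our own).
classify_core_pattern() (
    pattern=$1
    case $pattern in
        # A leading "|" means the kernel pipes the dump to a program
        '|'*systemd-coredump*) echo "systemd-coredump" ;;
        '|'*)                  echo "piped-handler" ;;
        # An empty pattern is worth flagging separately
        '')                    echo "empty" ;;
        # Anything else is a file name template (core, core.%e.%p, ...)
        *)                     echo "file" ;;
    esac
)

# Usage on a live host:
#   classify_core_pattern "$(sysctl -n kernel.core_pattern)"
```

If the result is "piped-handler" rather than "systemd-coredump", another tool owns core dumps on that host and the coredump.conf settings discussed here would not apply to it.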
3) Configuring systemd-coredump (sample approach)
The path can vary by distribution; on most systems it’s:
- /etc/systemd/coredump.conf
- /etc/systemd/coredump.conf.d/*.conf
A sample policy: limit size and retention so the disk doesn’t fill up:
[Coredump]
# write core files to disk
Storage=external
# size limits (example)
ProcessSizeMax=2G
ExternalSizeMax=2G
# total disk quota (example)
MaxUse=10G
KeepFree=5G
# compress stored cores (this does not delete old ones; age-based cleanup is handled by systemd-tmpfiles)
Compress=yes
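After making changes, no long-running daemon needs a restart: systemd-coredump is launched per crash and reads its configuration on each run. A reload is harmless if you want one: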
sudo systemctl daemon-reload
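One way to apply the sample policy without editing the main file is a drop-in under coredump.conf.d. The sketch below assumes that approach; the function name write_coredump_dropin and the file name 50-limits.conf are hypothetical, and the values are simply the sample ones from above, not recommendations.

```shell
# Write the sample limits as a drop-in file (hypothetical helper and file name).
# $1: target directory, defaults to the standard drop-in location.
write_coredump_dropin() (
    dir=${1:-/etc/systemd/coredump.conf.d}
    mkdir -p "$dir" || exit 1
    # Quoted heredoc delimiter: the content is written literally, no expansion
    cat > "$dir/50-limits.conf" <<'EOF'
[Coredump]
Storage=external
ProcessSizeMax=2G
ExternalSizeMax=2G
MaxUse=10G
KeepFree=5G
Compress=yes
EOF
)

# Usage (as root):
#   write_coredump_dropin
```

To inspect the merged result afterwards, systemd-analyze cat-config systemd/coredump.conf prints the main file together with any drop-ins.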
4) Privacy: a core dump is a “secret source”
Treat your core dumps in the same class as secrets:
- Access: only the incident/debug role (least privilege)
- Retention: short window (e.g. 7-14 days)
- Transfer: encrypted channel + encryption at rest
- Audit: who read it, when it was downloaded, against which ticket
Practical protections:
- Root-only permissions on the core dump directory
- Emit a “coredump created” event to your central log pipeline
- If they leave the host: an encrypted artifact store (and ideally WORM/immutability)
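The retention rule above can be checked mechanically. As a minimal sketch, the hypothetical helper list_expired_cores below lists core files older than a given number of days; the directory and the window are parameters, and nothing is deleted.

```shell
# List core files past the retention window (hypothetical helper, read-only).
# $1: directory holding core files, $2: retention in days.
list_expired_cores() (
    dir=$1
    days=$2
    # -mtime +N counts age in whole 24-hour periods, so +14 matches
    # files that are at least 15 days old
    find "$dir" -type f -mtime +"$days" -print
)

# Usage, assuming systemd-coredump's default storage directory:
#   list_expired_cores /var/lib/systemd/coredump 14
```

Listing first and deleting in a separate, reviewed step keeps the cleanup auditable, which fits the audit requirement above.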
5) Incident runbook: what do we do when a crash happens?
- Confirm the crash:
journalctl -t systemd-coredump --since "2 hours ago"
coredumpctl list --since "2 hours ago"
- Which binary, and which version?
  - package version / image digest
  - deploy time
  - config/flag version (the version, not the values)
- Analysis environment:
  - Don’t analyze a core dump on the production host (risk + performance)
  - Use a separate “debug VM” or another isolated environment
- Access and recordkeeping:
  - Record who pulled the dump and under which ticket
  - Artifact lifecycle: clean up once the analysis is done
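The recordkeeping step can be as small as one log line per handover. The sketch below assumes that approach; the function name audit_core_access and the record format are hypothetical, and the ticket ID is whatever your tracker uses.

```shell
# Emit one audit line for a core file being handed over for analysis
# (hypothetical helper and record format).
# $1: path to the extracted core file, $2: ticket ID.
audit_core_access() (
    core=$1
    ticket=$2
    [ -r "$core" ] || { echo "cannot read $core" >&2; exit 1; }
    # The checksum ties the record to the exact artifact that was pulled
    sum=$(sha256sum "$core" | cut -d' ' -f1)
    printf '%s user=%s ticket=%s sha256=%s file=%s\n' \
        "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$(id -un)" "$ticket" "$sum" "$core"
)

# Usage, after extracting the dump with coredumpctl
# (<PID>, the output path, and the log path are placeholders):
#   coredumpctl dump <PID> --output=/secure/core.bin
#   audit_core_access /secure/core.bin INC-1234 >> /secure/core-access.log
```

The same checksum can be verified on the debug VM before analysis, and again when confirming the artifact was deleted.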
6) Common mistakes
- Unbounded core dumps: the disk fills up and the incident grows
- Unbounded retention of dumps from processes that hold secrets
- Debug access that is “open to everyone”
- No cleanup after the core dump has been pulled
A well-designed core dump policy lets you diagnose crashes quickly while keeping the security and operational cost manageable. The goal in production is not “more data,” but the right data with a controlled process.