Mustafa Erbay
Tutorials · 11 min read

Centralized Logging with systemd-journal-remote: mTLS and Retention

A practical setup and runbook for shipping journald logs over mTLS to a central collector — without adding agents — while running a disciplined disk budget…

In enterprise infrastructure, logging always gets squeezed between two extremes:

  • “Let’s drop an agent on every host and ship everything out” → cost and complexity grow
  • “Just keep it local” → during an incident the evidence vanishes and correlation becomes painful

The systemd ecosystem is already there on most Linux distros. In this post I’ll walk through the practical path I use to ship journald logs to a central collector with systemd-journal-upload + systemd-journal-remote, secure the transport with mTLS, and operate it under a disciplined retention/disk budget.

1) Architectural call: where should journal-remote live?

The most stable topology I’ve seen in the field:

  • Every host: journald (already there) + systemd-journal-upload
  • Center: 2 log gateways (HA) + disk budget + backpressure

That gateway tier is responsible for:

  • Terminating mTLS
  • Enforcing an allow list (who is even allowed to push logs?)
  • Keeping the “raw evidence” in local storage (for incident use)
  • Optionally forwarding downstream (Loki/ELK/SIEM)

2) Security: mTLS and identity model

The most common mistake is leaving the log endpoint open because “we’re on the internal network.” Log ingest is an attack surface too.

Minimum model I aim for:

  • TLS mandatory at the gateway
  • Client cert for identity (mTLS)
  • Host identity derived from cert CN/SAN (e.g. host=web-12.prod)
  • Rate limit / connection limit (so you survive a log storm without folding)
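
The identity model above can be sketched with plain openssl. This is a throwaway example under stated assumptions, not a production PKI: the CA name demo-log-ca, the file names, and the 30-day validity are all made up for illustration, and web-12.prod is the example host from the list. Since the gateway accepts any client certificate signed by the CA it trusts, a CA used only for log clients also acts as a coarse allow list.

```shell
#!/bin/sh
# Sketch: throwaway CA plus one client certificate whose CN carries the host identity.
# demo-log-ca, the file names and the 30-day validity are illustrative assumptions.
set -e
PKI_DIR="${PKI_DIR:-$(mktemp -d)}"

# 1) Private CA used only for log clients.
openssl req -x509 -newkey rsa:2048 -nodes -days 30 \
  -subj "/CN=demo-log-ca" \
  -keyout "$PKI_DIR/ca.key" -out "$PKI_DIR/ca.pem" 2>/dev/null

# 2) Key and CSR for one host; the CN is the identity the gateway will see.
openssl req -newkey rsa:2048 -nodes \
  -subj "/CN=web-12.prod" \
  -keyout "$PKI_DIR/client.key" -out "$PKI_DIR/client.csr" 2>/dev/null

# 3) Sign the CSR with the CA.
openssl x509 -req -in "$PKI_DIR/client.csr" \
  -CA "$PKI_DIR/ca.pem" -CAkey "$PKI_DIR/ca.key" -CAcreateserial \
  -days 30 -out "$PKI_DIR/client.pem" 2>/dev/null

# 4) Verify the chain and print the identity carried by the certificate.
VERIFY_RESULT="$(openssl verify -CAfile "$PKI_DIR/ca.pem" "$PKI_DIR/client.pem")"
CLIENT_SUBJECT="$(openssl x509 -in "$PKI_DIR/client.pem" -noout -subject -nameopt RFC2253)"
echo "$VERIFY_RESULT"
echo "$CLIENT_SUBJECT"
```

In a real deployment the CA key would live somewhere much safer than a temp directory, and issuance would be automated so that rotation does not depend on a person remembering it.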

3) Setup (high-level steps)

Commands vary by distribution. The point here is the runbook flow, not the exact syntax.

A) Gateway: systemd-journal-remote

  • HTTPS listener
  • Storage directory
  • Certificate/key
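
A minimal sketch of those three gateway pieces, rendered into a staging directory so it can be reviewed before it touches /etc. The option names come from journal-remote.conf(5). The certificate paths are the documented defaults, but treat them as assumptions for your environment, and the staging layout is purely illustrative.

```shell
#!/bin/sh
# Sketch: render the gateway config into a staging directory for review.
# Certificate paths are assumptions; point them at wherever your PKI drops files.
set -e
STAGE="${STAGE:-$(mktemp -d)}"
mkdir -p "$STAGE/etc/systemd"
GATEWAY_CONF="$STAGE/etc/systemd/journal-remote.conf"

cat > "$GATEWAY_CONF" <<'EOF'
[Remote]
# One journal file per sending host keeps per-host incident queries simple.
SplitMode=host
# Server identity presented to clients (must be readable by the service user).
ServerKeyFile=/etc/ssl/private/journal-remote.pem
ServerCertificateFile=/etc/ssl/certs/journal-remote.pem
# CA that signs client certificates.
TrustedCertificateFile=/etc/ssl/ca/trusted.pem
EOF

echo "staged: $GATEWAY_CONF"

# After review, on the gateway (as root):
#   install -m 0644 "$GATEWAY_CONF" /etc/systemd/journal-remote.conf
#   systemctl enable --now systemd-journal-remote.socket   # HTTPS listener, port 19532 by default
#   # with the stock unit, received journals land under /var/log/journal/remote/
```

The HTTPS listener and the storage directory are set by the stock systemd-journal-remote units rather than by this file, so changing either one is typically done with a unit drop-in.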

Sanity check:

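# Collector service state
systemctl status systemd-journal-remote
# Anything listening on 19532, the default port? (rg is ripgrep; grep -nE works the same here)
ss -lntp | rg -n "19532|journal" || true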

B) Client: systemd-journal-upload

  • Gateway URL
  • Client certificate
  • Retry/backoff
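
The client side can be sketched the same way. The gateway URL logs.example.internal is a hypothetical name and the certificate paths are assumptions. The restart drop-in is one simple way to get retry behaviour out of plain systemd: it retries at a fixed 30-second delay, which is not true exponential backoff.

```shell
#!/bin/sh
# Sketch: render client-side config into a staging directory for review.
# The gateway URL and certificate paths are assumptions.
set -e
STAGE="${STAGE:-$(mktemp -d)}"
DROPIN_DIR="$STAGE/etc/systemd/system/systemd-journal-upload.service.d"
mkdir -p "$STAGE/etc/systemd" "$DROPIN_DIR"
UPLOAD_CONF="$STAGE/etc/systemd/journal-upload.conf"
RESTART_DROPIN="$DROPIN_DIR/10-restart.conf"

cat > "$UPLOAD_CONF" <<'EOF'
[Upload]
URL=https://logs.example.internal:19532
# Despite the "Server" prefix, these two are this client's own key and certificate.
ServerKeyFile=/etc/ssl/private/journal-upload.pem
ServerCertificateFile=/etc/ssl/certs/journal-upload.pem
# CA used to verify the gateway's certificate.
TrustedCertificateFile=/etc/ssl/ca/trusted.pem
EOF

cat > "$RESTART_DROPIN" <<'EOF'
[Unit]
# Never give up restarting because of start rate limiting.
StartLimitIntervalSec=0

[Service]
Restart=always
RestartSec=30
EOF

echo "staged: $UPLOAD_CONF and $RESTART_DROPIN"

# After review, on the client (as root): copy both files into place, then
#   systemctl daemon-reload
#   systemctl enable --now systemd-journal-upload.service
```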

Sanity check:

# Uploader service state
systemctl status systemd-journal-upload
# Last 50 uploader log lines (TLS and connection errors show up here)
journalctl -u systemd-journal-upload -n 50 --no-pager

4) Retention: there is no “infinite disk”, only a policy

Retention is the most consequential decision in a centralized log tier:

  • How many days of raw logs do we keep? (e.g. 7/14/30)
  • What happens when the disk fills up? (drop, rotate, or apply backpressure?)
  • Is there a compliance scope? (do we need a separate WORM / S3 Object Lock tier?)

A pragmatic approach:

  • Short retention at the gateway (just enough for incident evidence)
  • If long retention is required, hand it off to downstream archiving (object storage)
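
To make the policy concrete, here is a small worked calculation that turns a retention decision into a disk budget. The inputs (200 hosts, 50 MB per host per day, 14 days, 20% headroom) are placeholders, not measurements, and retention_budget_mb is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: turn a retention policy into a disk budget.
# All inputs are placeholders; measure your own per-host daily volume first.

# retention_budget_mb HOSTS MB_PER_HOST_PER_DAY DAYS HEADROOM_PCT
retention_budget_mb() {
  echo $(( $1 * $2 * $3 * (100 + $4) / 100 ))
}

# 200 hosts * 50 MB * 14 days * 1.2 headroom = 168000 MB
BUDGET_MB="$(retention_budget_mb 200 50 14 20)"
echo "raw log budget: ${BUDGET_MB} MB"

# Possible ways to enforce it on the gateway (verify against your systemd version):
#   journalctl --directory=/var/log/journal/remote --vacuum-time=14d
#   journalctl --directory=/var/log/journal/remote --vacuum-size=168000M
# Vacuuming only removes archived journal files, not the ones currently being written.
# Recent systemd releases also document MaxUse= and KeepFree= in journal-remote.conf(5).
```

If the resulting number does not fit the disk you actually have, that is the signal to shorten gateway retention or move long-term copies downstream, not to hope the disk holds.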

5) Operations: what signals do I actually watch?

  • Gateway disk and inode usage
  • Upload queue/backpressure (where applicable)
  • TLS handshake error rate (catches certificate rotation problems)
  • Client failed-upload count (the real “evidence loss” risk)

These signals are what “logging is working” actually translates to in practice.
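
A rough sketch of how two of these signals could be collected with nothing but standard tools. The function names are hypothetical, the 80% threshold in the usage line is an arbitrary example, and `df -i` is assumed to be available (it is in GNU coreutils). Wiring the output into your alerting system is left out on purpose.

```shell
#!/bin/sh
# Sketch: gateway disk/inode check and a client-side failed-upload counter.

# Reads `df -P`-style output on stdin; prints the use percentage of the first filesystem row.
pct_from_df() {
  awk 'NR == 2 { gsub(/%/, "", $5); print $5 }'
}

# check_gateway_disk DIR THRESHOLD_PCT
# Prints current usage; returns non-zero if blocks or inodes are at or above the threshold.
check_gateway_disk() {
  dir="$1"; limit="$2"
  blocks="$(df -P "$dir" | pct_from_df)"
  inodes="$(df -Pi "$dir" | pct_from_df)"
  # Some filesystems report "-" for inode usage; treat that as 0.
  case "$inodes" in ''|*[!0-9]*) inodes=0 ;; esac
  echo "dir=$dir blocks=${blocks}% inodes=${inodes}%"
  [ "$blocks" -lt "$limit" ] && [ "$inodes" -lt "$limit" ]
}

# Error-level log lines from the uploader in the last hour (run on a client).
upload_errors_last_hour() {
  journalctl -u systemd-journal-upload -p err --since "1 hour ago" --no-pager -q | wc -l
}

# Usage on the gateway:
#   check_gateway_disk /var/log/journal/remote 80 || echo "ALERT: journal volume above 80%"
```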

6) Incident runbook: when “logs aren’t coming through”

  1. Client side:
    • Is systemd-journal-upload running?
    • Any TLS errors? (cert expiry, chain issues)
    • Do DNS and routing work? (can the host reach the gateway?)
  2. Gateway side:
    • Service up?
    • Disk full?
    • Are we hitting connection limits?
  3. Mitigation:
    • If under disk pressure, temporarily tighten retention
    • If a cert is the problem, fall back to a known-good cert immediately
  4. Permanent fix:
    • Automate certificate rotation
    • Disk budget + alarm
    • Downstream archive (when compliance needs it)
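
The client-side part of this runbook can be sketched as a small triage script. GATEWAY_HOST and CLIENT_CERT are assumptions and should match whatever your journal-upload.conf actually uses. cert_valid_for and triage_client are hypothetical helper names, and the 7-day expiry window is an arbitrary example.

```shell
#!/bin/sh
# Sketch: first-pass client-side triage when a host stops shipping logs.
# GATEWAY_HOST and CLIENT_CERT are assumptions; align them with journal-upload.conf.
GATEWAY_HOST="${GATEWAY_HOST:-logs.example.internal}"
CLIENT_CERT="${CLIENT_CERT:-/etc/ssl/certs/journal-upload.pem}"

# cert_valid_for CERT DAYS
# Succeeds only if CERT is readable and will not expire within DAYS days.
cert_valid_for() {
  [ -r "$1" ] && openssl x509 -in "$1" -noout -checkend "$(( $2 * 86400 ))" >/dev/null 2>&1
}

triage_client() {
  if systemctl is-active --quiet systemd-journal-upload; then
    echo "service: active"
  else
    echo "service: NOT active"
  fi
  if cert_valid_for "$CLIENT_CERT" 7; then
    echo "cert: valid for at least 7 more days"
  else
    echo "cert: missing, unreadable, or expiring within 7 days"
  fi
  if getent hosts "$GATEWAY_HOST" >/dev/null; then
    echo "dns: $GATEWAY_HOST resolves"
  else
    echo "dns: $GATEWAY_HOST does NOT resolve"
  fi
  # Most recent uploader errors; TLS handshake failures show up here.
  journalctl -u systemd-journal-upload -p err --since "1 hour ago" --no-pager -q | tail -n 5
}

# Run on the affected host:
#   triage_client
```

The gateway-side checks (service state, disk, connection limits) follow the same pattern and are worth scripting too, so that the first five minutes of an incident do not depend on memory.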

Wrap-up

Centralized logging via systemd-journal-remote is a low-friction way to harden your evidence chain without spinning up “yet another agent” project. The real value in the field isn’t in standing the service up — it’s in operating the mTLS identity model, the retention/disk budget, and the incident runbook together as one discipline. Logs aren’t just for debugging; they’re proof of operational reality.

Mustafa Erbay

Systems Architecture · Network Specialist · Infrastructure, Security, and Software

Since 2006 I have been working on systems architecture, networking, server infrastructure, large-scale deployments, software, and systems security. On this blog I share technical experience that has held up in the field.
