Centralized Logging with systemd-journal-remote: mTLS and Retention

In enterprise infrastructure, logging always gets squeezed between two extremes:

- “Let’s drop an agent on every host and ship everything out” → cost and complexity grow
- “Just keep it local” → during an incident the evidence vanishes and correlation becomes painful

The systemd ecosystem is already there on most Linux distros. In this post I’ll walk through the practical path I use to ship journald logs centrally with systemd-journal-upload + systemd-journal-remote, secured by mTLS, and run with a disciplined retention/disk budget.

1) Architectural call: where should journal-remote live?

The most stable topology I’ve seen in the field:

- Every host: journald (already there) + systemd-journal-upload
- Center: 2 log gateways (HA) + disk budget + backpressure

That gateway tier is responsible for:

- Terminating mTLS
- Enforcing an allow list (who is even allowed to push logs?)
- Keeping the “raw evidence” in local storage (for incident use)
- Optionally forwarding downstream (Loki/ELK/SIEM)

2) Security: mTLS and identity model

The most common mistake is leaving the log endpoint open because “we’re on the internal network.” Log ingest is an attack surface too.

Minimum model I aim for:

- TLS mandatory at the gateway
- Client cert for identity (mTLS)
- Host identity derived from cert CN/SAN (e.g. host=web-12.prod)
- Rate limit / connection limit (so you survive a log storm without folding)

3) Setup (high-level steps)

Commands vary by distribution. The point here is the runbook flow, not the exact syntax.

A) Gateway: systemd-journal-remote

- HTTPS listener
- Storage directory
- Certificate/key
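
As a rough sketch of those three items: with the stock unit files, the HTTPS listener normally comes from `systemd-journal-remote.socket` on port 19532 and the storage directory is `/var/log/journal/remote/`, so the part you usually have to supply is the TLS material. The helper below writes a drop-in with the `[Remote]` options for that. The function name `write_remote_dropin`, the drop-in file name, and the certificate paths are assumptions; check `man journal-remote.conf` on your distribution before relying on any of it.

```shell
#!/bin/sh
# Sketch: write a drop-in for systemd-journal-remote with TLS material.
# write_remote_dropin is a hypothetical helper; certificate paths are assumptions.
write_remote_dropin() {
    conf_dir="${1:-/etc/systemd/journal-remote.conf.d}"
    mkdir -p "$conf_dir" || return 1
    cat > "$conf_dir/10-tls.conf" <<'EOF'
[Remote]
# One journal file per sending host.
SplitMode=host
# Server key and certificate presented to clients.
ServerKeyFile=/etc/ssl/private/journal-remote.pem
ServerCertificateFile=/etc/ssl/certs/journal-remote.pem
# CA bundle used to verify client certificates (the mTLS part).
TrustedCertificateFile=/etc/ssl/ca/trusted.pem
EOF
}
```

After `write_remote_dropin`, something like `systemctl enable --now systemd-journal-remote.socket` brings the listener up. The key file has to be readable by the user the service runs as, which is an easy thing to miss.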

Sanity check:

systemctl status systemd-journal-remote
ss -lntp | grep -E "19532|journal" || true

B) Client: systemd-journal-upload

- Gateway URL
- Client certificate
- Retry/backoff
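
The client side can be sketched the same way. The first helper writes the `[Upload]` options (gateway URL plus client certificate); the second adds a unit drop-in so the service is restarted after a failed upload. The function names, the drop-in file names, the certificate paths, and the 30-second delay are all assumptions, and a fixed `RestartSec=` is only a crude stand-in for real backoff.

```shell
#!/bin/sh
# Sketch: client-side configuration for systemd-journal-upload.
# Both helpers are hypothetical; paths and the restart delay are assumptions.
write_upload_dropin() {
    gateway_url="$1"
    conf_dir="${2:-/etc/systemd/journal-upload.conf.d}"
    [ -n "$gateway_url" ] || return 1
    mkdir -p "$conf_dir" || return 1
    cat > "$conf_dir/10-gateway.conf" <<EOF
[Upload]
URL=$gateway_url
# Client key and certificate: this is the host's identity at the gateway.
ServerKeyFile=/etc/ssl/private/journal-upload.pem
ServerCertificateFile=/etc/ssl/certs/journal-upload.pem
# CA bundle used to verify the gateway's certificate.
TrustedCertificateFile=/etc/ssl/ca/trusted.pem
EOF
}

write_upload_restart_dropin() {
    unit_dir="${1:-/etc/systemd/system/systemd-journal-upload.service.d}"
    mkdir -p "$unit_dir" || return 1
    cat > "$unit_dir/10-restart.conf" <<'EOF'
[Service]
# Retry after failures with a fixed delay.
Restart=on-failure
RestartSec=30s
EOF
}
```

Usage would be along the lines of `write_upload_dropin https://logs.example.internal:19532` (a made-up host name), then `write_upload_restart_dropin`, `systemctl daemon-reload`, and `systemctl enable --now systemd-journal-upload`.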

Sanity check:

systemctl status systemd-journal-upload
journalctl -u systemd-journal-upload -n 50 --no-pager

4) Retention: there is no “infinite disk”, only a policy

Retention is the most consequential decision in a centralized log tier:

- How many days of raw logs do we keep? (e.g. 7/14/30)
- What happens when the disk fills up? (drop, rotate, or apply backpressure?)
- Is there a compliance scope? (do we need a separate WORM / S3 Object Lock tier?)

A pragmatic approach:

- Short retention at the gateway (just enough for incident evidence)
- If long retention is required, hand it off to downstream archiving (object storage)
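
A back-of-the-envelope way to turn the disk budget into a retention number, plus a sketch of pinning that budget in the gateway config. `retention_days` and `write_retention_dropin` are hypothetical helpers and the figures are made up. The `MaxUse=` and `KeepFree=` options exist in recent systemd releases as far as I know, so verify them in `man journal-remote.conf` for your version.

```shell
#!/bin/sh
# Sketch: derive retention from a disk budget, then cap gateway disk use.
# Helper names and all numbers are illustrative assumptions.

# retention_days BUDGET_GIB DAILY_GIB -> whole days of logs that fit the budget
retention_days() {
    budget_gib="$1"; daily_gib="$2"
    # Refuse a zero or non-numeric ingest rate instead of dividing by it.
    [ "$daily_gib" -gt 0 ] 2>/dev/null || return 1
    echo $(( budget_gib / daily_gib ))
}

# write_retention_dropin MAX_USE KEEP_FREE [CONF_DIR]
write_retention_dropin() {
    max_use="$1"; keep_free="$2"
    conf_dir="${3:-/etc/systemd/journal-remote.conf.d}"
    mkdir -p "$conf_dir" || return 1
    cat > "$conf_dir/20-retention.conf" <<EOF
[Remote]
# Upper bound on disk space used by stored journals.
MaxUse=$max_use
# Space to always leave free on the filesystem.
KeepFree=$keep_free
EOF
}
```

For example, `retention_days 200 25` prints `8`: a 200 GiB budget at 25 GiB/day of ingest holds about a week, so promising 14 days would need either more disk or less ingest. For time-based pruning, `journalctl --directory=/var/log/journal/remote --vacuum-time=7d` can run from a timer; note that vacuuming only removes archived journal files, not the active ones.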

5) Operations: what signals do I actually watch?

- Gateway disk and inode usage
- Upload queue/backpressure (where applicable)
- TLS handshake error rate (catches certificate rotation problems)
- Client failed-upload count (the real “evidence loss” risk)

These signals are what “logging is working” actually translates to in practice.
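
The disk and inode signal is the easiest one to sketch without any monitoring stack. The helpers below parse `df -P`; the function names and the 80% default threshold are assumptions, and in practice this would feed whatever alerting you already run rather than print to stdout.

```shell
#!/bin/sh
# Sketch: disk and inode usage check for the gateway storage directory.
# Function names and the default threshold are illustrative assumptions.

# usage_pct PATH -> percent of space used on the filesystem holding PATH
usage_pct() {
    df -P "$1" | awk 'NR==2 { sub(/%/, "", $5); print $5 }'
}

# inode_pct PATH -> percent of inodes used (some filesystems report "-")
inode_pct() {
    df -Pi "$1" | awk 'NR==2 { sub(/%/, "", $5); print $5 }'
}

# over_limit VALUE LIMIT -> succeeds only when VALUE is a number >= LIMIT
over_limit() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;   # unknown or non-numeric: not over the limit
    esac
    [ "$1" -ge "$2" ]
}

# check_gateway_disk PATH [LIMIT] -> prints a warning line per breached signal
check_gateway_disk() {
    path="$1"; limit="${2:-80}"
    if over_limit "$(usage_pct "$path")" "$limit"; then
        echo "WARN: disk usage on $path is at or above ${limit}%"
    fi
    if over_limit "$(inode_pct "$path")" "$limit"; then
        echo "WARN: inode usage on $path is at or above ${limit}%"
    fi
}
```

On a gateway this would be called as `check_gateway_disk /var/log/journal/remote 80`. It prints nothing when both values are under the limit.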

6) Incident runbook: when “logs aren’t coming through”

Client side:
- Is systemd-journal-upload running?
- Any TLS errors? (cert expiry, chain issues)
- Is DNS/route in place? (gateway reachability)
Gateway side:
- Service up?
- Disk full?
- Are we hitting connection limits?
Mitigation:
- If under disk pressure, temporarily tighten retention
- If a cert is the problem, fall back to a known-good cert immediately
Permanent fix:
- Automate certificate rotation
- Disk budget + alarm
- Downstream archive (when compliance needs it)
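
For the "cert expiry" question in the client checks, `openssl x509 -checkend` gives a quick answer. The sketch below wraps it; `cert_expires_within` is a hypothetical helper, and the certificate path in the usage note is an assumption about where the client cert lives.

```shell
#!/bin/sh
# Sketch: detect a certificate that is expired or about to expire.
# cert_expires_within is a hypothetical helper name.

# cert_expires_within CERT_FILE DAYS
#   returns 0 if the cert expires within DAYS days (or already has),
#   1 if it stays valid beyond that, 2 if the file cannot be read.
cert_expires_within() {
    cert="$1"; days="$2"
    [ -r "$cert" ] || return 2
    # -checkend exits 0 when the cert is still valid N seconds from now.
    if openssl x509 -checkend "$(( days * 86400 ))" -noout -in "$cert" >/dev/null 2>&1; then
        return 1
    fi
    return 0
}
```

Something like `cert_expires_within /etc/ssl/certs/journal-upload.pem 14 && echo "rotate soon"` fits into a daily timer. During an incident, pairing it with `systemctl is-active systemd-journal-upload` covers the first two client-side questions in a few seconds.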

Wrap-up

Centralized logging via systemd-journal-remote is a low-friction way to harden your evidence chain without spinning up “yet another agent” project. The real value in the field isn’t in standing the service up; it’s in operating the mTLS identity model, the retention/disk budget, and the incident runbook together as one discipline. Logs aren’t just for debugging; they’re proof of operational reality.
