In enterprise environments SSO usually starts as “a single login screen”; a few months later the real need surfaces: legacy applications speaking SAML, modern services expecting OIDC, and at the same time different IdPs, MFA policies, role/group mappings, and audit obligations.
In this post I describe an approach that treats SSO not as application-by-application “integration work,” but as a critical platform component: a SAML/OIDC gateway (SSO broker / federation gateway).
The real SSO problem: not protocols, but “carrying the policy”
The hard part of SSO is usually not producing a SAML assertion or an OIDC token. The hard part is:
- Carrying the “who, when, under what conditions” policy in one place
- Keeping role/group/claim mappings sustainable as the application count grows
- Managing decisions like MFA, conditional access, and device compliance centrally rather than per app
- Tracing audits and incidents back to a single “source of truth”
That’s why an SSO broker becomes not a convenience but an operational necessity.
Architecture: what does the federation gateway do?
The gateway has two sides:
- Upstream (identity source): Entra ID, Okta, Keycloak, AD FS, etc. (one or more depending on the org)
- Downstream (applications): SAML SPs, OIDC Relying Parties, services behind a reverse proxy
The gateway takes on these functions:
- Protocol translation: SAML ⇄ OIDC (or both at once)
- Policy enforcement: MFA, IP/device conditions, risk-based rule sets
- Claim normalization: standardizing fields like groups, roles, department, tenant, env
- Session & token lifecycle: lifetime, refresh, revocation, logout
- Audit surface: who logged in, which policy decisions applied, and which app they reached
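To make claim normalization concrete, here is a minimal sketch. The IdP keys (idp_a, idp_b) and the upstream attribute names (memberOf, appRoles, dept, tenant_id) are assumptions for illustration, not names taken from any specific product; only the canonical fields come from the list above.

```python
from typing import Any

# Canonical claims that always carry a list of values.
MULTI_VALUED = {"groups", "roles"}

# Hypothetical per-IdP attribute names mapped onto the canonical claim names.
ALIASES: dict[str, dict[str, str]] = {
    "idp_a": {"memberOf": "groups", "appRoles": "roles", "dept": "department"},
    "idp_b": {"groups": "groups", "role": "roles",
              "department": "department", "tenant_id": "tenant"},
}


def normalize_claims(idp: str, raw: dict[str, Any], env: str) -> dict[str, Any]:
    """Translate one IdP's raw attributes into the gateway's canonical claims."""
    out: dict[str, Any] = {
        "groups": [], "roles": [], "department": None, "tenant": None, "env": env,
    }
    for name, value in raw.items():
        canonical = ALIASES[idp].get(name)
        if canonical is None:
            continue  # unknown upstream attributes are dropped, not forwarded
        if canonical in MULTI_VALUED:
            values = value if isinstance(value, list) else [value]
            # Lowercase, trim, and de-duplicate so apps see stable values.
            out[canonical] = sorted({str(v).strip().lower() for v in values})
        else:
            out[canonical] = str(value).strip()
    return out
```

Downstream applications then depend only on the canonical shape, regardless of which upstream issued the identity.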
Recommended setup: the “two-door” model
A practical model at enterprise scale:
1) Public / External SSO edge
For internet-facing user flows (when there’s no VPN/VDI) and SaaS integrations.
- Behind WAF/rate-limit/CDN
- Bot and brute-force protection
- DDoS resilience
2) Internal / Admin SSO control plane
A separate risk profile for critical management interfaces and admin applications.
- Shorter token lifetime
- Mandatory MFA + device compliance
- Break-glass access on a separate policy
This separation seriously reduces blast radius during incidents.
Claim/role design: manage a “contract,” not a “group name”
This is where things break the most: applications get hard-wired to groups, group names change, departments merge, and one morning prod access drifts.
My preferred approach:
- Bind application authority to an abstract contract such as a role (app:billing:admin, app:erp:read-only)
- Bind groups/teams to those roles via mapping
- Version the roles and record “who owns it”
This way an org change doesn’t break applications; only the mapping is updated.
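One way to picture the contract idea in code, as a sketch under assumptions: the group names (grp-finance-ops, grp-auditors), owners, and version numbers below are invented for illustration; only the two role names come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoleContract:
    name: str     # what the application checks, e.g. "app:billing:admin"
    version: int  # bumped when the meaning of the role changes
    owner: str    # team accountable for the role


# The contract: applications only ever see these role names.
ROLES = {
    "app:billing:admin": RoleContract("app:billing:admin", 2, "finance-platform"),
    "app:erp:read-only": RoleContract("app:erp:read-only", 1, "erp-team"),
}

# The mapping: the only thing that changes when the org chart changes.
GROUP_TO_ROLES = {
    "grp-finance-ops": ["app:billing:admin", "app:erp:read-only"],
    "grp-auditors": ["app:erp:read-only"],
}


def resolve_roles(groups: list[str]) -> list[str]:
    """Return the contract roles granted by a user's group memberships."""
    resolved: set[str] = set()
    for group in groups:
        for role in GROUP_TO_ROLES.get(group, []):
            if role not in ROLES:
                # A mapping that points at an undeclared role is a config bug.
                raise KeyError(f"mapping references unknown role: {role}")
            resolved.add(role)
    return sorted(resolved)
```

Renaming or merging a group then means editing GROUP_TO_ROLES, while the role names the applications check stay the same.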
Token/session lifetime: think security and operations together
Lifetime decisions in SSO are not just security decisions; they are also operational decisions.
A practical example set:
- Normal user: access token 10–15 min, refresh token 8–12 hours (depending on device compliance)
- Admin/privileged: access token 5 min, no refresh or very short, step-up MFA
- Service accounts: not user tokens; use workload identity / mTLS / OIDC federation instead
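The example set above could be expressed as a small policy function. This is a sketch, not a recommendation: the class labels ("user", "admin", "service") are hypothetical, and picking 10 minutes for access tokens and the long refresh window for compliant devices is my assumption within the stated ranges.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional


@dataclass(frozen=True)
class TokenPolicy:
    access_ttl: timedelta
    refresh_ttl: Optional[timedelta]  # None means no refresh token is issued
    step_up_mfa: bool


def token_policy(user_class: str, device_compliant: bool) -> TokenPolicy:
    """Pick token lifetimes from the user class and device compliance."""
    if user_class == "service":
        # Service accounts should not receive user tokens at all.
        raise ValueError("use workload identity / mTLS / OIDC federation instead")
    if user_class == "admin":
        return TokenPolicy(timedelta(minutes=5), None, step_up_mfa=True)
    if user_class == "user":
        # Assumption: compliant devices get the longer end of the 8-12 h range.
        refresh = timedelta(hours=12 if device_compliant else 8)
        return TokenPolicy(timedelta(minutes=10), refresh, step_up_mfa=False)
    raise ValueError(f"unknown user class: {user_class}")
```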
Operational reality: how do you run the SSO broker?
1) Load and capacity
The most common mistake: assuming SSO is “low traffic.” In practice, load arrives in sudden spikes:
- Start of business hours (peak)
- Re-login wave during a global incident
- Application cache invalidation (session storm)
Therefore:
- Stateless design + horizontal scaling
- HA + latency budget for the session store (if you use one)
- JWK / metadata cache (to reduce IdP dependency)
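For the JWK / metadata cache, a minimal sketch of a TTL cache that falls back to the last known value when the upstream is unreachable. The class name JwksCache and the injected fetch callable are assumptions; a real gateway would plug in its own HTTP fetch and should also cap how long stale keys may be served.

```python
import time
from typing import Any, Callable, Optional


class JwksCache:
    """TTL cache that serves the last good value if a refresh fails."""

    def __init__(self, fetch: Callable[[], dict[str, Any]], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch          # hypothetical callable that loads the JWKS
        self._ttl = ttl_seconds
        self._clock = clock
        self._value: Optional[dict[str, Any]] = None
        self._fetched_at = 0.0

    def get(self) -> dict[str, Any]:
        now = self._clock()
        if self._value is not None and now - self._fetched_at < self._ttl:
            return self._value       # still fresh, no upstream call
        try:
            self._value = self._fetch()
            self._fetched_at = now
        except Exception:
            if self._value is None:
                raise                # nothing cached yet, surface the error
            # Upstream unreachable: keep serving the stale copy.
        return self._value
```

The trade-off is deliberate: a short IdP outage no longer breaks token validation, at the cost of picking up key rotations a little later.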
2) Monitoring (minimum)
- auth_success_rate, auth_failure_rate
- MFA step-up rate
- token_issue_latency (p95/p99)
- Upstream IdP reachability
- SAML signature / cert errors (trend)
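As a rough illustration of how the rate and latency metrics could be derived from raw counters and samples, here is a sketch. The function names are hypothetical, and the nearest-rank percentile is one common definition that I am assuming here; a metrics backend may use another.

```python
import math


def success_rate(successes: int, failures: int) -> float:
    """auth_success_rate over a window; auth_failure_rate is 1 minus this."""
    total = successes + failures
    # Assumption: a window with no traffic is reported as fully healthy.
    return successes / total if total else 1.0


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile, e.g. pct=95 for the p95 token_issue_latency."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]
```

For example, percentile(latencies_ms, 99) over the last five minutes gives the p99 value to alert on, where latencies_ms is whatever sample list the gateway records.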
3) Runbook: the “people can’t log in” incident
For fast triage, classify the events:
- Is it an upstream IdP outage?
- Is it time drift / a certificate expiration?
- Is it DNS / network path?
- Is it a block after a policy change?
- Is it an app deployed with wrong metadata or a wrong redirect URI?
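The classification questions above can be turned into a first-pass triage function. This is only a sketch: the Signals fields, the check order, and the two-minute skew threshold are assumptions, and the inputs would come from whatever probes and change logs you already have.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Signals:
    idp_reachable: bool
    dns_resolves: bool
    clock_skew: timedelta            # gateway clock minus reference clock
    cert_not_after: datetime         # expiry of the signing/TLS certificate
    policy_changed_recently: bool
    failures_limited_to_one_app: bool


def triage(s: Signals, now: datetime,
           max_skew: timedelta = timedelta(minutes=2)) -> str:
    """Return the first matching incident class, most basic causes first."""
    if not s.dns_resolves:
        return "dns-or-network"
    if not s.idp_reachable:
        return "upstream-idp-outage"
    if abs(s.clock_skew) > max_skew or s.cert_not_after <= now:
        return "time-drift-or-cert-expiry"
    if s.failures_limited_to_one_app:
        return "app-metadata-or-redirect-uri"
    if s.policy_changed_recently:
        return "policy-change-block"
    return "unclassified"
```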
The goal in the first 10 minutes shouldn’t be “root cause”; it should be safely restoring access:
- Verification with a canary app
- Reverting changes (policy rollback)
- Controlled enablement of the break-glass flow (audit + time-bounded)
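For the break-glass step, “audit + time-bounded” might look like the following sketch. The BreakGlass class, its field names, and the rule that the approver must differ from the user are assumptions for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class BreakGlass:
    max_duration: timedelta
    audit_log: list[dict] = field(default_factory=list)
    _grants: dict[str, datetime] = field(default_factory=dict, repr=False)

    def enable(self, user: str, approver: str, reason: str,
               now: datetime) -> datetime:
        """Grant emergency access that expires on its own; always audited."""
        if user == approver:
            raise ValueError("break-glass needs a second person as approver")
        expires = now + self.max_duration
        self._grants[user] = expires
        self.audit_log.append({"user": user, "approver": approver,
                               "reason": reason, "at": now, "expires": expires})
        return expires

    def is_active(self, user: str, now: datetime) -> bool:
        expires = self._grants.get(user)
        return expires is not None and now < expires
```

Because the grant carries its own expiry, nobody has to remember to switch the emergency path off after the incident.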
Migration strategy: by “wave,” not by application
SSO migration is a change in operating model, not one-by-one integration.
- Wave 1: low-risk internal apps (read-only)
- Wave 2: business-critical but low-privilege flows
- Wave 3: admin and high-risk applications
Every wave needs a “rollback” plan and a smoke test set:
- Login + logout
- Group/role mapping
- MFA step-up
- Audit log verification
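The smoke test set can be run per wave with a small harness like this sketch. The check names mirror the list above; the check callable is a hypothetical hook where the real login, mapping, MFA, and audit probes would live.

```python
from typing import Callable

# One entry per item in the smoke test set.
SMOKE_CHECKS = ("login_logout", "role_mapping", "mfa_step_up", "audit_log")


def run_wave(apps: list[str],
             check: Callable[[str, str], bool]) -> dict[str, list[str]]:
    """Run every smoke check against every app in the wave.

    Returns the failed checks per app; an empty dict means the wave passed.
    """
    failures: dict[str, list[str]] = {}
    for app in apps:
        failed = [name for name in SMOKE_CHECKS if not check(app, name)]
        if failed:
            failures[app] = failed
    return failures
```

A non-empty result is the signal to trigger the wave's rollback plan rather than to debug in place.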
Final word
What makes enterprise SSO successful isn’t “which product”; it’s treating SSO as a platform component and being able to carry identity policy independently of protocols. When you stand up a SAML/OIDC gateway, integration cost drops, audit quality rises, and most importantly, control stays with you during incidents.