What truly keeps a Kubernetes production cluster healthy is not the worker count alone — it is whether the control plane (API Server) still has room to breathe. Most teams only notice an API Server problem through these symptoms:
- kubectl calls hang or crawl
- controllers fall behind on the desired state (rollouts stall)
- the admission/validation chain dominoes and locks deployments
In this article I’ll walk through how I treat the control plane like a shared resource using API Priority and Fairness (APF), with the approach that has actually held up in the field: guarantee headroom for critical calls, queue the noisy ones fairly, and wire the signals so overload never has to become an incident.
What APF actually solves (and what it does not)
APF classifies inbound API Server requests into buckets and then does two things:
- Priority: Some buckets (for example kube-system, node/lease traffic) are processed at a higher priority.
- Fairness: A noisy class that produces a high request volume (think CI’s list/watch flood) cannot drain capacity on its own; it is forced through queueing and share allocation.
Things APF will not fix for you:
- If etcd is slow, APF cannot work miracles (start with etcd health and the IO/memory budget)
- If your admission webhooks are unhealthy, APF only decides “who hurts first” (the webhook problem must still be addressed separately)
1) What counts as “critical”? (Field-tested classification)
Critical buckets typically cover:
- Node/lease / heartbeat traffic (carries the “I’m alive” signal from nodes)
- kube-system controllers (scheduler/controller-manager, DNS, CNI)
- Cluster autoscaler or node lifecycle (scaling and recovery)
- “Constrained but important” security/operations automation (for example an emergency RBAC change)
Noisy buckets:
- Bursts of list/watch calls from CI/CD systems firing in parallel
- Discovery sweeps using overly broad label selectors
- Poorly written “polling” clients
The goal here is not “shut down CI.” The goal: even when CI traffic spikes, the core of the control plane stays alive.
2) Don’t enable APF blind: a minimum signal set
Before turning APF on, make these signals visible:
- API Server apiserver_request_total and latency histograms (p95/p99)
- 429 (Too Many Requests) ratio
- apiserver_flowcontrol_* metrics (queue, rejected, dispatched)
- etcd request latency plus disk fsync/IO indicators (don’t miss the real bottleneck behind the API)
The exact metric names in your Prometheus/Grafana stack may differ — the goal is the same: queues, rejections, and latency.
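As a rough illustration of the 429 ratio signal, the sketch below parses two scrapes of the API Server's Prometheus text output and computes the share of requests answered with 429 between them. It assumes the apiserver_request_total counter carries a code label, and the helper names (requests_by_code, ratio_429) are hypothetical. In practice you would express this as a rate query in your monitoring stack; the code only makes the arithmetic explicit.

```python
import re
from collections import defaultdict

# Matches lines such as: apiserver_request_total{code="429",verb="LIST"} 15
# Simplification: assumes label values contain no "}" characters.
SAMPLE = re.compile(r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)')
LABEL = re.compile(r'(\w+)="([^"]*)"')


def requests_by_code(metrics_text):
    """Sum apiserver_request_total samples per HTTP status code."""
    totals = defaultdict(float)
    for line in metrics_text.splitlines():
        match = SAMPLE.match(line)
        if not match or match.group("name") != "apiserver_request_total":
            continue
        labels = dict(LABEL.findall(match.group("labels")))
        totals[labels.get("code", "unknown")] += float(match.group("value"))
    return dict(totals)


def ratio_429(before, after):
    """Fraction of requests between two scrapes that were answered with 429."""
    deltas = {code: after[code] - before.get(code, 0.0) for code in after}
    total = sum(delta for delta in deltas.values() if delta > 0)
    if total == 0:
        return 0.0
    return max(deltas.get("429", 0.0), 0.0) / total
```

Feeding it two scrapes taken a minute apart gives a per-minute rejection ratio; a sustained non-zero value is the cue to look at which priority level is rejecting.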
3) Starting strategy: “step-by-step guardrails”
The most common APF mistake is starting with a highly granular classification. The approach that has produced the steadiest results for me on the ground:
- Keep the default behavior, only carve out the noise into its own class
- Give that noise class low priority + a capped share
- Then, gradually peel off the critical classes
This pattern lowers both your risk and your surprise count on day one.
4) Sample configuration: FlowSchema + PriorityLevel (minimal yet effective)
APF’s logic runs on two built-in API object types in the flowcontrol.apiserver.k8s.io group (they are not CRDs):
- PriorityLevelConfiguration: queue/share behavior for that class
- FlowSchema: which requests fall into that class (user/group/namespace/verb, etc.)
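To make the division of labor concrete, here is a deliberately simplified sketch of how a request ends up in a priority level: among all FlowSchemas that match, the one with the numerically lowest matchingPrecedence wins. The FlowSchema class and classify function below are hypothetical stand-ins that match only on group and verb; the real matcher also looks at users, service accounts, resources, namespaces, and non-resource URLs.

```python
from dataclasses import dataclass, field


@dataclass
class FlowSchema:
    """Toy model of a FlowSchema: subject groups and verbs only."""
    name: str
    matching_precedence: int
    priority_level: str
    groups: set = field(default_factory=set)
    verbs: set = field(default_factory=lambda: {"*"})

    def matches(self, user_groups, verb):
        group_ok = bool(self.groups & set(user_groups))
        verb_ok = "*" in self.verbs or verb in self.verbs
        return group_ok and verb_ok


def classify(schemas, user_groups, verb):
    """Return the priority level of the matching schema with the lowest precedence."""
    candidates = [s for s in schemas if s.matches(user_groups, verb)]
    if not candidates:
        return None
    # Equal precedence falls back to the name, but it is better not to rely on that.
    winner = min(candidates, key=lambda s: (s.matching_precedence, s.name))
    return winner.priority_level


schemas = [
    FlowSchema("ci-list-watch", 9000, "low-ci", {"ci-readers"}, {"get", "list", "watch"}),
    FlowSchema("catch-all", 10000, "catch-all", {"system:authenticated", "system:unauthenticated"}),
]
ci_groups = ["ci-readers", "system:authenticated"]
print(classify(schemas, ci_groups, "list"))    # read traffic from CI
print(classify(schemas, ci_groups, "create"))  # write traffic from CI
```

The point of the sketch is the ordering rule: a narrow schema only takes effect if its precedence number is lower than every broader schema that would also match.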
Treat the examples below as templates rather than “apply as-is to every cluster.”
A) Push noisy clients into a low-priority bucket (CI and similar)
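# flowcontrol.apiserver.k8s.io/v1 is GA since Kubernetes 1.29 and v1beta3 is
# no longer served from 1.32; use v1 on newer clusters.
apiVersion: flowcontrol.apiserver.k8s.io/v1beta3
kind: PriorityLevelConfiguration
metadata:
  name: low-ci
spec:
  type: Limited
  limited:
    nominalConcurrencyShares: 10   # small slice of the server's concurrency
    lendablePercent: 0
    limitResponse:
      type: Queue                  # excess requests wait instead of failing at once
      queuing:
        queues: 64
        handSize: 8
        queueLengthLimit: 200
---
apiVersion: flowcontrol.apiserver.k8s.io/v1beta3
kind: FlowSchema
metadata:
  name: ci-list-watch
spec:
  priorityLevelConfiguration:
    name: low-ci
  matchingPrecedence: 9000
  distinguisherMethod:
    type: ByUser                   # each CI user becomes its own flow
  rules:
    - subjects:
        - kind: Group
          group:
            name: ci-readers       # example group name; adapt to your identities
      resourceRules:
        - apiGroups: ["*"]
          resources: ["*"]
          verbs: ["get", "list", "watch"]
          # A resource rule must set clusterScope and/or namespaces,
          # otherwise the API server rejects the object.
          clusterScope: true
          namespaces: ["*"]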
The practical message of this pattern: CI’s high-volume list/watch traffic must not lock up the API Server.
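The queues and handSize values in low-ci control how APF keeps one heavy flow from starving the others: each flow is dealt a small "hand" of candidate queues and its request joins the shortest one. The sketch below illustrates that idea (shuffle sharding) only. The function names are hypothetical, and the hashing and dealing here are not the API server's actual algorithm.

```python
import hashlib
import random


def deal_hand(flow_id, queues=64, hand_size=8):
    """Deterministically pick hand_size distinct queues for a flow."""
    digest = hashlib.sha256(flow_id.encode("utf-8")).digest()
    seed = int.from_bytes(digest[:8], "big")
    # Same flow id -> same seed -> same hand on every request.
    return random.Random(seed).sample(range(queues), hand_size)


def pick_queue(flow_id, queue_lengths, hand_size=8):
    """Enqueue into the shortest queue among those in the flow's hand."""
    hand = deal_hand(flow_id, len(queue_lengths), hand_size)
    return min(hand, key=lambda q: queue_lengths[q])
```

With 64 queues and a hand of 8, two different flows rarely share their whole hand, so a flood from one CI user tends to fill only its own queues while other users still find a short one.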
B) Guaranteed share for kube-system (critical)
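apiVersion: flowcontrol.apiserver.k8s.io/v1beta3
kind: PriorityLevelConfiguration
metadata:
  name: system-critical
spec:
  type: Limited
  limited:
    nominalConcurrencyShares: 80
    lendablePercent: 50            # up to half the seats may be lent while idle
    limitResponse:
      # Reject returns 429 as soon as all seats are busy;
      # switch to Queue if a short wait is preferable.
      type: Reject
---
apiVersion: flowcontrol.apiserver.k8s.io/v1beta3
kind: FlowSchema
metadata:
  name: kube-system-critical
spec:
  priorityLevelConfiguration:
    name: system-critical
  # Must sort before the built-in kube-system-service-accounts schema
  # (precedence 900), or this schema never gets a match.
  matchingPrecedence: 850
  distinguisherMethod:
    type: ByNamespace
  rules:
    - subjects:
        # Match the callers that live in kube-system, not just requests that
        # target the kube-system namespace.
        - kind: ServiceAccount
          serviceAccount:
            namespace: kube-system
            name: "*"
      resourceRules:
        - apiGroups: ["*"]
          resources: ["*"]
          verbs: ["*"]
          clusterScope: true
          namespaces: ["*"]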
The intent here: critical flows inside kube-system must always find a slot, even when “everything is on fire.”
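To see what shares of 80 and 10 mean in actual seats, the sketch below applies the formula from the Kubernetes documentation: each level's nominal limit is the server's total concurrency multiplied by its share of the sum of all shares, rounded up. The numbers fed in are assumptions to verify on your own cluster: a server limit of 600 (the default --max-requests-inflight of 400 plus --max-mutating-requests-inflight of 200) and built-in priority levels whose shares add up to 245.

```python
import math


def nominal_seats(server_limit, shares, total_shares):
    """Nominal concurrency limit: ceil(server_limit * shares / total_shares)."""
    # Integer arithmetic avoids floating-point surprises on exact divisions.
    return -(-server_limit * shares // total_shares)


def lendable_seats(nominal, lendable_percent):
    """Seats a level may lend out, rounded half up."""
    return math.floor(nominal * lendable_percent / 100 + 0.5)


SERVER_LIMIT = 600           # assumed: 400 read-only + 200 mutating in-flight requests
BUILT_IN_SHARES = 245        # assumed sum of the default priority levels' shares
total = BUILT_IN_SHARES + 10 + 80   # plus low-ci and system-critical

low_ci = nominal_seats(SERVER_LIMIT, 10, total)
critical = nominal_seats(SERVER_LIMIT, 80, total)
print("low-ci seats:", low_ci)
print("system-critical seats:", critical)
print("system-critical lendable:", lendable_seats(critical, 50))
```

Because shares are relative, adding a new priority level shrinks every existing level's seat count a little. Recompute whenever the set of levels changes.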
5) Rollout plan: “see first, then squeeze”
When enabling APF in production, here is the order I recommend:
- Make metrics visible (APF dashboard + 429/latency alerts)
- Isolate the noise (low priority + queue)
- Carve out critical classes (system-critical)
- Then layer finer classes on top (for example break-glass admin, platform automation)
If you flip the order and try to classify everything on day one, post-incident triage of “which change caused which effect?” becomes painful.
6) Incident runbook: when API Server overload shows up
With APF in place, incident triage becomes far more deterministic:
- API Server latency climbing + 429s — which flow/priority level is hit?
- Use apiserver_flowcontrol_dispatched_requests_total to find “who is eating the slots?”
- Apply temporary throttling to the noisy flow:
  - lower the queue length limit
  - shrink the nominal share
  - if needed, a short-lived Reject (last resort)
- Root cause: admission/webhook, etcd, network, client behavior?
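The "who is eating the slots?" step boils down to comparing two readings of the dispatched-requests counter and ranking the differences. Here is a minimal sketch, assuming you have already collected the counter values into dictionaries keyed by (priority_level, flow_schema); the top_consumers helper and the sample numbers are hypothetical.

```python
def top_consumers(before, after, limit=5):
    """Rank (priority_level, flow_schema) pairs by requests dispatched between two readings."""
    deltas = {key: after[key] - before.get(key, 0.0) for key in after}
    ranked = sorted(deltas.items(), key=lambda item: item[1], reverse=True)
    return ranked[:limit]


# Made-up sample readings taken a few minutes apart.
before = {
    ("low-ci", "ci-list-watch"): 1000.0,
    ("system-critical", "kube-system-critical"): 400.0,
    ("global-default", "global-default"): 250.0,
}
after = {
    ("low-ci", "ci-list-watch"): 9000.0,
    ("system-critical", "kube-system-critical"): 700.0,
    ("global-default", "global-default"): 300.0,
}
for (level, schema), dispatched in top_consumers(before, after, limit=2):
    print(level, schema, dispatched)
```

The same ranking is usually a single rate query in a dashboard; having it written down in the runbook matters more than the tool used to compute it.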
Wrapping up
APF gives Kubernetes the ability to “live with the noise”: it shields the critical traffic while fairly queueing the rest. What separates teams that succeed in the field is not YAML — it is the discipline of identity-based classification, measurement, gradual rollout, and a usable incident runbook. Teams that protect their control plane this way ship fewer “surprise deploy lockups” and run far more predictable operations.