Protecting the Kubernetes Control Plane with API Priority and Fairness

What truly keeps a Kubernetes production cluster healthy is not the worker count alone — it is whether the control plane (API Server) still has room to breathe. Most teams only notice an API Server problem through these symptoms:

kubectl calls hang or crawl
controllers fall behind on the desired state (rollouts stall)
the admission/validation chain dominoes and locks deployments

In this article I’ll walk through how I treat the control plane like a shared resource using API Priority and Fairness (APF), with the approach that has actually held up in the field: guarantee headroom for critical calls, queue the noisy ones fairly, and wire the signals so overload never has to become an incident.

What APF actually solves (and what it does not)

APF classifies inbound API Server requests into buckets and then does two things:

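Priority: Requests are assigned to priority levels, and each level gets its own share of the server's concurrency. Critical buckets (for example kube-system, node/lease traffic) keep their own share instead of competing with everything else.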
Fairness: A noisy class that produces a high request volume (think CI’s list/watch flood) cannot drain capacity on its own; it is forced through queueing and share allocation.

Things APF will not fix for you:

If etcd is slow, APF cannot work miracles (start with etcd health and the IO/memory budget)
If your admission webhooks are unhealthy, APF only decides “who hurts first” (the webhook problem must still be addressed separately)

1) What counts as “critical”? (Field-tested classification)

Critical buckets typically cover:

Node lease/heartbeat traffic (carries the “I’m alive” signal from nodes)
kube-system controllers (scheduler/controller-manager, DNS, CNI)
Cluster autoscaler or node lifecycle (scaling and recovery)
“Constrained but important” security/operations automation (for example an emergency RBAC change)

Noisy buckets:

Bursts of list/watch calls from CI/CD systems firing in parallel
Discovery sweeps using overly broad label selectors
Poorly written “polling” clients

The goal here is not “shut down CI.” The goal: even when CI traffic spikes, the core of the control plane stays alive.

2) Don’t enable APF blind: a minimum signal set

Before turning APF on, make these signals visible:

API Server apiserver_request_total and latency histograms (p95/p99)
429 (Too Many Requests) ratio
apiserver_flowcontrol_* metrics (queue, rejected, dispatched)
etcd request latency plus disk/IO indicators such as WAL fsync duration (don’t miss the real bottleneck behind the API)

The exact metric names in your Prometheus/Grafana stack may differ — the goal is the same: queues, rejections, and latency.
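
As one way to wire the signals above, here is a minimal sketch of a Prometheus rule file. It assumes the API Server metrics are scraped under their upstream names; the group name, alert names, thresholds (5%, 1 request/s, 1 second) and `for` durations are illustrative assumptions, not recommendations.

```yaml
groups:
  - name: apf-signals                  # hypothetical group name
    rules:
      # Share of all API requests answered with 429 over the last 5 minutes.
      - alert: ApiServer429RatioHigh   # hypothetical alert name
        expr: |
          sum(rate(apiserver_request_total{code="429"}[5m]))
            /
          sum(rate(apiserver_request_total[5m])) > 0.05
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "More than 5% of API requests are rejected with 429"

      # APF rejections, broken down by the class that is rejecting.
      - alert: ApfRejectingRequests    # hypothetical alert name
        expr: |
          sum by (priority_level, flow_schema) (
            rate(apiserver_flowcontrol_rejected_requests_total[5m])
          ) > 1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "APF is rejecting requests for {{ $labels.flow_schema }}"

      # p99 time requests spend waiting in APF queues, per priority level.
      - alert: ApfQueueWaitHigh        # hypothetical alert name
        expr: |
          histogram_quantile(0.99,
            sum by (le, priority_level) (
              rate(apiserver_flowcontrol_request_wait_duration_seconds_bucket[5m])
            )
          ) > 1
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Requests in {{ $labels.priority_level }} wait over 1s in queue"
```

Tune the thresholds against a week or two of normal traffic before paging anyone on them.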

3) Starting strategy: “step-by-step guardrails”

The most common APF mistake is starting with a highly granular classification. The approach that has produced the steadiest results for me on the ground:

Keep the default behavior and carve only the noise out into its own class
Give that noise class low priority + a capped share
Then, gradually peel off the critical classes

This pattern lowers both your risk and your surprise count on day one.

4) Sample configuration: FlowSchema + PriorityLevel (minimal yet effective)

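APF’s logic runs on two built-in API objects (they ship with the API Server in the flowcontrol.apiserver.k8s.io group; they are not CRDs):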

PriorityLevelConfiguration: queue/share behavior for that class
FlowSchema: which requests fall into that class (user/group/namespace/verb, etc.)

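Treat the examples below as templates rather than “apply as-is to every cluster.” They use flowcontrol.apiserver.k8s.io/v1beta3; on Kubernetes 1.29 and later the same fields are served under v1, and v1beta3 is no longer served from 1.32 on.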

A) Push noisy clients into a low-priority bucket (CI and similar)

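# Low-priority level for noisy clients: a small share, queued instead of rejected outright.
apiVersion: flowcontrol.apiserver.k8s.io/v1beta3
kind: PriorityLevelConfiguration
metadata:
  name: low-ci
spec:
  type: Limited
  limited:
    nominalConcurrencyShares: 10   # relative weight, not a percentage
    # Lend nothing to other levels. Borrowing idle seats from other levels
    # is still allowed unless borrowingLimitPercent is set.
    lendablePercent: 0
    limitResponse:
      type: Queue
      queuing:
        queues: 64
        handSize: 8                # queues sampled per flow (shuffle sharding)
        queueLengthLimit: 200      # waiting requests per queue before a 429
---
# Route read traffic from the CI group into low-ci.
apiVersion: flowcontrol.apiserver.k8s.io/v1beta3
kind: FlowSchema
metadata:
  name: ci-list-watch
spec:
  priorityLevelConfiguration:
    name: low-ci
  matchingPrecedence: 9000         # lower numbers are matched first
  distinguisherMethod:
    type: ByUser                   # each CI user becomes its own flow
  rules:
    - subjects:
        - kind: Group
          group:
            name: ci-readers
      resourceRules:
        - apiGroups: ["*"]
          resources: ["*"]
          verbs: ["get", "list", "watch"]
          # A resource rule needs clusterScope and/or namespaces to pass validation.
          clusterScope: true       # match cluster-scoped requests
          namespaces: ["*"]        # and requests in every namespace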

The practical message of this pattern: CI’s high-volume list/watch traffic must not lock up the API Server.

B) Guaranteed share for `kube-system` (critical)

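# Reserved share for critical traffic.
apiVersion: flowcontrol.apiserver.k8s.io/v1beta3
kind: PriorityLevelConfiguration
metadata:
  name: system-critical
spec:
  type: Limited
  limited:
    nominalConcurrencyShares: 80   # relative weight, not a percentage
    lendablePercent: 50            # up to half the seats may be lent out while idle
    limitResponse:
      # Over-limit requests get an immediate 429.
      # Use type: Queue (with a queuing block) to make them wait instead.
      type: Reject
---
apiVersion: flowcontrol.apiserver.k8s.io/v1beta3
kind: FlowSchema
metadata:
  name: kube-system-critical
spec:
  priorityLevelConfiguration:
    name: system-critical
  matchingPrecedence: 1000         # matched before ci-list-watch (9000)
  distinguisherMethod:
    type: ByNamespace
  rules:
    - subjects:
        - kind: Group
          group:
            # Every service account in the cluster. To match only the service
            # accounts that live in kube-system, use system:serviceaccounts:kube-system.
            name: system:serviceaccounts
      resourceRules:
        - apiGroups: ["*"]
          resources: ["*"]
          # Namespace of the object being accessed, not of the caller.
          namespaces: ["kube-system"]
          verbs: ["*"]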

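The intent here: critical flows inside kube-system must always find a slot, even when “everything is on fire.” Keep in mind that with type: Reject, anything beyond the level's seats is refused with a 429 right away, so size the share for peak load or switch to Queue if waiting is preferable to failing.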
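
The share values in the two priority levels above are relative weights, not percentages. As a worked sketch, the seat calculation below follows the formulas given in the Kubernetes API reference for nominalConcurrencyShares and lendablePercent. The numbers plugged in are assumptions for illustration only: a server concurrency limit of 600 (the default --max-requests-inflight of 400 plus the default --max-mutating-requests-inflight of 200) and a hypothetical total of 335 shares across all priority levels on the cluster.

```latex
\begin{aligned}
\mathrm{NominalCL}(i) &= \left\lceil \mathrm{ServerCL} \cdot \frac{\mathrm{NCS}(i)}{\sum_k \mathrm{NCS}(k)} \right\rceil \\
\mathrm{LendableCL}(i) &= \mathrm{round}\!\left( \mathrm{NominalCL}(i) \cdot \frac{\mathrm{lendablePercent}(i)}{100} \right) \\[4pt]
\text{low-ci:}\quad \mathrm{NominalCL} &= \lceil 600 \cdot 10 / 335 \rceil = \lceil 17.91 \rceil = 18 \\
\text{system-critical:}\quad \mathrm{NominalCL} &= \lceil 600 \cdot 80 / 335 \rceil = \lceil 143.28 \rceil = 144 \\
\text{system-critical:}\quad \mathrm{LendableCL} &= \mathrm{round}(144 \cdot 0.5) = 72
\end{aligned}
```

Under these assumptions system-critical keeps at least 72 seats to itself even while lending, and low-ci runs on roughly 18 seats plus whatever it manages to borrow. Check the real share totals on your own cluster before relying on figures like these.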

5) Rollout plan: “see first, then squeeze”

When enabling APF in production, here is the order I recommend:

Make metrics visible (APF dashboard + 429/latency alerts)
Isolate the noise (low priority + queue)
Carve out critical classes (system-critical)
Then layer finer classes on top (for example break-glass admin, platform automation)

If you flip the order and try to classify everything on day one, post-incident triage of “which change caused which effect?” becomes painful.
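
As an illustration of the last step, here is a minimal sketch of a break-glass admin class. The names `break-glass`, `break-glass-admin`, and the group `break-glass-admins` are hypothetical, and the share, queue sizes, and precedence are assumptions to adapt. It uses v1beta3 to match the examples above.

```yaml
# Small dedicated level so emergency admin calls never wait behind other traffic.
apiVersion: flowcontrol.apiserver.k8s.io/v1beta3
kind: PriorityLevelConfiguration
metadata:
  name: break-glass                # hypothetical name
spec:
  type: Limited
  limited:
    nominalConcurrencyShares: 5    # assumption: a handful of humans, low volume
    lendablePercent: 0             # keep the seats reserved at all times
    limitResponse:
      type: Queue
      queuing:
        queues: 16
        handSize: 4
        queueLengthLimit: 50
---
apiVersion: flowcontrol.apiserver.k8s.io/v1beta3
kind: FlowSchema
metadata:
  name: break-glass-admin          # hypothetical name
spec:
  priorityLevelConfiguration:
    name: break-glass
  matchingPrecedence: 500          # assumption: ahead of the schemas shown earlier
  distinguisherMethod:
    type: ByUser
  rules:
    - subjects:
        - kind: Group
          group:
            name: break-glass-admins   # hypothetical group from your identity provider
      resourceRules:
        - apiGroups: ["*"]
          resources: ["*"]
          verbs: ["*"]
          clusterScope: true
          namespaces: ["*"]
      nonResourceRules:
        - nonResourceURLs: ["*"]
          verbs: ["*"]
```

Adding one class like this per rollout step keeps each change small enough to attribute when the dashboards move.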

6) Incident runbook: when API Server overload shows up

With APF in place, incident triage becomes far more deterministic:

API Server latency climbing + 429s — which flow/priority level is hit?
Break apiserver_flowcontrol_dispatched_requests_total down by its flow_schema and priority_level labels to find “who is eating the slots?”
Apply temporary throttling to the noisy flow:
- lower the queue length limit
- shrink the nominal share
- if needed, a short-lived Reject (last resort)
Root cause: admission/webhook, etcd, network, client behavior?
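
The temporary throttling step can be sketched as an override of the low-ci level from section 4A. The reduced values are assumptions chosen only to show the direction of the change; keep the original manifest in version control so that reverting is a single apply.

```yaml
# Temporary incident override for the noisy class. Revert once the incident closes.
apiVersion: flowcontrol.apiserver.k8s.io/v1beta3
kind: PriorityLevelConfiguration
metadata:
  name: low-ci
spec:
  type: Limited
  limited:
    nominalConcurrencyShares: 5    # was 10: shrink the nominal share
    lendablePercent: 0
    limitResponse:
      type: Queue
      queuing:
        queues: 64
        handSize: 8
        queueLengthLimit: 50       # was 200: fail fast instead of piling up
    # Last resort: replace the limitResponse block above with
    #   limitResponse:
    #     type: Reject
    # and drop the queuing block entirely (it is only valid with type: Queue).
```

Whichever lever you pull, watch the rejected and queue-wait metrics for the class afterwards to confirm that the pressure moved where you expected.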

Wrapping up

APF gives Kubernetes the ability to “live with the noise”: it shields the critical traffic while fairly queueing the rest. What separates teams that succeed in the field is not YAML — it is the discipline of identity-based classification, measurement, gradual rollout, and a usable incident runbook. Teams that protect their control plane this way ship fewer “surprise deploy lockups” and run far more predictable operations.
