Most teams do load testing at one of two extremes: either they skip it entirely, or it devolves into a “how many RPS did we hit?” contest. In production, however, what matters is not maximum throughput but staying within your SLO. That is why I anchor load testing to the following question:
Up to what load can this service stay within its target SLO (latency + error rate)?
1) Define the SLO first
Before you launch a load test, write down these three targets:
- p95 latency (e.g. 250ms)
- error rate (e.g. 0.5%)
- a “saturation signal” you will track during the run (CPU, thread pool, DB conn, queue lag, etc.)
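One way to keep these targets in a single place is a small SLO object that gets translated into k6 threshold expressions. This is only a minimal sketch: the `slo` object, its field names, and the `sloToThresholds` helper are hypothetical and not part of k6. Only the threshold strings (`p(95)<N`, `rate<N`) and the metric names `http_req_duration` and `http_req_failed` are k6's own.

```javascript
// Hypothetical SLO definition; the field names are illustrative, not a k6 API.
const slo = {
  p95LatencyMs: 250,                  // p95 latency target in milliseconds
  maxErrorRate: 0.005,                // error rate target (0.5%)
  saturationSignal: 'db_connections', // watched in your monitoring, not by k6
};

// Translate the SLO targets into k6 threshold expressions.
function sloToThresholds(s) {
  return {
    http_req_duration: [`p(95)<${s.p95LatencyMs}`],
    http_req_failed: [`rate<${s.maxErrorRate}`],
  };
}
```

In a k6 script this could be wired in as `export const options = { thresholds: sloToThresholds(slo) };`, so the written-down SLO and the test's pass/fail criteria cannot drift apart.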
2) k6 scenario: simple but decision-producing
A sample k6 skeleton (assuming an HTTP API):
import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  // Ramp virtual users up in steps, then back down to zero.
  stages: [
    { duration: '2m', target: 20 },
    { duration: '3m', target: 50 },
    { duration: '3m', target: 80 },
    { duration: '2m', target: 0 },
  ],
  // The run fails if either SLO target is breached.
  thresholds: {
    http_req_failed: ['rate<0.005'], // error rate < 0.5%
    http_req_duration: ['p(95)<250'], // p95 < 250ms
  },
};

export default function () {
  // BASE_URL is passed in at run time, e.g. k6 run -e BASE_URL=<url> script.js
  const res = http.get(`${__ENV.BASE_URL}/api/v1/health`);
  check(res, { 'status 200': (r) => r.status === 200 });
  sleep(0.2); // pause 200ms between iterations
}
The scenario’s purpose is not “the highest possible request count”; it is to observe the boundary at which you remain inside the SLO.
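To make "the boundary" concrete, the idea can be sketched as a small function that walks the load steps in ascending order and reports the last one that stayed inside the SLO. This is an illustration under assumptions: `findSloBoundary` and the shape of its inputs are hypothetical. k6's end-of-test summary aggregates over the whole run, so per-stage numbers are assumed to come from your own monitoring or from tagged metrics.

```javascript
// Return the highest load step that stayed inside the SLO, or null if none did.
// `stages` is a hypothetical array of per-step measurements:
//   { targetVus, p95Ms, errorRate }
// `slo` is a hypothetical object: { p95LatencyMs, maxErrorRate }
function findSloBoundary(stages, slo) {
  // Work on a sorted copy so the caller's array is not mutated.
  const ordered = [...stages].sort((a, b) => a.targetVus - b.targetVus);
  let boundary = null;
  for (const stage of ordered) {
    const withinSlo =
      stage.p95Ms < slo.p95LatencyMs && stage.errorRate < slo.maxErrorRate;
    // Stop at the first breach: higher steps no longer count as safe capacity.
    if (!withinSlo) break;
    boundary = stage.targetVus;
  }
  return boundary;
}
```

Stopping at the first breach is a deliberate choice in this sketch: a later step that happens to pass again should not be counted as safe capacity.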
3) Capacity baseline: compare “today” with “tomorrow”
For real operational benefit, the test must not be a one-off. Two practices make this workable:
- Persist the baseline as JSON (e.g. the last successful release)
- Compare new test results against the baseline to catch regressions
A simple approach:
- Export k6 output as JSON via --summary-export
- In CI, compare p95 and error metrics against the baseline
- Fail the pipeline above a meaningful drift threshold
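The comparison step above could look roughly like the following. Treat it as a sketch under assumptions: `extractMetrics` and `compareToBaseline` are hypothetical helpers, the tolerance values are placeholders, and the field layout of the exported JSON should be verified against the k6 version you run. The extractor accepts metrics either flat or nested under `values` for that reason.

```javascript
// Pull p95 latency and error rate out of a parsed k6 summary object.
// Assumption: metric fields are either flat on the metric or nested under `values`.
function extractMetrics(summary) {
  const duration = summary.metrics.http_req_duration;
  const failed = summary.metrics.http_req_failed;
  const d = duration.values ?? duration;
  const f = failed.values ?? failed;
  return { p95Ms: d['p(95)'], errorRate: f.rate ?? f.value };
}

// Compare a run against the baseline and return human-readable regressions.
// The default tolerances are illustrative, not recommendations.
function compareToBaseline(
  baseline,
  current,
  tol = { p95Ratio: 1.1, errorRateDelta: 0.002 }
) {
  const regressions = [];
  if (current.p95Ms > baseline.p95Ms * tol.p95Ratio) {
    regressions.push(`p95 ${current.p95Ms}ms vs baseline ${baseline.p95Ms}ms`);
  }
  if (current.errorRate > baseline.errorRate + tol.errorRateDelta) {
    regressions.push(
      `error rate ${current.errorRate} vs baseline ${baseline.errorRate}`
    );
  }
  return regressions;
}
```

In a CI step you would parse the baseline file and the new export with `JSON.parse`, pass both through `extractMetrics`, and exit non-zero when `compareToBaseline` returns a non-empty list. Using a relative tolerance for latency and an absolute one for error rate is a choice made for this sketch; a baseline error rate of zero makes a ratio meaningless.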
4) Release gate logic: make performance an “acceptance gate”
The release gate rule I recommend:
Even if the functional tests pass, if p95 or the error rate breaches the SLO, or noticeably regresses against the baseline, the release drops into “guarded” mode (smaller canary, more aggressive rollback, shorter observation window).
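The rule above can be expressed as a small decision function. This is a sketch, not a prescribed implementation: `releaseMode` and its input fields are hypothetical names, and returning `'blocked'` when functional tests fail is an assumption, since the rule itself only describes the guarded case.

```javascript
// Decide how a release proceeds based on test outcomes.
// 'blocked' for failing functional tests is an assumption of this sketch.
function releaseMode({ functionalTestsPass, sloBreached, regressedVsBaseline }) {
  if (!functionalTestsPass) return 'blocked';
  // Functional tests passed, but performance is outside the SLO or worse
  // than the baseline: ship under tighter controls.
  if (sloBreached || regressedVsBaseline) return 'guarded';
  return 'normal';
}
```

The pipeline can then map `'guarded'` to whatever your rollout tooling supports, such as a smaller canary percentage or a lower rollback threshold.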
This model promotes “performance testing” from a report into actual change control.
5) The most common field mistake: the test environment does not reflect reality
To avoid this trap:
- Use production-like data (at minimum, similar data volume and index behavior)
- Do cache warmup separately; measure only after warmup completes
- Simulate the impact of test users on rate limiting / auth
- Test downstream dependencies against controlled real instances rather than fakes
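For the warmup point in particular, one possible approach is to run warmup and measurement as two k6 scenarios and attach the thresholds only to the second. The sketch below builds such a config. `buildWarmupOptions` and its parameters are hypothetical; `constant-vus`, `startTime`, and thresholds filtered by the `scenario` tag are k6 features. The 250ms and 0.5% values are carried over from the example targets earlier in the text.

```javascript
// Build k6 options where SLO thresholds only apply after warmup has finished.
// The function and its parameter names are illustrative, not a k6 API.
function buildWarmupOptions({ warmupSeconds, measureSeconds, vus }) {
  return {
    scenarios: {
      warmup: {
        executor: 'constant-vus',
        vus,
        duration: `${warmupSeconds}s`,
      },
      measure: {
        executor: 'constant-vus',
        vus,
        duration: `${measureSeconds}s`,
        startTime: `${warmupSeconds}s`, // starts when the warmup duration ends
      },
    },
    thresholds: {
      // Only requests tagged with the `measure` scenario count toward the SLO.
      'http_req_duration{scenario:measure}': ['p(95)<250'],
      'http_req_failed{scenario:measure}': ['rate<0.005'],
    },
  };
}
```

In a k6 script this could be used as `export const options = buildWarmupOptions({ warmupSeconds: 60, measureSeconds: 300, vus: 50 });`, where the numbers are placeholders to adapt to how long your caches take to fill.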
Conclusion: load testing must become the language of capacity management
SLO-driven load testing with k6 turns performance from a “weather forecast” into an input for operational decisions. When baseline and gate work together, you catch changes that slow the service down before they ship. That shifts you from reacting to incidents toward controlled improvement.