DORA Metrics for QA Teams (2026)

DORA's four metrics measure delivery speed and stability. QA most affects stability — reliable, fast tests lower change failure rate and shorten lead time.

İbrahim Süren

Founder · Jun 25, 2026

DORA's four metrics — deployment frequency, lead time for changes, change failure rate, and time to restore service — measure how fast and how safely a team ships. QA most directly moves the stability side: reliable, fast tests lower change failure rate and shorten lead time, while flaky, slow tests drag both the wrong way.

Key takeaways

The four DORA metrics split into throughput (deployment frequency, lead time) and stability (change failure rate, time to restore).
QA's biggest lever is the stability metrics — testing is how you keep change failure rate low.
Fast, trustworthy tests also shorten lead time by keeping the CI feedback loop short.
Flaky and slow tests hurt DORA: they delay releases and erode the signal that catches regressions.
Track QA-side leading indicators (flake rate, mean time to detection, feedback-loop time) to move the DORA numbers downstream.

DORA metrics are how engineering leaders talk about delivery performance — but they’re usually framed as a DevOps or platform concern, and QA gets left out of the conversation. That’s a mistake. Testing is one of the strongest levers on the metrics that matter most, and QA teams that can speak in DORA terms make their impact legible to leadership. This guide maps testing to each metric. Where we reference Qualflare, we describe only what it actually does.

The four DORA metrics

DORA metrics, from the DevOps Research and Assessment program, are four measures that split into two pairs:

Throughput — deployment frequency (how often you ship) and lead time for changes (commit to production).
Stability — change failure rate (share of deployments that cause a failure) and time to restore service (how fast you recover).

The DORA Four Keys are the standard vocabulary for whether a team ships both fast and safely — not one at the expense of the other.

One note on currency: DORA has expanded this model since the four keys were introduced. Its site, dora.dev, now frames five metrics and has renamed “time to restore service” to “failed deployment recovery time.” The classic four keys still map cleanly onto QA’s work, so we keep the familiar terms throughout this guide.
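To make the definitions concrete, here is a minimal Python sketch that computes the four classic keys from a list of deployment records. The `Deployment` record shape, the `four_keys` function, and the use of medians are illustrative assumptions, not something DORA or any particular tool prescribes; real data would come from your own CI/CD and incident tooling.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deployment:
    # Hypothetical record shape, assumed for this sketch only.
    committed_at: datetime                   # when the change was committed
    deployed_at: datetime                    # when it reached production
    failed: bool = False                     # did this deployment cause a failure?
    restored_at: Optional[datetime] = None   # when service was restored, if it failed


def four_keys(deploys: list[Deployment], window_days: int) -> dict:
    """Compute the four classic DORA metrics over a window of `window_days` days."""
    if not deploys or window_days <= 0:
        raise ValueError("need at least one deployment and a positive window")

    # Throughput: how often you ship, and how long commit-to-production takes.
    lead_times = [d.deployed_at - d.committed_at for d in deploys]

    # Stability: share of deployments that failed, and how fast they were fixed.
    failures = [d for d in deploys if d.failed]
    restores = [d.restored_at - d.deployed_at for d in failures if d.restored_at]

    return {
        "deployment_frequency_per_day": len(deploys) / window_days,
        "median_lead_time": median(lead_times),
        "change_failure_rate": len(failures) / len(deploys),
        "median_time_to_restore": median(restores) if restores else None,
    }
```

Medians are used here because a single slow change or long outage would otherwise dominate the result; a team could just as reasonably report means or percentiles.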

Why QA teams should care

The headline insight: testing’s biggest impact is on the stability metrics, and stability is what keeps throughput honest. Anyone can deploy more often by testing less — until change failure rate spikes and time to restore balloons. Good testing is what lets a team raise throughput without sacrificing stability. Framing QA’s work in DORA terms turns “we reduced flaky tests” into “we lowered change failure rate”, which is the language leadership funds.

How testing maps to each metric

DORA metric	How QA moves it
Change failure rate	The most direct link — reliable tests catch regressions before they ship
Lead time for changes	Fast, trustworthy tests keep the CI feedback loop short
Deployment frequency	Confidence from good tests lets teams ship smaller changes more often
Time to restore service	Fast detection and clear failure signals shorten diagnosis

The stability metrics QA owns

Change failure rate is the metric testing influences most. It’s a direct measure of whether your suite catches problems before production. Improving coverage of critical paths and, above all, reducing flakiness moves it more than almost anything else, because a red build from a suite people trust actually blocks a bad deploy instead of being waved through.

Time to restore service is shortened by fast detection and legible failures. When a test goes red and a failure cluster points straight at the cause, diagnosis takes minutes, not hours.

How flaky and slow tests hurt DORA

Flaky and slow tests drag every metric the wrong way. Slow suites lengthen lead time directly, so speeding up the CI suite is itself lead-time work. Flaky tests are worse. They delay releases through reruns, and by training teams to ignore red builds they raise change failure rate, because the one red build that mattered gets dismissed as “probably flaky.” Flakiness is also common enough to matter at scale: Google reported that about 16% of its tests have some level of flakiness. A team trained to wave through red builds is dismissing a large, noisy population of failures, and change failure rate climbs the moment one of those failures turns out to be real. Reliability work is DORA work.
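A rough bit of arithmetic shows why a modest flake rate still produces a lot of noise. The sketch below assumes each flaky test fails spuriously with the same probability and independently of the others, which real suites rarely satisfy exactly; the function name and the example numbers are illustrative assumptions, not measurements.

```python
def spurious_red_probability(flaky_tests: int, per_test_flake_rate: float) -> float:
    """Probability that at least one flaky test fails in a single run.

    Assumes (for illustration) that flakes are independent and that every
    flaky test shares the same per-run failure probability.
    """
    if flaky_tests < 0:
        raise ValueError("flaky_tests must be non-negative")
    if not 0.0 <= per_test_flake_rate <= 1.0:
        raise ValueError("per_test_flake_rate must be between 0 and 1")
    # P(no flaky test fails) = (1 - p) ** n, so P(at least one fails) is the complement.
    return 1.0 - (1.0 - per_test_flake_rate) ** flaky_tests


# Hypothetical suite: 1,000 tests, 16% of them flaky, each flaking 0.5% of the time.
probability = spurious_red_probability(flaky_tests=160, per_test_flake_rate=0.005)
print(f"Chance a clean change still goes red: {probability:.0%}")
```

Under those assumed numbers, a change with no real defect still goes red in roughly 55% of runs, which is exactly the environment in which a genuine failure gets dismissed as noise.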

What to measure on the QA side

DORA metrics are lagging indicators; QA should track the leading ones that feed them:

Flake rate — how much the suite can be trusted.
Mean time to detection — how fast tests catch a defect.
CI feedback-loop time — how fast developers get a result.

Improve these and the downstream DORA numbers follow. The history needed to measure all three is the same test observability data that powers flaky detection and clustering. In a monorepo, aggregating results across packages gives you a single, trustworthy pass rate against which to track those indicators.
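The three indicators above can be sketched from plain test-run history. The definitions here are one reasonable choice among several and are assumptions for illustration: a test counts as flaky if it both passed and failed on the same commit, detection time runs from when a defect was introduced to the first failing test, and feedback-loop time runs from push to reported result. The `CaseResult` record and function names are hypothetical and do not reflect any specific tool’s API.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median


@dataclass
class CaseResult:
    # Hypothetical record shape for one test execution.
    test_id: str
    commit: str
    passed: bool


def flake_rate(results: list[CaseResult]) -> float:
    """Share of tests that both passed and failed on at least one commit."""
    outcomes = defaultdict(set)
    for r in results:
        outcomes[(r.test_id, r.commit)].add(r.passed)
    all_tests = {r.test_id for r in results}
    # Same code, different outcome: the test itself is unreliable.
    flaky = {test_id for (test_id, _), seen in outcomes.items() if len(seen) == 2}
    return len(flaky) / len(all_tests) if all_tests else 0.0


def mean_time_to_detection(defects: list[tuple[datetime, datetime]]) -> timedelta:
    """Mean gap over (introduced_at, first_failing_test_at) pairs."""
    if not defects:
        raise ValueError("need at least one detected defect")
    gaps = [detected - introduced for introduced, detected in defects]
    return sum(gaps, timedelta()) / len(gaps)


def median_feedback_loop(runs: list[tuple[datetime, datetime]]) -> timedelta:
    """Median gap over (pushed_at, result_reported_at) pairs for CI runs."""
    if not runs:
        raise ValueError("need at least one CI run")
    return median(reported - pushed for pushed, reported in runs)
```

Tracking these per week, rather than as one all-time number, makes it easier to see whether reliability work is actually moving them.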

Start free with Qualflare — connect your pipeline and track the flakiness, detection, and feedback-loop signals that move your DORA metrics.

Frequently asked questions

What are the four DORA metrics?

Deployment frequency (how often you ship), lead time for changes (commit to production), change failure rate (the share of deployments that cause a failure), and time to restore service (how quickly you recover). The first two measure throughput; the last two measure stability.

How does QA affect DORA metrics?

Most directly through the stability metrics. Reliable tests catch regressions before they ship, lowering change failure rate, and fast, trustworthy tests keep the feedback loop short, which shortens lead time. Flaky and slow tests do the opposite — they delay releases and let real failures hide in the noise.

Which DORA metric does testing influence most?

Change failure rate. It’s the most direct measure of whether your testing catches problems before production. Reducing flakiness and improving coverage of critical paths moves it more than almost anything else QA does.

What should QA teams measure to improve DORA?

Track leading indicators that feed DORA: flake rate (trust in the suite), mean time to detection (how fast tests catch a defect), and CI feedback-loop time (how fast developers get a result). Improving these shows up downstream in change failure rate and lead time.