Test Observability

Flake rate

Also known as: test flake rate, flakiness rate

Flake rate is the percentage of test runs, or of test failures, that are flaky rather than genuine. It is a headline metric for how much you can trust your test suite.

Flake rate can be measured per test, per suite, or across the whole pipeline. A common formulation is the share of failures that turn out to be flaky rather than real. The higher it climbs, the more time engineers waste investigating false alarms and the less anyone trusts a red build.
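The failure-share formulation above can be sketched in a few lines. This is a minimal illustration, not a standard implementation: the `Failure` record and the `passed_on_retry` field are hypothetical names, and treating "passed when retried on the same commit" as the definition of a flaky failure is an assumption made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Failure:
    """One failed test run (hypothetical record shape for this sketch)."""
    test: str
    # Assumption: a failure that passes when retried on the same commit
    # is classified as flaky; one that fails again is treated as genuine.
    passed_on_retry: bool


def flaky_failure_share(failures):
    """Return the percentage of failures classified as flaky.

    Returns 0.0 when there are no failures, to avoid dividing by zero.
    """
    if not failures:
        return 0.0
    flaky = sum(1 for f in failures if f.passed_on_retry)
    return 100.0 * flaky / len(failures)


failures = [
    Failure("test_login", passed_on_retry=True),
    Failure("test_checkout", passed_on_retry=False),
    Failure("test_search", passed_on_retry=True),
    Failure("test_upload", passed_on_retry=False),
]
print(flaky_failure_share(failures))  # 50.0
```

Here two of the four failures passed on retry, so half of the red results would have been false alarms under this classification.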

Tracking flake rate over time turns flakiness from anecdote into a managed metric: you can set a threshold, gate on it, and watch whether reliability work is actually paying off.

Measurable per test, per suite, or pipeline-wide.
Often expressed as the share of failures that are flaky, not real.
A trackable target — set a threshold and trend it over releases.
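Setting a threshold and trending it over releases might look something like the sketch below. The function names, the suite names, and the 2% threshold are all illustrative assumptions; the right threshold depends on your team and suite.

```python
def suites_over_threshold(flake_rate_by_suite, threshold_pct):
    """Return the names of suites whose flake rate exceeds the threshold, sorted."""
    return sorted(
        suite
        for suite, rate in flake_rate_by_suite.items()
        if rate > threshold_pct
    )


def release_trend(rates):
    """Return the change in flake rate between consecutive releases.

    Negative values mean the flake rate went down (reliability improved).
    """
    return [later - earlier for earlier, later in zip(rates, rates[1:])]


# Hypothetical per-suite flake rates, in percent.
current = {"unit": 0.4, "integration": 2.5, "e2e": 6.0}
print(suites_over_threshold(current, threshold_pct=2.0))  # ['e2e', 'integration']

# Hypothetical e2e flake rate over three releases, in percent.
print(release_trend([6.0, 4.5, 3.0]))  # [-1.5, -1.5]
```

A CI job could fail, or just warn, when `suites_over_threshold` returns a non-empty list. Whether to block merges or only report is a policy choice this sketch does not make.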

Frequently asked

What is a good flake rate?

Lower is always better. As a reference point, Google has reported that ~16% of its tests show some flakiness and ~1.5% of all test runs report a flaky result. Most teams set a per-suite threshold and drive it down over time rather than chasing an absolute number.

How do you measure flake rate?

Track each test’s pass/fail outcomes across many runs on unchanged code, then compute how often those outcomes vary: per test, per suite, or as a share of all failures. A single run cannot tell you whether a test is flaky; flake rate is inherently a historical metric.
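As a rough sketch of that measurement, the code below groups historical results by test and commit and flags a test as flaky when the same commit produced both a pass and a fail. The `(test, commit, passed)` tuple layout is an assumption about how run history is stored, and using the commit ID as the proxy for "unchanged code" ignores environment or dependency changes.

```python
from collections import defaultdict


def flaky_tests(runs):
    """Return the set of tests with mixed outcomes on at least one commit.

    `runs` is a list of (test, commit, passed) tuples (assumed layout).
    """
    outcomes = defaultdict(set)
    for test, commit, passed in runs:
        outcomes[(test, commit)].add(passed)
    # Seeing both True and False for one (test, commit) pair means the
    # outcome varied while the code stayed the same.
    return {test for (test, _commit), seen in outcomes.items() if len(seen) == 2}


def suite_flake_rate(runs):
    """Return the percentage of distinct tests in `runs` that are flaky."""
    all_tests = {test for test, _commit, _passed in runs}
    if not all_tests:
        return 0.0
    return 100.0 * len(flaky_tests(runs)) / len(all_tests)


history = [
    ("test_a", "c1", True), ("test_a", "c1", True),
    ("test_b", "c1", True), ("test_b", "c1", False), ("test_b", "c1", True),
    ("test_c", "c2", False), ("test_c", "c2", False),
    ("test_d", "c2", True),
]
print(flaky_tests(history))       # {'test_b'}
print(suite_flake_rate(history))  # 25.0
```

Note that `test_c` fails on every run of commit `c2`, so it counts as a consistent failure rather than a flaky test. A test that has only been run once per commit can never be flagged, which is why the metric needs repeated runs.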