Flaky Tests & Reliability
Flaky test
Also known as: flake, test flakiness, intermittent test failure
A flaky test is a test that produces different results — sometimes passing, sometimes failing — on the same code, without any change to that code.
Flakiness comes from non-determinism: timing and race conditions, reliance on shared or external state, network calls, animations, or test-ordering dependencies. Because the code under test did not change, a flaky failure is a false signal — it tells you nothing reliable about whether the build is good.
Flaky tests are costly because they erode trust. Once a suite cries wolf often enough, teams start ignoring red builds — and a real regression slips through. Google reported that around 16% of its tests exhibit some flakiness, and that flaky failures account for a large share of the failures engineers investigate.
- Same code, different result = flaky (an unreliable signal, not evidence of a regression).
- Common causes: timing/races, shared state, network, test order, animations.
- Cost is trust: teams stop believing red builds and miss real regressions.
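To make the shared-state and test-order causes concrete, here is a minimal sketch in Python. The names (`_cache`, `check_writes_cache`, `check_expects_empty_cache`, `run_in_order`) are hypothetical and exist only for illustration. The code under test never changes, yet one check passes or fails depending purely on the order in which the two checks run.

```python
# Hypothetical module-level state shared by both checks (the source of flakiness).
_cache = {}


def check_writes_cache():
    # Leaves data behind in the shared cache and never cleans up.
    _cache["user"] = "alice"
    assert _cache["user"] == "alice"


def check_expects_empty_cache():
    # Passes only when it runs before check_writes_cache.
    assert _cache == {}


def run_in_order(checks):
    """Reset the shared cache once, run the checks in order, return {name: passed}."""
    _cache.clear()
    results = {}
    for check in checks:
        try:
            check()
            results[check.__name__] = True
        except AssertionError:
            results[check.__name__] = False
    return results


forward = run_in_order([check_expects_empty_cache, check_writes_cache])
reverse = run_in_order([check_writes_cache, check_expects_empty_cache])

print(forward["check_expects_empty_cache"])  # True
print(reverse["check_expects_empty_cache"])  # False
```

A runner that shuffles or parallelizes tests would hit both orderings over time, which is how a test like this appears to fail at random. Giving each test its own fresh state removes the order dependency.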
Frequently asked
How do you know a test is flaky and not just failing?
Re-run the test on the exact same commit. If it sometimes passes and sometimes fails with no code change, it is flaky. The reliable way to confirm this at scale is to track each test’s pass/fail history across many runs rather than judging from a single result.
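The history-based check described above can be sketched as follows. This is a minimal illustration, not any particular tool's algorithm. It assumes run records are available as `(test_name, commit_sha, passed)` tuples; the `find_flaky` function and the sample data are hypothetical. A test is reported as flaky only if it has both passed and failed on the same commit.

```python
from collections import defaultdict


def find_flaky(runs):
    """Return sorted names of tests that both passed and failed on one commit.

    runs: iterable of (test_name, commit_sha, passed) tuples (assumed format).
    """
    outcomes = defaultdict(set)
    for test_name, commit, passed in runs:
        outcomes[(test_name, commit)].add(bool(passed))
    # Two distinct outcomes on the same commit means the result changed
    # without any code change.
    flaky = {name for (name, _commit), seen in outcomes.items() if len(seen) == 2}
    return sorted(flaky)


# Hypothetical run history for illustration.
history = [
    ("test_login", "abc123", True),
    ("test_login", "abc123", False),
    ("test_login", "abc123", True),
    ("test_checkout", "abc123", False),
    ("test_checkout", "abc123", False),
    ("test_checkout", "def456", True),
    ("test_search", "abc123", True),
]

print(find_flaky(history))  # ['test_login']
```

Here `test_checkout` is not reported: it failed consistently on one commit and passed on another, which looks like a real failure that was later fixed rather than flakiness. Grouping by commit is what separates the two cases.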
Last reviewed June 11, 2026