AI in Testing

Predictive flaky scoring

Predictive flaky scoring uses a test’s historical behavior to assign it a flakiness probability, flagging unreliable tests before they block a release rather than after.

Rather than waiting for a test to fail intermittently and disrupt a deploy, a model learns from each test’s pass/fail history, retry patterns, and timing to estimate how likely it is to be flaky. High-scoring tests can be surfaced, watched, or quarantined proactively.
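To make the idea concrete, here is a minimal sketch of how such a score could be computed from a test’s run history. It is a hand-weighted heuristic, not a trained model and not any particular vendor’s method. The `Run` record, the three features (flip rate, retry rate, timing jitter), and every weight are assumptions chosen for illustration only.

```python
from dataclasses import dataclass
from math import exp
from statistics import mean, pstdev


@dataclass
class Run:
    """One recorded execution of a test (hypothetical record layout)."""
    passed: bool        # final outcome of the run
    retries: int        # retries needed before the final outcome
    duration_s: float   # wall-clock duration in seconds


def flaky_score(runs: list[Run]) -> float:
    """Return a heuristic flakiness probability in (0, 1) for one test."""
    if len(runs) < 2:
        return 0.5  # too little history to judge; stay neutral

    # Flip rate: how often consecutive runs disagree (pass/fail transitions).
    flips = sum(a.passed != b.passed for a, b in zip(runs, runs[1:]))
    flip_rate = flips / (len(runs) - 1)

    # Retry rate: share of runs that passed only after at least one retry.
    retry_rate = sum(r.passed and r.retries > 0 for r in runs) / len(runs)

    # Timing jitter: coefficient of variation of the run duration.
    durations = [r.duration_s for r in runs]
    avg = mean(durations)
    jitter = pstdev(durations) / avg if avg > 0 else 0.0

    # Illustrative, hand-picked weights; a real model would learn these.
    z = -3.0 + 4.0 * flip_rate + 5.0 * retry_rate + 1.5 * min(jitter, 1.0)
    return 1.0 / (1.0 + exp(-z))  # logistic squash into a probability
```

A test that always passes on the first attempt with steady timing scores low, while one that alternates outcomes, needs retries, and varies in duration scores high. In practice the weights would be fitted to labeled history rather than set by hand.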

The payoff is fewer surprise red builds in the middle of a release. Predictive scoring moves flakiness handling from reactive firefighting to a managed, data-driven signal.

Assigns each test a flakiness probability from its history, not a binary label.
Surfaces likely-flaky tests before they block a deploy, rather than after.
Turns flakiness from reactive firefighting into a managed, data-driven signal.
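The points above imply a policy that turns a probability into an action. The sketch below shows one possible shape for that policy. The `Action` names, the `triage` and `surface` helpers, and the 0.3 and 0.7 thresholds are hypothetical; a team would tune them to its own tolerance for red builds.

```python
from enum import Enum


class Action(Enum):
    NONE = "none"              # treat the test as reliable
    WATCH = "watch"            # keep it blocking, but monitor it
    QUARANTINE = "quarantine"  # run it without blocking the deploy


def triage(score: float, watch_at: float = 0.3,
           quarantine_at: float = 0.7) -> Action:
    """Map a flakiness probability to an action (thresholds are assumptions)."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be a probability between 0 and 1")
    if score >= quarantine_at:
        return Action.QUARANTINE
    if score >= watch_at:
        return Action.WATCH
    return Action.NONE


def surface(scores: dict[str, float], limit: int = 10) -> list[tuple[str, float]]:
    """Return the highest-scoring tests first, for a report or dashboard."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:limit]
```

Because the score is a probability rather than a yes/no label, the same data can support a lenient policy on a feature branch and a stricter one on the release branch simply by changing the thresholds.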

Frequently asked

How is predictive flaky scoring different from flaky test detection?

Detection uses a test’s pass/fail history to identify tests that have already flaked. Predictive scoring goes further: it estimates how likely a test is to flake next, based on retry patterns, timing, and history, so unreliable tests can be watched or quarantined proactively. Meta pioneered this idea with a probabilistic flakiness score.
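The difference between a binary label and a probability can be sketched in a few lines. This is a simplified illustration, not Meta’s method: detection is modeled here as "some commit has seen both a pass and a fail", and prediction as a smoothed rate with a prior. The function names, the `(commit, passed)` input layout, and the prior values are all assumptions.

```python
from collections import defaultdict


def outcomes_by_commit(runs: list[tuple[str, bool]]) -> dict[str, set[bool]]:
    """Group the distinct outcomes seen for each commit."""
    grouped: dict[str, set[bool]] = defaultdict(set)
    for commit, passed in runs:
        grouped[commit].add(passed)
    return grouped


def detected_flaky(runs: list[tuple[str, bool]]) -> bool:
    """Detection: has any single commit seen both a pass and a fail?"""
    return any(len(seen) == 2 for seen in outcomes_by_commit(runs).values())


def predicted_flake_rate(runs: list[tuple[str, bool]],
                         prior_flaky: float = 1.0,
                         prior_stable: float = 9.0) -> float:
    """Prediction: smoothed chance that the next commit shows mixed results.

    The prior (assumed: 1 flaky to 9 stable) keeps tests with little
    history from being scored as exactly 0 or 1.
    """
    grouped = outcomes_by_commit(runs)
    mixed = sum(len(seen) == 2 for seen in grouped.values())
    return (mixed + prior_flaky) / (len(grouped) + prior_flaky + prior_stable)
```

A brand-new test with two clean runs is not detected as flaky, yet it still receives a small nonzero estimate, which is what allows it to be watched before it has ever disrupted a build.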