Testing & observability glossary
Plain-English definitions of the terms behind modern test automation, CI, and test observability — flaky tests, quality gates, test impact analysis, failure clustering, DORA metrics, and more. Each entry answers the question first, then goes deeper.
Flaky Tests & Reliability
-
Flaky test
A flaky test is a test that produces different results — sometimes passing, sometimes failing — on the same code, without any change to that code.
-
Flaky test detection
Flaky test detection is the practice of identifying tests that fail intermittently by analyzing their pass/fail history across many runs, rather than from a single result.
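As a minimal sketch of history-based detection, the snippet below flags any test that both passed and failed on the same commit. The record shape `(test_name, commit_sha, passed)` and the function name `find_flaky_tests` are assumptions made for illustration, not the format of any particular tool.

```python
from collections import defaultdict

def find_flaky_tests(runs):
    """Return names of tests that both passed and failed on the same commit.

    `runs` is an iterable of (test_name, commit_sha, passed) tuples,
    a hypothetical record shape chosen for this sketch.
    """
    outcomes = defaultdict(set)
    for test_name, commit_sha, passed in runs:
        outcomes[(test_name, commit_sha)].add(passed)
    # Mixed outcomes on identical code are the signature of a flaky test.
    return sorted({name for (name, _), seen in outcomes.items() if len(seen) == 2})

history = [
    ("test_login", "abc123", True),
    ("test_login", "abc123", False),     # same commit, different result
    ("test_checkout", "abc123", False),
    ("test_checkout", "abc123", False),  # consistently failing, not flaky
    ("test_search", "abc123", True),
    ("test_search", "def456", False),    # the code changed, so not counted
]
flaky = find_flaky_tests(history)
print(flaky)
```

Only `test_login` is flagged here: `test_checkout` fails consistently, and `test_search` changed outcome only after the code changed.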
-
Non-determinism (in tests)
Non-determinism in tests is when the same test and the same code can yield different outcomes because the result depends on uncontrolled factors like timing, ordering, or shared state.
-
Test quarantine
Test quarantine is the practice of moving known-flaky tests out of the blocking path of a build so they stop failing the pipeline while their result is still recorded for triage.
-
Test retry
A test retry automatically re-runs a failed test a set number of times and passes it if any attempt succeeds — a way to absorb flakiness so it doesn’t fail the build.
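The retry behavior described above can be sketched as a small wrapper. The name `run_with_retries` and the convention that a test signals failure by raising `AssertionError` are assumptions for this example; real test runners expose retries through their own configuration.

```python
def run_with_retries(test_fn, max_attempts=3):
    """Run test_fn until it passes or the attempts run out.

    Returns (passed, attempts_used). A pass after an earlier failure is
    worth recording, because it suggests the test is flaky.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            test_fn()
            return True, attempt
        except AssertionError:
            continue  # failed this attempt, try again if any remain
    return False, max_attempts

# Simulated intermittent test: fails on the first two calls, then passes.
calls = {"count": 0}

def sometimes_fails():
    calls["count"] += 1
    assert calls["count"] >= 3, "simulated intermittent failure"

result = run_with_retries(sometimes_fails, max_attempts=3)
print(result)
```

Returning the attempt count alongside the verdict keeps the flakiness visible even though the build still passes.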
Test Observability
-
Flake rate
Flake rate is the percentage of test runs, or of test failures, that fail because of flakiness rather than a genuine defect. It is a headline metric for how much you can trust your test suite.
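Because the definition allows two denominators (all runs, or only failures), a short worked example may help. The function name and the sample counts below are made up for illustration.

```python
def percentage(part, whole):
    """Return part/whole as a percentage, or 0.0 when whole is zero."""
    return 100.0 * part / whole if whole else 0.0

# Hypothetical week of CI data.
total_runs = 1000      # every test execution
total_failures = 40    # executions that failed
flaky_failures = 30    # failures attributed to flakiness, not real defects

flake_rate_of_runs = percentage(flaky_failures, total_runs)
flake_rate_of_failures = percentage(flaky_failures, total_failures)
print(flake_rate_of_runs, flake_rate_of_failures)
```

The two numbers answer different questions: the first says how often any run is disrupted by flakiness, the second how often a red result can be trusted.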
-
Mean time to detection (MTTD)
Mean time to detection (MTTD) is the average time between a defect being introduced and a test or signal catching it — a measure of how fast your quality feedback loop is.
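As a rough sketch of the calculation, MTTD is the mean of the gaps between when each defect was introduced and when it was caught. The timestamps and the function name `mean_time_to_detection` are illustrative assumptions.

```python
from datetime import datetime, timedelta

def mean_time_to_detection(defects):
    """Average (detected_at - introduced_at) over a list of datetime pairs.

    Returns None when there are no defects to average.
    """
    if not defects:
        return None
    total = sum((detected - introduced for introduced, detected in defects), timedelta())
    return total / len(defects)

# Hypothetical defects: (introduced_at, detected_at)
defects = [
    (datetime(2024, 5, 1, 9, 0), datetime(2024, 5, 1, 11, 0)),   # caught after 2 hours
    (datetime(2024, 5, 2, 14, 0), datetime(2024, 5, 2, 18, 0)),  # caught after 4 hours
]
mttd = mean_time_to_detection(defects)
print(mttd)
```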
-
Test observability
Test observability is the ability to understand why your tests pass or fail over time by collecting and analyzing test results across every run — not just whether a single run was green.
AI in Testing
-
Agentic testing
Agentic testing is the use of AI agents that plan and carry out testing tasks with limited human direction — generating tests, running them, diagnosing failures, and adapting — rather than only assisting a human one step at a time.
-
Failure clustering
Failure clustering is the automatic grouping of many test failures that share the same underlying cause, so a wall of red collapses into a handful of distinct problems to fix.
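One simple way to approximate this grouping is to strip run-specific details from each error message and bucket failures by the resulting signature. This is a sketch of that one technique only; production tools may rely on stack traces, similarity measures, or learned models instead. The names `normalize` and `cluster_failures` are hypothetical.

```python
import re
from collections import defaultdict

def normalize(message):
    """Strip run-specific details so equivalent errors compare equal."""
    message = re.sub(r"0x[0-9a-fA-F]+", "<hex>", message)  # memory addresses
    message = re.sub(r"\d+", "<n>", message)               # ports, durations, ids
    return message.strip().lower()

def cluster_failures(failures):
    """Group (test_name, error_message) pairs by normalized message."""
    clusters = defaultdict(list)
    for test_name, message in failures:
        clusters[normalize(message)].append(test_name)
    return dict(clusters)

failures = [
    ("test_cart", "Timeout after 3000 ms waiting for port 8080"),
    ("test_checkout", "Timeout after 5000 ms waiting for port 8081"),
    ("test_profile", "KeyError: 'avatar_url'"),
]
clusters = cluster_failures(failures)
for signature, tests in clusters.items():
    print(signature, tests)
```

Three failures collapse into two clusters here, because both timeouts share a signature once the numbers are masked.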
-
Predictive flaky scoring
Predictive flaky scoring uses a test’s historical behavior to assign it a flakiness probability, flagging unreliable tests before they block a release rather than after.
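A very simple stand-in for such a score is the flip rate: the fraction of consecutive runs in which a test's outcome changed. It is a heuristic chosen for this sketch, not a calibrated probability and not the model any specific product uses; the name `flip_rate` is an assumption.

```python
def flip_rate(results):
    """Fraction of consecutive run pairs whose outcome differs (0.0 to 1.0).

    `results` is a chronological list of booleans (True = passed).
    """
    if len(results) < 2:
        return 0.0
    flips = sum(1 for prev, cur in zip(results, results[1:]) if prev != cur)
    return flips / (len(results) - 1)

unstable_score = flip_rate([True, False, True, True, False])  # outcome keeps changing
broken_score = flip_rate([False, False, False, False])        # consistently failing
print(unstable_score, broken_score)
```

A consistently failing test scores zero, which matches the idea that steady failure points to a real defect rather than flakiness.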
-
Self-healing tests
Self-healing tests automatically repair their own locators or steps when the application changes — for example when a selector moves or is renamed — so a cosmetic UI change does not break the test.
-
Smart test selection
Smart test selection runs only the tests most likely to be affected by a given code change, instead of the entire suite, to cut CI time while still catching most regressions.
CI/CD & Velocity
-
CI feedback loop
The CI feedback loop is the time between pushing a code change and getting a usable test result back. The shorter it is, the sooner developers can act while the change is still fresh in their heads.
-
DORA metrics
DORA metrics are four research-backed measures of software delivery performance: deployment frequency, lead time for changes, change failure rate, and time to restore service.
-
Monorepo testing
Monorepo testing is running and aggregating tests for many projects or packages that live in one repository, where a single commit can touch code shared across several of them.
-
Quality gate
A quality gate is an automated pass/fail checkpoint in a CI/CD pipeline that blocks a build from progressing unless it meets defined criteria — like test pass rate, coverage, or flakiness thresholds.
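A gate of this kind can be sketched as a function that compares a build's metrics against thresholds and reports every violated rule. The metric keys and threshold values below are assumptions for illustration; actual criteria vary by team and tool.

```python
def evaluate_quality_gate(metrics, thresholds):
    """Return (passed, violations) for one build.

    Both dicts use hypothetical keys chosen for this sketch.
    """
    violations = []
    if metrics["pass_rate"] < thresholds["min_pass_rate"]:
        violations.append("pass rate below minimum")
    if metrics["coverage"] < thresholds["min_coverage"]:
        violations.append("coverage below minimum")
    if metrics["flake_rate"] > thresholds["max_flake_rate"]:
        violations.append("flake rate above maximum")
    return (not violations, violations)

thresholds = {"min_pass_rate": 98.0, "min_coverage": 80.0, "max_flake_rate": 2.0}
build = {"pass_rate": 99.1, "coverage": 76.5, "flake_rate": 1.2}

passed, violations = evaluate_quality_gate(build, thresholds)
print(passed, violations)
```

Collecting all violations, rather than stopping at the first, tells the developer everything that must be fixed before the build can progress.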
-
Test impact analysis (TIA)
Test impact analysis (TIA) determines which tests are affected by a specific code change so CI can run just those tests instead of the whole suite.
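One common way to approximate this is with a coverage map that records which source files each test exercises, then selecting the tests whose files overlap the change. The map contents and the name `select_impacted_tests` are hypothetical; how the map is built (per-test coverage, dependency graphs, and so on) is outside this sketch.

```python
def select_impacted_tests(coverage_map, changed_files):
    """Return tests whose covered files intersect the changed files."""
    changed = set(changed_files)
    return sorted(test for test, files in coverage_map.items() if changed & set(files))

# Hypothetical per-test coverage data.
coverage_map = {
    "test_cart": ["cart.py", "pricing.py"],
    "test_checkout": ["checkout.py", "pricing.py"],
    "test_profile": ["profile.py"],
}

impacted = select_impacted_tests(coverage_map, ["pricing.py"])
print(impacted)
```

A change to `pricing.py` selects two of the three tests. A file that appears in no test's coverage selects nothing, which is why such a scheme usually needs a fallback to the full suite.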
-
Test parallelization
Test parallelization runs multiple tests at the same time — across threads, processes, or machines — instead of one after another, to shorten the total run.
-
Test sharding
Test sharding splits a test suite into multiple subsets (shards) that run on separate machines or CI jobs at the same time, cutting total wall-clock time.
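How tests are divided matters, because the slowest shard sets the wall-clock time. The sketch below balances shards by assigning the longest tests first to whichever shard is currently lightest. This greedy strategy is one option among several, and the duration figures and the name `shard_tests` are assumptions.

```python
import heapq

def shard_tests(durations, shard_count):
    """Greedily assign tests to shards, longest first, to balance total time.

    `durations` maps test name to its expected runtime in seconds.
    """
    shards = [[] for _ in range(shard_count)]
    heap = [(0.0, index) for index in range(shard_count)]  # (current load, shard index)
    heapq.heapify(heap)
    for test, seconds in sorted(durations.items(), key=lambda kv: kv[1], reverse=True):
        load, index = heapq.heappop(heap)  # lightest shard so far
        shards[index].append(test)
        heapq.heappush(heap, (load + seconds, index))
    return shards

durations = {"test_a": 60, "test_b": 30, "test_c": 20, "test_d": 10}
shards = shard_tests(durations, shard_count=2)
print(shards)
```

With these durations both shards end up with 60 seconds of work, compared with 120 seconds for a single sequential run.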
QA Foundations
-
Release readiness
Release readiness is an assessment of whether a build is safe to ship, based on signals like test pass rate, flakiness, failure clusters, coverage of critical paths, and open risks.
-
Shift-left testing
Shift-left testing means moving testing earlier in the development process — closer to when code is written — so defects are caught sooner and cost less to fix.
-
Test debt
Test debt is the accumulated cost of neglected test suites — flaky tests, brittle selectors, poor coverage, and slow runs — that makes testing progressively harder and less trustworthy over time.
-
Test pyramid
The test pyramid is a strategy that favors many fast, cheap unit tests at the base, fewer integration tests in the middle, and a small number of slow end-to-end tests at the top.
Put these concepts to work
Qualflare turns your CI test results into AI failure clustering, flaky-test detection, and release-risk scoring. Start free — get your first analysis in minutes.