pytest Flaky Tests: Fixing Flakiness with xdist, Fixtures & Retries (2026)

Q: How do I find which pytest tests are flaky?

Write JUnit XML (pytest --junitxml=results.xml) on every CI run and send it to a test observability tool that builds pass/fail history per test and scores flakiness. A single run can't reveal flakiness — it only becomes visible across many runs.

pytest tests flake from shared fixtures, parallel xdist runs, and test-order dependencies. Isolate state, expose order bugs, and detect flakiness over runs.

İbrahim Süren

Founder · Jun 25, 2026 · 5 min read

pytest Flaky Tests: Fixing Flakiness with xdist, Fixtures & Retries (2026)

pytest tests flake mostly from shared fixture or module state, parallel execution under pytest-xdist exposing race conditions, and hidden test-order dependencies. Fix it by isolating fixture state per test, making tests order-independent (pytest-randomly helps expose bugs), and detecting which tests flake from JUnit-XML history rather than a single run.

Key takeaways

Shared fixture/module state is the top cause — broadly-scoped fixtures leak data between tests.
pytest-xdist runs tests in parallel workers, exposing races and shared-resource collisions.
Hidden test-order dependencies surface under parallelism; pytest-randomly helps expose them.
pytest-rerunfailures can retry flaky tests, but it hides the signal — use it sparingly.
Detect flakiness from JUnit-XML history across runs, not from one execution.

pytest flakiness has a distinct flavor from browser-test flakiness: less about animations and timing, more about state — fixtures that leak, parallel workers that collide, and tests that secretly depend on running in a certain order. This guide covers those causes and how to fix and detect them. For the framework-agnostic view, see the complete guide to flaky tests. Where we reference Qualflare, we describe only what it actually does.

Why pytest tests flake

The recurring causes, often surfacing only in CI versus local:

Shared fixture state — a session- or module-scoped fixture that mutates and leaks data into later tests.
Parallel execution under pytest-xdist — workers contending for the same database, temp files, or ports.
Test-order dependencies — a test that only passes if another ran first.

All three are forms of non-determinism: the result depends on state the test doesn’t fully control.

The fixes

Isolate fixture state

Scope fixtures as narrowly as correctness allows. A function-scoped fixture rebuilds clean state for each test; a session-scoped one is shared, so any mutation bleeds across tests. If you need an expensive shared resource, make it read-only, or give each test its own derived data on top of it.

# Risky: one shared dict mutated across the whole module
@pytest.fixture(scope="module")
def cart(): return {}

# Safe: fresh state per test
@pytest.fixture
def cart(): return {}

Make tests safe for pytest-xdist

Under pytest -n auto, tests run in parallel worker processes. Anything shared — a fixed database row, a temp file path, a hardcoded port — becomes a race. Give each test isolated resources: use pytest’s tmp_path for files, unique identifiers for data, and per-worker databases. xdist exposes PYTEST_XDIST_WORKER if you need worker-specific resources.

Expose order dependencies

A suite that passes in file order but fails when shuffled has an order dependency hiding in it. pytest-randomly randomizes order each run, surfacing those bugs early instead of letting them flake intermittently in CI. Once exposed, fix them by making each test set up everything it needs.

Retries: a stopgap, not a cure

pytest-rerunfailures adds retries (pytest --reruns 2). It keeps CI unblocked, but a test that only passes on rerun is still a flaky test. Use it as a temporary bridge while you fix the underlying state issue, and keep an eye on how many tests rely on it.

Detect flakiness over time

Flakiness scales with suite size — Google reported almost 16% of its tests show some flakiness (Google Testing Blog, 2016). A single run can’t tell you a test is flaky — you need history. Have pytest write JUnit XML on every CI run (pytest --junitxml=results.xml) and send it to an observability layer that scores each test’s flakiness from its pass/fail history, including across xdist runs. See pytest test reporting. Qualflare aggregates pytest results across runs, scores flakiness, and clusters related failures by root cause.

Start free with Qualflare — upload your pytest JUnit XML and get flaky scoring and a reliability trend on your own suite.

Flaky tests in other frameworks: Playwright, Cypress, and Jest & Vitest.

Frequently asked questions

Why are my pytest tests flaky?

The most common causes are shared state through broadly-scoped fixtures (session or module scope leaking data between tests), parallel execution under pytest-xdist exposing race conditions and shared-resource collisions, and hidden dependencies on test order. Isolating fixture state and making tests order-independent fixes most cases.

Why do pytest tests fail with pytest-xdist but pass without it?

pytest-xdist runs tests across multiple worker processes in parallel, so tests that share a database row, temp file, port, or global state collide non-deterministically — problems that stay hidden when tests run one at a time. Give each test isolated resources (unique temp dirs, per-test data) to fix it.

How do I retry flaky tests in pytest?

The pytest-rerunfailures plugin adds retries (for example pytest —reruns 2). It keeps CI unblocked, but a test that only passes on rerun is still flaky and may be hiding a real bug, so use it as a stopgap while you fix root causes — and track how often tests need a rerun.

How do I find which pytest tests are flaky?

Write JUnit XML (pytest —junitxml=results.xml) on every CI run and send it to a test observability tool that builds pass/fail history per test and scores flakiness. A single run can’t reveal flakiness — it only becomes visible across many runs.