Playwright Flaky Tests in CI: Causes & Fixes (2026)

Playwright tests flake in CI from timing on slow runners, parallel workers, and manual waits. Fix them with web-first assertions, isolation, and retries.

İbrahim Süren

Founder · Jun 25, 2026 · 5 min read

Playwright tests flake in CI mostly from timing on slower runners, manual waitForTimeout calls, and state shared across parallel workers. Fix it by using web-first assertions (which auto-retry), removing fixed waits, isolating state per worker, and configuring retries with trace-on-first-retry to diagnose what's left.

Key takeaways

Playwright auto-waits, so most flakiness comes from bypassing it with manual waitForTimeout or non-retrying assertions.
Use web-first assertions (expect(locator).toBeVisible()) — they auto-retry until the condition holds.
Parallel workers expose shared state; isolate per-test data and avoid cross-test dependencies.
Set retries in CI and enable trace: 'on-first-retry' to capture exactly what flaked.
A test that passes on retry is still flaky — track it from history, don't just let retries hide it.

Playwright is built to avoid flakiness — its auto-waiting and web-first assertions handle most timing for you, a big part of why it overtook Cypress in usage and led E2E tools on retention in the State of JS 2024 survey. So when Playwright tests still flake in CI but pass on your laptop, it’s usually because something bypassed those guarantees. This guide covers the common causes and their fixes. For the framework-agnostic picture, see the complete guide to flaky tests. Where we reference Qualflare, we describe only what it actually does.

Why Playwright tests flake in CI but not locally

The root cause is almost always an environment difference CI exposes — the general pass-locally-fail-in-CI pattern. For Playwright specifically, three things dominate:

Slower, busier runners turn “fast enough locally” into a race in CI.
Parallel workers run tests concurrently; tests sharing state, storage, or a user collide.
A clean environment removes the cached auth, data, or timing assumptions your local runs quietly relied on.

All three are forms of non-determinism — the result depends on conditions CI doesn’t reproduce identically.

The common causes — and fixes

Manual waits instead of web-first assertions

The single biggest cause. await page.waitForTimeout(2000) is a guess: on a slow runner 2000ms may not be enough, and on a fast one it is wasted time. Playwright’s web-first assertions instead auto-retry until the condition is true or the timeout expires:

// Flaky: a fixed guess
await page.waitForTimeout(2000);
expect(await page.locator('.result').isVisible()).toBe(true);

// Stable: auto-retries until visible
await expect(page.locator('.result')).toBeVisible();

Prefer expect(locator).toBeVisible(), toHaveText(), toHaveCount() and friends — they all poll. Reserve waitForTimeout for debugging only.
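
To make "auto-retry" concrete, here is a simplified sketch of the idea: re-check a condition until it holds or a deadline passes. This is a conceptual illustration only, not Playwright's implementation; `pollUntil`, `timeoutMs`, and `intervalMs` are hypothetical names chosen for this example.

```typescript
// Conceptual sketch only: NOT Playwright's implementation.
// pollUntil, timeoutMs and intervalMs are hypothetical names for illustration.
async function pollUntil(
  condition: () => boolean | Promise<boolean>,
  timeoutMs = 5000,
  intervalMs = 100,
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    // Succeed as soon as the condition holds, however early that is.
    if (await condition()) return;
    // Fail only once the whole time budget has been used up.
    if (Date.now() >= deadline) {
      throw new Error(`Condition not met within ${timeoutMs}ms`);
    }
    // Otherwise wait briefly and check again.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

Unlike a fixed wait, this returns as soon as the condition holds and fails only after the full timeout. Playwright's own assertions do more than this loop (for example, the locator is re-resolved on each attempt), so prefer them over a hand-rolled helper.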

Waiting on time instead of the event

If an action triggers a network request, wait for the response, not a duration: await page.waitForResponse(...) or assert on the UI state the response produces. This removes the timing guess entirely.
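
Ordering matters with this pattern: the wait should be registered before the action that triggers the request, otherwise a fast response can arrive before anything is listening. The sketch below assumes a hypothetical helper, `clickAndAwaitResponse`, and two small stand-in interfaces (`PageLike`, `ResponseLike`) that mirror only the `waitForResponse`, `click`, `url`, and `ok` methods of Playwright's `Page` and `Response`, so the snippet runs on its own. The selector and URL fragment in the usage note are placeholders.

```typescript
// Stand-ins for the subset of Playwright's Page/Response used here, so the
// sketch is self-contained. In a real suite, pass the actual `page` object.
interface ResponseLike {
  url(): string;
  ok(): boolean;
}
interface PageLike {
  waitForResponse(predicate: (r: ResponseLike) => boolean): Promise<ResponseLike>;
  click(selector: string): Promise<void>;
}

// Hypothetical helper: click an element and wait for the response it triggers.
async function clickAndAwaitResponse(
  page: PageLike,
  selector: string,
  urlPart: string,
): Promise<ResponseLike> {
  // 1. Start listening first (no await yet), so the response cannot be missed.
  const responsePromise = page.waitForResponse(
    (r) => r.url().includes(urlPart) && r.ok(),
  );
  // 2. Trigger the request.
  await page.click(selector);
  // 3. Resolve once a matching, successful response has arrived.
  return responsePromise;
}
```

In a test this might be called as `await clickAndAwaitResponse(page, '#search', '/api/search')`, followed by a web-first assertion on the UI the response produces.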

Shared state across parallel workers

Playwright already distributes test files across worker processes; fullyParallel: true additionally runs the tests within a file in parallel. Either way, any test that depends on a shared account, a fixed database row, or another test having run first will flake non-deterministically. Give each test its own data, use test.beforeEach for clean setup, and avoid ordering assumptions. If tests genuinely must run in order, scope them with test.describe.serial — but treat that as a last resort.
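
One way to give each test its own data is to derive identifiers from the worker and the test, so two workers can never pick the same account. A minimal sketch, assuming a hypothetical helper `uniqueEmail` and a placeholder `example.test` domain; how the account is actually created depends on your application.

```typescript
// Hypothetical helper: builds an email address that cannot collide across
// parallel workers. In Playwright, the worker index is available as
// testInfo.workerIndex inside a test, hook, or fixture.
function uniqueEmail(
  workerIndex: number,
  testTitle: string,
  now: number = Date.now(),
): string {
  // Turn the test title into a safe, readable slug.
  const slug = testTitle
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-')
    .replace(/^-+|-+$/g, '');
  // The worker index separates concurrent workers; the timestamp separates
  // repeated runs and retries of the same test.
  return `e2e-w${workerIndex}-${slug}-${now.toString(36)}@example.test`;
}
```

Inside a `test.beforeEach(async ({ page }, testInfo) => { ... })` hook, calling `uniqueEmail(testInfo.workerIndex, testInfo.title)` yields a fresh address for every test, which can then be used to create that test's user.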

Animations and transitions

Elements that animate in can be “present” but not yet stable. Web-first assertions handle most of this; for the rest, turn animations off in the test environment (for example by emulating reduced motion, if the app honors it) or wait for the post-animation state.
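
As one possible way to reduce animations from the config, Playwright can emulate the `prefers-reduced-motion: reduce` media feature for every browser context. This is a sketch under the assumption that the application's CSS or JavaScript actually respects that media query; it is not a universal off switch.

```typescript
// playwright.config.ts
import { defineConfig } from '@playwright/test';

export default defineConfig({
  use: {
    // Emulate `prefers-reduced-motion: reduce` in every context.
    // Assumption: the app shortens or skips animations under this setting.
    contextOptions: { reducedMotion: 'reduce' },
  },
});
```

If the app ignores the media query, this setting has no effect, and asserting on the final post-animation state remains the fallback.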

Configure retries and capture traces

In CI, set retries and turn on traces so a flake is diagnosable instead of a mystery:

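// playwright.config.ts
import { defineConfig } from '@playwright/test';

export default defineConfig({
  // Retry only in CI, so local failures surface immediately
  retries: process.env.CI ? 2 : 0,
  use: { trace: 'on-first-retry' },
});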

trace: 'on-first-retry' records a full trace only on the first retry of a failed test: exactly the runs you need to debug, with no tracing overhead on tests that pass first time.

Don’t let retries hide the problem

Flakiness is the norm at scale: Google found almost 16% of its tests show some flakiness (Google Testing Blog, 2016). Retries keep CI moving, but a test that only passes on the second attempt is still a flaky test. Playwright flags these within a run, but to see which tests flake over time, and whether your fixes worked, you need history across runs. Send Playwright’s JSON or blob results to an observability layer that scores each test’s reliability: see Playwright test reporting. Qualflare scores flakiness from Playwright’s retry data and tracks a 90-day trend, so the worst offenders are obvious and fixes can be verified.
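
To illustrate what scoring from history can look like, the sketch below computes a simple flake rate: the share of runs in which a test passed only after a retry. This is not Qualflare's algorithm, which the article does not describe; `Outcome`, `RunRecord`, and `flakeRates` are hypothetical names, and mapping Playwright's report output into this shape is left out.

```typescript
// Hypothetical record shape: one entry per test per CI run.
// 'flaky' means the test failed first and then passed on a retry.
type Outcome = 'passed' | 'failed' | 'flaky';

interface RunRecord {
  testId: string;
  outcome: Outcome;
}

// Returns, per test, the fraction of recorded runs that were flaky (0 to 1).
function flakeRates(history: RunRecord[]): Map<string, number> {
  const totals = new Map<string, { runs: number; flaky: number }>();
  for (const { testId, outcome } of history) {
    const entry = totals.get(testId) ?? { runs: 0, flaky: 0 };
    entry.runs += 1;
    if (outcome === 'flaky') entry.flaky += 1;
    totals.set(testId, entry);
  }

  const rates = new Map<string, number>();
  totals.forEach((entry, testId) => {
    rates.set(testId, entry.flaky / entry.runs);
  });
  return rates;
}
```

Sorting tests by this rate over a fixed window surfaces the worst offenders, and a rate that drops to zero after a change is evidence that the fix held.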

Flaky tests in other frameworks: Cypress, pytest, and Jest & Vitest.

Frequently asked questions

Why do Playwright tests pass locally but fail in CI?

CI runners are slower and more heavily loaded, so timing-sensitive steps that always finish in time locally don’t. Combined with parallel workers and a clean environment, this exposes manual waits, race conditions, and shared state. Using web-first assertions that auto-retry and removing fixed waits resolves most cases.

How do I fix flaky Playwright tests?

Replace fixed waits (waitForTimeout) with web-first assertions like expect(locator).toBeVisible() that auto-retry, wait on specific conditions or network responses, isolate state so parallel workers don’t collide, and use Playwright’s retries with trace-on-first-retry to diagnose the rest.

Does Playwright mark flaky tests automatically?

Yes. When a test fails and then passes on a retry within the same run, Playwright reports it as flaky. That per-run signal is useful, but to see which tests flake over time you need to keep history across runs — which is what a test observability tool adds on top of Playwright’s reporters.

Should I use retries to handle Playwright flakiness?

Use limited retries to keep CI unblocked, but treat them as a signal, not a fix. A test that only passes on retry is still flaky and may be hiding a real race condition. Track how often tests need a retry and fix the root cause.