A failing test tells you something broke. It doesn't always tell you what, why, or how to fix it. Diagnosing CI failures is one of the most time-consuming parts of maintaining a test suite — and it's where AI provides some of the clearest productivity gains.
The Traditional Failure Diagnosis Loop
Without AI, diagnosing a failing test looks like:
1. CI alerts that 3 tests failed
2. Download the CI log (often hundreds of lines)
3. Find the relevant error in the log
4. Read the stack trace and work out where in the code it points
5. Pull the branch and reproduce the failure locally
6. Trace through the code to find the cause
7. Fix and push
Steps 2-6 can take 30-60 minutes for a non-obvious failure.
AI-Accelerated Failure Diagnosis
With AI, the same process looks like:
1. CI alerts that 3 tests failed
2. Copy the failure output (error message + stack trace + recent code changes)
3. Paste it into Claude or Copilot Chat with "Why is this test failing?"
4. Receive a diagnosis and likely fix in seconds
5. Verify, apply, push
This works because AI models have seen thousands of similar error patterns and can recognise common failure modes quickly.
What to Include in the AI Prompt
The more context you give, the better the diagnosis:
```text
This Playwright test is failing in CI. Here is the error:

[paste error message and stack trace]

Here is the test code:

[paste test]

Here is the component that the test interacts with:

[paste component]

Recent changes to this component:

[paste diff]

What is causing the failure and how should I fix it?
```

With this context, AI can identify:
- Race conditions where the test asserts before the UI updates
- Selector mismatches after a component refactor
- Environment differences between local and CI (missing env vars, different data state)
- Timing issues with async operations
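To keep failure reports consistent, the prompt above can be assembled programmatically from its four parts. A minimal sketch — the `FailureContext` shape and function name are illustrative, not part of any library:

```typescript
// Assemble a diagnosis prompt so every failure report carries the same
// context: error, test code, component code, and recent diff.

interface FailureContext {
  error: string;      // error message and stack trace from the CI log
  testCode: string;   // the failing test
  component: string;  // the component the test interacts with
  diff: string;       // recent changes to that component
}

function buildDiagnosisPrompt(ctx: FailureContext): string {
  return [
    "This Playwright test is failing in CI. Here is the error:",
    ctx.error,
    "Here is the test code:",
    ctx.testCode,
    "Here is the component that the test interacts with:",
    ctx.component,
    "Recent changes to this component:",
    ctx.diff,
    "What is causing the failure and how should I fix it?",
  ].join("\n\n");
}
```

A script like this can pull the diff from `git` and the error from the CI log artifact, so the whole report is one command away.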
Using Playwright's Trace Viewer with AI
Playwright generates a detailed trace file for each test run — a step-by-step visual replay showing the DOM state, network calls, and screenshots at every action.
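Trace recording is opt-in, configured in `playwright.config.ts`. One common setup records a trace only when a failing test is retried, so passing runs stay fast but CI failures ship with a replayable trace:

```typescript
// playwright.config.ts — record a trace on the first retry of a failing test.
import { defineConfig } from "@playwright/test";

export default defineConfig({
  retries: process.env.CI ? 2 : 0, // retry in CI so failures produce a trace
  use: {
    trace: "on-first-retry", // alternatives: "on", "retain-on-failure"
  },
});
```

With `retain-on-failure`, every failed test keeps its trace at the cost of recording overhead on all runs.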
Download the trace from CI and open it with `npx playwright show-trace trace.zip`. Find the step where the failure occurred and take a screenshot of the trace viewer showing the DOM state.
Include that screenshot in your AI prompt. Seeing the actual DOM state at the point of failure gives AI the visual context to identify why a selector failed or why an assertion didn't match.
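When sending the screenshot via an API rather than a chat UI, it travels as a base64 image block alongside the question. A sketch of the message payload — the shape follows Anthropic's Messages API content blocks, and the function name is illustrative:

```typescript
// Build a user message that pairs a trace-viewer screenshot (base64 PNG)
// with the diagnostic question, for an API that accepts image content blocks.

type ContentBlock =
  | { type: "image"; source: { type: "base64"; media_type: string; data: string } }
  | { type: "text"; text: string };

function buildImagePrompt(
  screenshotBase64: string,
  question: string
): { role: "user"; content: ContentBlock[] } {
  return {
    role: "user",
    content: [
      {
        type: "image",
        source: { type: "base64", media_type: "image/png", data: screenshotBase64 },
      },
      { type: "text", text: question },
    ],
  };
}
```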
Building a Failure Triage Process
For teams running large test suites, establish a triage process:
- Flaky tests (pass sometimes, fail sometimes) — likely timing or race condition issues; ask AI to identify patterns across multiple failure logs
- Consistent failures after a PR — likely a real regression; check the diff and ask AI to identify which change caused it
- Failures only in CI, not locally — likely environment issues; ask AI to compare the CI configuration with local setup
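The triage rules above can be encoded as a first-pass classifier that runs before any human looks at the failure. A sketch, assuming your CI reporting can supply a recent pass/fail history and a flag for whether the failure reproduces locally (both inputs are assumptions about your setup):

```typescript
// First-pass triage of a test failure into the three categories above.

type Triage = "flaky" | "regression" | "environment";

interface FailureRecord {
  recentRuns: boolean[];  // pass/fail history on this branch, true = passed
  failsLocally: boolean;  // does the failure reproduce outside CI?
}

function triage(f: FailureRecord): Triage {
  const passes = f.recentRuns.filter(Boolean).length;
  const mixed = passes > 0 && passes < f.recentRuns.length;
  if (mixed) return "flaky";                 // intermittent → timing/race suspects
  if (!f.failsLocally) return "environment"; // CI-only → config/env differences
  return "regression";                       // consistent everywhere → real break
}
```

The label then decides which context goes to the AI: multiple failure logs for a flaky test, the PR diff for a regression, or the CI configuration for an environment issue.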
AI doesn't replace the human judgment about which failures matter and which are noise. It accelerates the process of getting to that judgment.