
Why AI-Generated Code Is a QA Problem

April 23, 2026

GitHub's data for early 2026 is striking: over 51% of all code committed is either generated or substantially assisted by AI. That number was essentially zero four years ago. We've crossed a threshold where AI is no longer a coding aid — it's a coding collaborator that writes more code than most of the humans on the team.

That's genuinely impressive. It's also a quality problem we haven't fully reckoned with.

The Speed-Quality Trade-off

AI coding tools optimise for getting to working code quickly. They're excellent at generating boilerplate, scaffolding CRUD operations, writing components that match a description, and filling in implementation details from a function signature. The code often works on the first try — for the happy path, on the developer's machine, in Chrome.

What AI doesn't optimise for:

  • Edge cases the developer didn't think to mention in the prompt
  • Error states and failure modes
  • Accessibility
  • Cross-browser compatibility
  • Performance under real load
  • Security edge cases in input handling

These aren't things AI can't handle — they're things AI doesn't handle unless you explicitly prompt for them. And most developers don't, because they're moving fast.

The Testing Gap

Traditional software development has a testing culture that developed over decades. Code reviews, unit tests, integration tests, QA cycles — these exist because experience taught the industry that untested code breaks in production.

AI-assisted development is compressing timelines so aggressively that these safeguards are being skipped. A solo founder using Lovable or Bolt can go from idea to live product in a weekend. There's no review cycle. There's no QA pass. There's often not even a second human who looked at the code.

The result: production code that works in the demo but fails for real users.

What AI-Generated Code Gets Wrong

Across dozens of AI-built applications we've tested, the same failure patterns show up again and again.

Form validation. The AI generates a form with validation logic, but the error states are incomplete. The client-side validation fires, but if someone bypasses it (or if there's a network error on submit), the feedback is either missing or misleading.

Mobile layouts. AI tools generate responsive code, but they test it in a simulated viewport, not a real device. Safari on iOS, in particular, handles CSS and JavaScript differently enough that what looks fine in Chrome DevTools breaks on a real iPhone.

Empty and error states. The AI builds the happy path: user has data, API returns successfully, everything renders. It rarely builds the empty state (what does the dashboard look like with zero items?) or the error state (what happens when the API call fails?).
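One way to make those states impossible to forget is to enumerate them in a single type, so the render logic has to decide what every branch looks like. A minimal sketch, with hypothetical names:

```typescript
// Every state the dashboard can be in, made explicit. Code that
// switches on this type cannot render without choosing what "empty"
// and "error" look like.
type FetchState<T> =
  | { status: "loading" }
  | { status: "error"; message: string }
  | { status: "ready"; items: T[] };

function viewFor<T>(state: FetchState<T>): "spinner" | "error" | "empty" | "list" {
  switch (state.status) {
    case "loading":
      return "spinner";
    case "error":
      return "error";
    case "ready":
      // Zero items is a distinct state, not just a shorter list.
      return state.items.length === 0 ? "empty" : "list";
  }
}
```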

Authentication edge cases. Token expiry, concurrent sessions, back button after logout — these interactions require careful handling that AI doesn't generate without explicit prompting.
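The expiry case alone is easy to get wrong: a token that is technically valid when checked can expire before the request lands, and the back-button-after-logout case is the same guard run at the wrong time. A minimal sketch with an assumed 30-second safety margin (both the names and the margin are illustrative):

```typescript
// Refresh before the token actually expires, so a request never goes
// out with a token that dies mid-flight. 30s is an assumed margin.
const REFRESH_SKEW_MS = 30_000;

function tokenNeedsRefresh(expiresAtMs: number, nowMs: number): boolean {
  return nowMs >= expiresAtMs - REFRESH_SKEW_MS;
}

// The back button after logout is the same bug in disguise: the guard
// must run on every protected view, not once at login time.
function canView(session: { loggedOut: boolean; expiresAtMs: number }, nowMs: number): boolean {
  return !session.loggedOut && !tokenNeedsRefresh(session.expiresAtMs, nowMs);
}
```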

Performance. AI-generated React components often re-render more than necessary. AI-generated API calls often fetch more data than needed. On a fast connection with a small dataset, this is invisible. Under real conditions, it matters.
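The re-render half of this is what React's `useMemo` and `React.memo` exist for. The underlying idea can be sketched outside React as caching a derived value until its input actually changes; `memoizeLast` below is a hypothetical helper for illustration, not a React API:

```typescript
// Recompute a derived value only when the input reference changes:
// the same idea React applies to skipping renders and derived data.
function memoizeLast<A, R>(compute: (a: A) => R): { run: (a: A) => R; calls: () => number } {
  let has = false;
  let lastArg!: A;
  let lastResult!: R;
  let count = 0; // tracks how often the expensive path actually runs
  return {
    run(a: A): R {
      if (!has || a !== lastArg) {
        count += 1;
        lastArg = a;
        lastResult = compute(a);
        has = true;
      }
      return lastResult;
    },
    calls: () => count,
  };
}
```

With a small dataset the wasted recomputation is invisible either way, which is exactly why it survives the demo and surfaces under real load.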

The Developer's Responsibility

This isn't an argument against using AI coding tools. The productivity gains are real. The question is what to do with the time they save.

The answer should be: test more thoroughly, not less.

When AI writes your boilerplate in seconds, you have time to write proper tests. When AI scaffolds your API layer in minutes, you have time to think through error handling. The tools give you leverage — but leverage only matters if you apply it to the right things.

Shift your focus. If AI is handling implementation, your highest-value contribution becomes specification and verification: clearly specifying what the code should do (not just what it should look like), then verifying that it actually does it, including edge cases, failure modes, and non-happy-path scenarios.

Test the output, not the process. It doesn't matter how the code was generated. If it's in production, it needs to meet the same quality bar as hand-written code. End-to-end tests don't care whether a function was written by a developer or generated by Claude — they test behaviour.

Treat AI like a junior developer. A good senior developer doesn't blindly merge a junior's PR without review. The same principle applies to AI-generated code. Review it, test it, question the edge cases.

What This Means for QA

The 51% number is going to keep rising. As AI coding tools get better, more code will be AI-generated. That makes QA more important, not less.

The role of QA is shifting from "find bugs in code" to "verify that AI-generated functionality actually meets requirements and handles real-world conditions." That's a broader mandate — and it requires automation.

Manual testing can't keep up with AI-generated code. You need automated end-to-end tests that can run on every deploy, catch regressions instantly, and verify the full user flow against real requirements.

The teams and individuals who figure this out — who pair fast AI development with solid automated testing — will ship better products faster than teams that just move fast and assume the AI got it right.

The AI got it mostly right. Mostly isn't enough for production.
