Used well, AI shortens feedback loops. Used blindly, it creates review debt. Here is a sane split of responsibilities between you and the model.
## High-leverage phases
### Exploration and naming
Brainstorming module boundaries, API shapes, and variable names is cheap to regenerate. Treat output as disposable until you lock a design.
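A disposable exploration round can be as simple as sketching an interface, writing a throwaway stub against it to feel out the ergonomics, then regenerating or deleting both. The rate-limiter names below are hypothetical, purely for illustration:

```python
from typing import Protocol, runtime_checkable


# Hypothetical interface sketch: cheap to regenerate until the design locks.
@runtime_checkable
class RateLimiter(Protocol):
    def allow(self, key: str) -> bool: ...


class FixedQuota:
    """Throwaway stub used only to test the interface's ergonomics."""

    def __init__(self, quota: int) -> None:
        self.quota = quota
        self.used: dict[str, int] = {}

    def allow(self, key: str) -> bool:
        # Count this call against the key's quota and report whether it fits.
        self.used[key] = self.used.get(key, 0) + 1
        return self.used[key] <= self.quota
```

Nothing here is final; the point is that the stub is cheaper to rewrite than to polish.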
### Boilerplate and scaffolding
CRUD routes, test stubs, config files, and migration skeletons are ideal delegation targets; you still own correctness and security.
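A minimal in-memory CRUD scaffold (class and method names are illustrative) shows the shape of what is safe to delegate; validation, auth, and real persistence remain human-owned:

```python
# Hypothetical scaffold of the kind AI drafts quickly. Review still owns
# input validation, authorization, and error handling policy.
class NoteStore:
    def __init__(self) -> None:
        self._notes: dict[int, str] = {}
        self._next_id = 1

    def create(self, text: str) -> int:
        note_id = self._next_id
        self._next_id += 1
        self._notes[note_id] = text
        return note_id

    def read(self, note_id: int) -> str:
        return self._notes[note_id]  # raises KeyError for unknown ids

    def update(self, note_id: int, text: str) -> None:
        if note_id not in self._notes:
            raise KeyError(note_id)
        self._notes[note_id] = text

    def delete(self, note_id: int) -> None:
        del self._notes[note_id]
```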
### Refactors with mechanical rules
Renaming symbols, extracting functions, and applying a lint rule across files all work well when you specify invariants ("no behavior change").
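One way to make the "no behavior change" invariant checkable rather than asserted: keep the old implementation alongside the extracted version and confirm they agree on sample inputs before deleting the original. Function names here are illustrative:

```python
# Before the refactor: a loop computing an order total.
def order_total_old(items: list[tuple[float, int]]) -> float:
    total = 0.0
    for price, qty in items:
        total += price * qty
    return total


# After the refactor: the per-line computation extracted into a helper.
def _line_total(price: float, qty: int) -> float:
    return price * qty


def order_total_new(items: list[tuple[float, int]]) -> float:
    return sum(_line_total(p, q) for p, q in items)
```

Running both versions over representative inputs (including the empty case) turns the invariant into a test instead of a hope.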
### Documentation that mirrors code
Summaries of public functions, README sections, and changelog entries — always spot-check against the actual implementation.
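One way to make the spot-check mechanical is to state the documented behavior as a doctest, so the docs fail loudly when they drift from the implementation. The function below is a hypothetical example:

```python
import doctest


def slugify(title: str) -> str:
    """Lowercase a title and join its words with hyphens.

    >>> slugify("Hello World")
    'hello-world'
    """
    return "-".join(title.lower().split())


# Run every doctest in this module; a stale example becomes a failure.
failures = doctest.testmod().failed
```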
## Phases that need extra caution
### Security-sensitive code
Auth, cryptography, payment flows, and deserialization: assume the model does not know your threat model. Pair with human review and static analysis.
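Deserialization is a concrete example of the gap: `pickle.loads` on untrusted bytes can execute arbitrary code, while `json.loads` only builds plain data. A reviewer should reject generated code that reaches for pickle on network input. The function name below is illustrative:

```python
import json


def parse_payload(raw: bytes) -> dict:
    # json.loads cannot execute code, unlike pickle.loads on untrusted input.
    data = json.loads(raw)
    if not isinstance(data, dict):  # a shape check, not a full schema
        raise ValueError("expected a JSON object")
    return data
```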
### Novel algorithms
If complexity matters (graph algorithms, numeric stability), verify against references or formal proofs — do not trust "looks right."
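A small numeric-stability example of why "looks right" fails: the naive `log(sum(exp(x)))` overflows for large inputs, while the max-shifted form is algebraically identical and stays finite. The check is against the identity, not against intuition:

```python
import math


def naive_logsumexp(xs: list[float]) -> float:
    # Overflows once any x exceeds roughly 709 (the double-precision limit).
    return math.log(sum(math.exp(x) for x in xs))


def logsumexp(xs: list[float]) -> float:
    # Shift by the max so every exponent is <= 0; result is unchanged.
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))
```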
### Cross-team contracts
OpenAPI schemas, database migrations, and public SDKs — freeze interfaces with reviewers, then let AI fill implementations underneath.
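The freeze-then-fill pattern can be sketched in miniature: the contract below is a hypothetical, reviewer-approved response shape, and any regenerated implementation must keep satisfying it:

```python
# Frozen with reviewers; changing this requires a cross-team conversation.
USER_CONTRACT = {"id": int, "email": str}


def get_user(user_id: int) -> dict:
    # Implementation underneath the contract; AI may rewrite this freely.
    return {"id": user_id, "email": f"user{user_id}@example.com"}


def satisfies(payload: dict, contract: dict) -> bool:
    # Exact field set, and each value matches its contracted type.
    return set(payload) == set(contract) and all(
        isinstance(payload[key], typ) for key, typ in contract.items()
    )
```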
## Session hygiene
- Short sessions for focused tasks; start fresh when the thread drifts.
- Pin versions of dependencies in prompts when debugging ("we use Next 16.1, not 14").
- Keep a scratch file for experiments so production branches stay clean.
## Measuring whether it helped
Track time to merge and defect rate, not vibes. If AI-generated PRs need more rounds of review than human-only ones, tighten prompts or reduce scope per request.
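The comparison can be a few lines over your PR records; the fields and numbers below are illustrative, not real data:

```python
from statistics import mean

# Illustrative PR records: tag each merge as AI-assisted or not, then
# compare review rounds and post-merge defects instead of impressions.
prs = [
    {"ai_assisted": True, "review_rounds": 3, "post_merge_defects": 1},
    {"ai_assisted": True, "review_rounds": 2, "post_merge_defects": 0},
    {"ai_assisted": False, "review_rounds": 1, "post_merge_defects": 0},
    {"ai_assisted": False, "review_rounds": 2, "post_merge_defects": 1},
]


def avg(records: list[dict], field: str, ai_assisted: bool) -> float:
    # Mean of one metric, split by whether the PR was AI-assisted.
    return mean(r[field] for r in records if r["ai_assisted"] == ai_assisted)
```

If the AI-assisted average is consistently worse, that is the signal to tighten prompts or shrink the scope per request.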
## Key takeaways
- Delegate repetitive and exploratory work; own architecture and security.
- Treat AI output as code written by a fast, forgetful teammate — review accordingly.
- Process metrics beat feelings for tuning how much AI belongs in your workflow.