This is Part 4 of “The Centaur’s Toolkit” series, where we explore practical strategies for human-AI collaboration in technical work.


Code review is one of those practices everyone agrees is important and almost nobody enjoys. You’re deep in your own work when a PR notification arrives. You context-switch, try to understand someone else’s mental model, and attempt to provide useful feedback before the next interruption hits.

It’s mentally draining. And honestly, most reviews end up being superficial because thorough review takes time nobody has.

This is where AI assistance gets genuinely interesting. Not as a replacement for human review, but as a first pass that catches the obvious stuff so humans can focus on what actually requires human judgment.

I’ve been experimenting with AI-assisted code review for months now, both as a reviewer and as someone submitting code. Here’s what I’ve learned about making it work.

The Problem with Code Review

Let me be direct about why traditional code review often fails.

Time pressure. Most developers have 15 minutes, maybe 30, to review a PR. That’s not enough time to truly understand complex changes. So reviews become cursory. “Looks good to me” becomes the default.

Context switching cost. The mental overhead of loading someone else’s problem into your head, understanding their approach, and then returning to your own work is significant. Research by Gloria Mark at UC Irvine found that it takes an average of 23 minutes to refocus after an interruption.

Inconsistent standards. What one reviewer catches, another misses. Style preferences vary. Some reviewers focus on architecture; others nitpick formatting. The quality of review depends heavily on who happens to be available.

Reviewer fatigue. After reviewing three or four PRs, attention degrades. That fifth review? It’s probably not thorough.

AI doesn’t solve all of these problems. But it can address the mechanical aspects of review, freeing human reviewers to focus on what machines genuinely can’t evaluate.

What AI Can and Can’t Review

Before building a framework, we need to be clear about the boundaries.

AI Is Good At:

Pattern matching against known issues. Off-by-one errors, null pointer risks, common security vulnerabilities. AI has seen millions of code samples and can recognize patterns that indicate problems.

Consistency checking. Does this code follow the project’s established patterns? Are naming conventions consistent? Is error handling structured the same way throughout?

Documentation gaps. Missing docstrings, unclear function names, complex code that lacks explanation.

Test coverage suggestions. What edge cases aren’t tested? What branches lack coverage?

Dependency concerns. Is this library maintained? Are there known vulnerabilities in this version?

AI Is Bad At:

Evaluating design decisions. Should this be a microservice or a module? AI can list tradeoffs, but it doesn’t know your team, your scale, or your maintenance capacity.

Understanding business context. Is this feature actually what the users need? Does this implementation align with product direction? AI has no idea.

Team dynamics. Is this PR from a junior developer who needs detailed feedback, or a senior who just wants a sanity check? Review tone should adapt.

Organizational knowledge. “We tried that approach in 2023 and it didn’t work because of X” is context AI simply doesn’t have.

Subjective quality judgments. Is this code elegant? Is it maintainable? These are human assessments that depend on human values.

The framework I use respects these boundaries explicitly.

The Three-Pass Review Framework

I structure AI-assisted code review as three distinct passes, each with a different purpose.

Pass 1: AI Mechanical Review

Before any human looks at the code, AI does a first pass focused entirely on mechanical concerns.

What I ask AI to check:

Review this code change for:
1. Potential bugs (null checks, off-by-one, race conditions)
2. Security issues (injection, authentication, data exposure)
3. Performance concerns (N+1 queries, unnecessary allocations)
4. Test coverage gaps
5. Consistency with [project style guide / existing patterns]

For each issue found:
- Severity (critical/warning/suggestion)
- Location (file and line)
- Explanation of the concern
- Suggested fix

Do not comment on:
- Design decisions
- Whether this feature should exist
- Architectural choices

The key is the exclusion list. I explicitly tell AI not to weigh in on areas that require human judgment. This prevents noise and keeps the AI feedback focused.

Example output:

## Critical

**auth_handler.py:47** - SQL query constructed with string
concatenation. Potential SQL injection vulnerability.
Suggested: Use parameterized query with cursor.execute(sql, params)

## Warning

**user_service.py:123** - Database query inside loop.
This will execute N queries for N items.
Suggested: Batch query with IN clause or prefetch related data

## Suggestion

**api_routes.py:89** - Function 'process_data' is 45 lines
with 6 levels of nesting. Consider extracting helper functions
for readability.
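
To make those findings concrete, here is roughly what the first two fixes might look like. These are generic before/after sketches in Python DB-API style, not the PR's actual code; the table and column names are invented.

# Illustrative fixes for the two findings above. The PR's real code isn't
# shown, so these are generic sketches; table and column names are made up.

# Critical (auth_handler.py:47): parameterize instead of concatenating.
def get_user(cursor, username):
    # Before: "SELECT ... WHERE username = '" + username + "'"  -- injectable
    sql = "SELECT id, role FROM users WHERE username = %s"
    cursor.execute(sql, (username,))  # placeholder syntax varies by driver (? for sqlite3)
    return cursor.fetchone()

# Warning (user_service.py:123): one batched query instead of N queries in a loop.
def load_profiles(cursor, user_ids):
    placeholders = ", ".join(["%s"] * len(user_ids))
    sql = f"SELECT user_id, profile FROM profiles WHERE user_id IN ({placeholders})"
    cursor.execute(sql, tuple(user_ids))
    return dict(cursor.fetchall())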

This pass typically takes 30-60 seconds. The developer submitting the PR can run it themselves before requesting human review, addressing mechanical issues proactively.
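
If you want to script that step, here is a minimal sketch of one way to do it: pull the branch diff and send it with a condensed version of the Pass 1 prompt through the Anthropic Python SDK. The model name, diff range, and prompt wording are placeholders to adapt to your own setup.

# Sketch: run the Pass 1 mechanical review against the current branch's diff.
# Assumes the `anthropic` package is installed and ANTHROPIC_API_KEY is set.
import subprocess

from anthropic import Anthropic

REVIEW_PROMPT = """Review this code change for:
1. Potential bugs (null checks, off-by-one, race conditions)
2. Security issues (injection, authentication, data exposure)
3. Performance concerns (N+1 queries, unnecessary allocations)
4. Test coverage gaps
5. Consistency with existing project patterns

For each issue found, give severity (critical/warning/suggestion),
location (file and line), an explanation, and a suggested fix.
Do not comment on design decisions, whether the feature should exist,
or architectural choices.

Diff:
"""

def mechanical_review(base: str = "main") -> str:
    # Diff of the current branch against the base branch.
    diff = subprocess.run(
        ["git", "diff", f"{base}...HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout
    client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment
    response = client.messages.create(
        model="claude-sonnet-4-20250514",  # placeholder; use whichever model you prefer
        max_tokens=2000,
        messages=[{"role": "user", "content": REVIEW_PROMPT + diff}],
    )
    return response.content[0].text

if __name__ == "__main__":
    print(mechanical_review())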

Pass 2: Human Architectural Review

With mechanical issues handled, human reviewers can focus on what actually matters: design, approach, and context.

Human reviewer questions:

  • Does this solve the right problem?
  • Is the approach appropriate for our scale and team?
  • Will future-me understand this in six months?
  • Does this introduce technical debt we’ll regret?
  • Are there organizational considerations the author might not know?

This is where experience and context matter. A senior developer might recognize that the current approach will cause problems when we add multi-tenancy next quarter. That’s not something AI can know.

The time saved on mechanical review lets humans be thorough on these questions. Instead of spending 20 minutes catching typos and style issues, you spend 20 minutes actually thinking about the design.

Pass 3: Collaborative Refinement

The final pass is a dialogue. Author and reviewer discuss the human feedback, potentially using AI to explore alternatives or validate concerns.

Example exchange:

Reviewer: I'm concerned about the approach to caching here.
If cache invalidation fails, users could see stale data
indefinitely.

Author: Good point. What if we add a TTL as a fallback?

[AI consultation]
Author: I asked Claude about cache invalidation patterns for
this use case. It suggested a "write-through with TTL backup"
approach. Here's the tradeoff analysis...

Reviewer: That makes sense. The TTL approach handles the failure
mode without overcomplicating the happy path. Ship it.

AI becomes a research tool in the conversation, not the decision-maker.
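
For readers unfamiliar with the pattern named in that exchange: write-through means the cache is updated at write time, and the TTL bounds how stale an entry can get if invalidation ever fails. A minimal sketch, assuming a local Redis instance and a dict standing in for the real database:

# Sketch of "write-through with TTL backup": writes update the database and
# the cache together; the TTL caps staleness if an invalidation is missed.
import json

import redis

cache = redis.Redis()  # assumes a local Redis instance
CACHE_TTL_SECONDS = 300  # fallback bound on staleness
_db: dict[int, dict] = {}  # stand-in for the real database

def save_to_db(user_id: int, data: dict) -> None:
    _db[user_id] = data

def load_from_db(user_id: int) -> dict:
    return _db[user_id]

def write_user(user_id: int, data: dict) -> None:
    save_to_db(user_id, data)  # the database is the source of truth
    # Write-through: refresh the cache at write time, with a TTL as the backstop.
    cache.setex(f"user:{user_id}", CACHE_TTL_SECONDS, json.dumps(data))

def read_user(user_id: int) -> dict:
    cached = cache.get(f"user:{user_id}")
    if cached is not None:
        return json.loads(cached)
    data = load_from_db(user_id)  # miss: entry expired or was never cached
    cache.setex(f"user:{user_id}", CACHE_TTL_SECONDS, json.dumps(data))
    return data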

Practical Implementation

Here’s how I’ve implemented this in practice.

Pre-Review Checklist

Before requesting human review, I run the AI mechanical check myself. This catches embarrassing issues before anyone else sees them. It takes two minutes and saves reviewer time.

My pre-review prompt:

I'm about to submit this PR for review. Act as a thorough
but practical reviewer. Check for:

1. Obvious bugs or security issues
2. Violations of [our project's] coding standards
3. Missing error handling
4. Untested edge cases
5. Unclear code that needs comments

Be direct. I'd rather fix issues now than have a reviewer
catch them later.

Reviewer Workflow

When I’m reviewing others’ code, I start with a quick AI scan to orient myself:

Summarize this PR in 2-3 sentences:
- What is it trying to accomplish?
- What's the general approach?
- What files/systems are affected?

Then note any obvious concerns I should investigate further.

This gives me a map before I dive into the details. I can then focus my human attention on the areas that seem most consequential.
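
That orientation step scripts easily too. A sketch, assuming the GitHub CLI (gh) is installed and authenticated, and reusing the same client setup as the pre-review script; again, the model name is a placeholder.

# Sketch: summarize a PR before reading it in detail.
import subprocess

from anthropic import Anthropic

SUMMARY_PROMPT = """Summarize this PR in 2-3 sentences:
- What is it trying to accomplish?
- What's the general approach?
- What files/systems are affected?

Then note any obvious concerns I should investigate further.

Diff:
"""

def summarize_pr(pr_number: int) -> str:
    # `gh pr diff` fetches the PR's diff; requires a logged-in GitHub CLI.
    diff = subprocess.run(
        ["gh", "pr", "diff", str(pr_number)],
        capture_output=True, text=True, check=True,
    ).stdout
    client = Anthropic()
    response = client.messages.create(
        model="claude-sonnet-4-20250514",  # placeholder model name
        max_tokens=1000,
        messages=[{"role": "user", "content": SUMMARY_PROMPT + diff}],
    )
    return response.content[0].text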

Team Integration

For team-wide adoption, I suggest:

Start with opt-in. Let developers who are interested try AI-assisted review. Forced adoption creates resentment.

Standardize prompts. Create a shared document of review prompts tailored to your codebase. “Check for our specific patterns” is more useful than generic review.

Don’t mandate AI-only first pass. Some PRs are simple enough that jumping straight to human review makes sense. Trust developer judgment.

Track outcomes. Are AI-flagged issues actually problems? Are we catching more bugs? Measure before declaring success.
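
One lightweight way to measure is to log each AI finding and whether a human confirmed it was real, then look at the ratio over time. The schema below is just one possible sketch, not a prescribed tool.

# Sketch: a minimal log of AI review findings so you can check what fraction
# of flagged issues turned out to be real. Field names are illustrative.
import csv
import os
from dataclasses import dataclass, asdict

@dataclass
class ReviewFinding:
    pr: str          # PR identifier, e.g. "repo#123"
    severity: str    # critical / warning / suggestion
    category: str    # bug, security, performance, style, ...
    confirmed: bool  # did a human agree it was a real issue?

def log_findings(findings: list[ReviewFinding], path: str = "review_findings.csv") -> None:
    write_header = not os.path.exists(path)
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["pr", "severity", "category", "confirmed"])
        if write_header:
            writer.writeheader()
        writer.writerows(asdict(x) for x in findings)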

Common Pitfalls

I’ve made these mistakes so you don’t have to.

Treating AI findings as absolute. AI flags something as a “potential issue.” That doesn’t mean it’s an actual issue. Context matters. A reviewer who blindly enforces every AI suggestion becomes annoying and often wrong. This connects to the broader question of when to trust AI output that I covered earlier in this series.

Over-automating. Fully automated review gates that block PRs based on AI analysis sound efficient. In practice, they create friction and false positives. AI should inform, not block.

Ignoring AI findings entirely. The opposite mistake. If AI flags a security concern, investigate it. Don’t dismiss it because “I know what I’m doing.” The best developers I know take AI feedback seriously while evaluating it critically.

Skipping human review. “AI said it’s fine” is not sufficient review for any non-trivial change. AI cannot evaluate whether this code should exist or whether it solves the right problem.

Real Results

In six months of using this framework, here’s what I’ve observed:

Time savings. My average review time dropped from 25 minutes to 15 minutes. The AI first pass catches the obvious stuff; I focus on what matters.

Consistency. My reviews are more thorough because I’m not fatigued by style nitpicks. I have mental energy for actual design review.

Earlier bug detection. Several potential production issues were caught in review that I might have missed in a quick scan. AI is better than a tired human at recognizing known bug patterns.

Better PR quality. Developers running pre-review AI checks submit cleaner code. Issues get fixed before review, not during.

The framework isn’t perfect. Some AI suggestions are noise. Some real issues require human context that AI can’t provide. But overall, the combination of AI mechanical review and human judgment review is better than either alone.

The Centaur Approach to Quality

Code review is a microcosm of the broader Centaur philosophy. Neither human nor AI review is sufficient alone.

AI brings:

  • Tireless consistency
  • Pattern recognition across millions of examples
  • Speed on mechanical checks
  • No ego or fatigue

Humans bring:

  • Understanding of context and goals
  • Evaluation of design decisions
  • Organizational knowledge
  • Judgment about what matters

The Centaur reviewer uses both. AI handles the mechanical; humans handle the meaningful. The result is better than either could achieve alone.


What’s your experience with AI-assisted code review? Are you using it already? What does the AI catch that surprises you, and what does it consistently miss? I’d like to hear what’s working (or not) in your workflow. Find me on X or LinkedIn.


What’s Next

Code review is one application of AI-assisted quality practices. But the tools you use matter as much as the practices.

In the next article, we’ll explore how to build your personal AI toolkit. Which tools actually provide value? How do you evaluate new ones? And how do you avoid tool sprawl while staying current?

Next in the series: Building Your Personal AI Toolkit: Tools That Actually Matter →


The three-pass review framework is one of several quality practices I cover in my book on human-AI collaboration. If you want the complete system for building AI-assisted workflows, check out The Centaur’s Edge: A Practical Guide to Thriving in the Age of AI.