This is Part 4 of “The Centaur’s Toolkit” series, the finale. We’ve covered collaboration fundamentals, security applications, and calibrating trust. Now we tackle the question that ties it all together: is this actually working?
You’ve been using AI as your coding partner for three months. You feel faster. More productive. Like you’re getting more done in less time.
But feelings lie.
That sense of productivity might be real. Or it might be the satisfaction of constant activity masking the fact that you’re shipping buggier code, accumulating technical debt, or slowly losing skills you used to have.
Without measurement, you can’t know. And if you can’t know, you can’t improve.
This post is about measuring what matters in human-AI collaboration. Not vanity metrics that make you feel good, but honest assessment of whether your Centaur partnership is actually paying off.
The Measurement Problem
Measuring developer productivity is notoriously difficult. Adding AI to the mix makes it even harder.
The obvious metrics are useless or misleading:
Lines of code tells you nothing. AI can generate hundreds of lines that do what ten lines should. More code often means more bugs, more maintenance, and more complexity.
Time saved requires knowing how long something “would have taken,” which is unknowable. We systematically overestimate this, remembering our slowest manual experiences and comparing them to our fastest AI-assisted ones.
Suggestions accepted measures AI usage, not AI value. Accepting more suggestions isn’t better if those suggestions are mediocre.
Gut feeling is biased toward feeling productive. The dopamine hit of rapid code generation feels like accomplishment, even when the code doesn’t work.
We need better metrics. And more importantly, we need a framework for thinking about what “better” even means.
The Three Dimensions of Effectiveness
Effective AI collaboration improves three things simultaneously:
- Speed - Are you shipping faster?
- Quality - Are you shipping better?
- Learning - Are you growing as a developer?
Most developers focus only on speed. This is a mistake. Speed without quality creates technical debt. Speed without learning creates dependency.
Let’s examine each dimension.
Speed Metrics That Matter
Speed isn’t just about typing faster. It’s about reducing the time from “I need to build X” to “X is working in production.”
Time to Working Code
Measure the elapsed time from starting a task to having working, tested code. Not “code that compiles,” but code that actually does what it’s supposed to do.
Track this for similar-sized tasks over time. Are you getting faster? Is the improvement sustained or was it just initial novelty?
Be honest about what counts as “working.” If you’re declaring victory earlier but spending more time on bugs later, you haven’t actually improved.
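One low-friction way to track this is a plain append-only log you touch twice per task: once when you start, once when the code is genuinely working. Here's a minimal sketch in Python; the file name, the size labels, and the exact definition of "working" are assumptions you'd adapt to your own setup, not a prescribed tool:

```python
import csv
from datetime import datetime
from pathlib import Path

LOG = Path("task_log.csv")  # hypothetical location; put it wherever suits you

def log_task(task: str, size: str, started: datetime, working: datetime) -> None:
    """Append one finished task. 'working' means tested and doing its job,
    not merely compiling."""
    new_file = not LOG.exists()
    with LOG.open("a", newline="") as f:
        writer = csv.writer(f)
        if new_file:
            writer.writerow(["task", "size", "hours_to_working"])
        hours = (working - started).total_seconds() / 3600
        writer.writerow([task, size, round(hours, 2)])

# Example entry; compare median hours_to_working for same-size tasks month over month.
log_task("add pagination to /orders endpoint",
         "medium",
         datetime(2024, 5, 6, 9, 30),
         datetime(2024, 5, 6, 14, 45))
```

The point isn't precision. It's having something more durable than memory when you ask, three months from now, whether the speedup was real.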
Iteration Cycles
How many back-and-forth cycles does it take to get something right?
With effective AI collaboration, you might need fewer iterations because:
- AI catches errors you would have made
- AI suggests edge cases you would have missed
- AI helps you think through the design before coding
With ineffective AI collaboration, you might need more iterations because:
- AI-generated code has subtle bugs
- You accepted suggestions without understanding them
- The AI’s approach doesn’t fit your architecture
Count your iterations. Fewer is better, but only if the end result is good.
Unblocking Time
How long do you stay stuck before making progress?
This is where AI often provides the most value. Instead of spending 30 minutes searching Stack Overflow or reading documentation, you get an answer in seconds.
Track how often you get stuck and for how long. If AI is working well, your “stuck time” should decrease. If you’re getting wrong answers from AI and chasing them down rabbit holes, it might actually increase.
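If you want more than a gut read on this, a stuck-time log can be tiny. A sketch of one, assuming categories I made up (adjust them to your workflow):

```python
from dataclasses import dataclass

@dataclass
class StuckEpisode:
    topic: str        # what you were stuck on
    minutes: int      # how long before you made real progress
    resolved_by: str  # e.g. "ai", "docs", "colleague", "rabbit_hole"

episodes = [
    StuckEpisode("flaky integration test", 12, "ai"),
    StuckEpisode("OAuth redirect loop", 55, "rabbit_hole"),  # AI answer led nowhere
]

# Weekly: total stuck minutes, and how much of it was spent chasing dead ends.
total = sum(e.minutes for e in episodes)
wasted = sum(e.minutes for e in episodes if e.resolved_by == "rabbit_hole")
print(f"Stuck {total} min this week; {wasted} min chasing dead ends.")
```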
Quality Metrics That Matter
Speed means nothing if you’re shipping garbage faster. Quality metrics keep you honest.
Bug Rate
Are you introducing more or fewer bugs with AI assistance?
This is hard to measure precisely, but you can approximate:
- Track bugs found in code review
- Track bugs found in testing
- Track bugs that make it to production
Compare AI-assisted work to your historical baseline. If your bug rate is increasing, something is wrong with your verification process.
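One rough way to do that comparison is to normalize bug counts by shipped work. A sketch, where the baseline number and the stage weights are illustrative assumptions, not canon:

```python
def bug_rate(bugs_review: int, bugs_testing: int, bugs_prod: int,
             tasks_shipped: int) -> float:
    """Bugs caught at any stage, per task shipped. Production bugs hurt most,
    so weight them more heavily (the weights here are arbitrary)."""
    weighted = bugs_review + 2 * bugs_testing + 5 * bugs_prod
    return weighted / max(tasks_shipped, 1)

baseline = 1.8  # your historical, pre-AI rate (illustrative)
this_month = bug_rate(bugs_review=6, bugs_testing=3, bugs_prod=1, tasks_shipped=9)

if this_month > baseline:
    print(f"Bug rate {this_month:.1f} vs baseline {baseline}: tighten verification.")
```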
Code Review Feedback
What does your team say about your AI-assisted code?
Pay attention to patterns in review comments:
- “This is more complex than it needs to be” (AI over-engineering)
- “This doesn’t match our conventions” (AI doesn’t know your codebase)
- “What does this section do?” (you don’t understand your own code)
- “Nice, this is clean” (AI is helping)
If reviewers are consistently pointing out issues in AI-assisted code, adjust your workflow.
Test Coverage
Is your AI-assisted code well-tested?
AI often generates code without corresponding tests. Or it generates tests that don't actually test meaningful behavior; a concrete contrast is sketched after this list. Track:
- Do you write tests for AI-generated code?
- Are those tests meaningful or just coverage theater?
- Does the AI help you think of edge cases to test?
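To make "coverage theater" concrete, here's a hedged example built around a hypothetical `apply_discount` function. The first test executes the code and pads the coverage number without checking much of anything; the others pin down real behavior and an edge case:

```python
def apply_discount(price: float, percent: float) -> float:
    """Hypothetical function under test."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(price * (1 - percent / 100), 2)

# Coverage theater: runs the code, asserts almost nothing.
def test_apply_discount_runs():
    assert apply_discount(100.0, 10.0) is not None

# Meaningful: pins the expected values, including the boundaries.
def test_apply_discount_values():
    assert apply_discount(100.0, 10.0) == 90.0
    assert apply_discount(100.0, 0.0) == 100.0
    assert apply_discount(19.99, 100.0) == 0.0

# Meaningful: verifies the failure mode, not just the happy path.
def test_apply_discount_rejects_bad_percent():
    try:
        apply_discount(100.0, 150.0)
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for percent > 100")
```

Both versions bump your coverage percentage. Only one of them would catch the AI quietly changing the rounding or dropping the range check.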
Technical Debt
Are you accumulating cruft faster than before?
AI can accelerate technical debt by:
- Generating verbose code when concise code would work
- Using patterns that don’t fit your architecture
- Creating duplication instead of reusing existing abstractions
Pay attention to whether your codebase is getting cleaner or messier over time. If every AI-assisted feature leaves behind cleanup work, factor that into your effectiveness assessment.
Learning Metrics That Matter
This is the dimension most developers ignore, and it’s arguably the most important for long-term career health.
Concepts Understood
When AI helps you with unfamiliar territory, do you actually learn?
The test is simple: could you explain this code to someone else? Could you write something similar without AI help?
If you’re using AI in Learner mode (as we discussed in Part 1), you should be accumulating knowledge. Track:
- New technologies you’ve become comfortable with
- Patterns you’ve internalized
- Concepts you now understand deeply
If you’ve been using AI for months and still can’t do basic tasks without it, that’s a warning sign.
Decreasing Repeat Questions
You shouldn’t need to ask AI the same question twice.
If you find yourself asking “how do I do X” for the same X repeatedly, you’re not learning. You’re outsourcing.
Notice your patterns. Are there topics where you’ve graduated from needing AI help? Or are you perpetually dependent?
Confidence in Unfamiliar Domains
Over time, you should become more confident tackling new areas.
Not because AI will always be there to help, but because you’ve built meta-skills:
- Knowing how to explore a new codebase
- Knowing what questions to ask
- Knowing how to verify unfamiliar information
If you feel less confident without AI than you did before you started using it, something has gone wrong.
The Centaur Scorecard
Here’s a simple framework for weekly self-assessment. Take five minutes every Friday to score yourself on each dimension.
Speed Score (1-5)
Did I ship faster this week?
- 5: Significantly faster than my pre-AI baseline
- 4: Noticeably faster
- 3: About the same
- 2: Slower due to AI-related issues (bad suggestions, rabbit holes)
- 1: Much slower; AI is currently a net negative
Quality Score (1-5)
Did I ship better this week?
- 5: Fewer bugs, cleaner code than my baseline
- 4: Good quality, maybe slightly better than usual
- 3: Same quality as always
- 2: More issues than usual; some AI-related
- 1: Significant quality problems; AI is hurting my output
Learning Score (1-5)
Did I grow this week?
- 5: Learned significant new concepts I’ll retain
- 4: Picked up some useful knowledge
- 3: No particular growth, but no regression
- 2: Felt like I’m forgetting things; over-relying on AI
- 1: Actively losing skills; completely dependent
Sustainability Score (1-5)
Am I building skills or dependencies?
- 5: Becoming a stronger developer who uses AI as a tool
- 4: Mostly building skills, with healthy AI integration
- 3: Neutral; maintaining current skill level
- 2: Starting to feel dependent on AI for basic tasks
- 1: Couldn’t function without AI; losing fundamental skills
Interpreting Your Scores
16-20 total: Excellent. Your Centaur partnership is working well. Keep doing what you’re doing.
12-15 total: Good, but room for improvement. Look at your lowest dimension and focus there.
8-11 total: Caution. You’re getting some value but may be developing bad habits. Reassess your workflow.
Below 8: Warning. AI may be hurting more than helping right now. Consider reducing usage and rebuilding fundamentals.
The key is tracking over time. A single week’s score matters less than the trend. Are you improving? Stable? Declining?
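If it helps to keep the Friday ritual honest, the whole scorecard fits in a few lines of code. A minimal sketch of the tracking; the interpretation bands mirror the thresholds above, and everything else (field names, storage) is an assumption you'd adapt:

```python
from dataclasses import dataclass

@dataclass
class WeeklyScore:
    week: str      # e.g. "2024-W23"
    speed: int     # each dimension scored 1-5
    quality: int
    learning: int
    sustainability: int

    @property
    def total(self) -> int:
        return self.speed + self.quality + self.learning + self.sustainability

    @property
    def verdict(self) -> str:
        if self.total >= 16:
            return "Excellent: keep doing what you're doing"
        if self.total >= 12:
            return "Good: focus on your lowest dimension"
        if self.total >= 8:
            return "Caution: reassess your workflow"
        return "Warning: reduce AI usage, rebuild fundamentals"

history = [
    WeeklyScore("2024-W21", speed=4, quality=3, learning=2, sustainability=3),
    WeeklyScore("2024-W22", speed=4, quality=4, learning=3, sustainability=3),
]

# The trend matters more than any single week.
for s in history:
    print(s.week, s.total, s.verdict)
```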
Honest Self-Assessment Techniques
Scoring yourself honestly is hard. Here are techniques to keep yourself accountable.
The “Explain It” Test
After finishing AI-assisted work, explain it out loud. To a rubber duck, a colleague, or an empty room.
Can you articulate:
- What the code does?
- Why it’s designed this way?
- What alternatives exist?
- Where the edge cases are?
If you can’t explain it, you don’t understand it. And if you don’t understand it, you’ll struggle to maintain it.
Before/After Journaling
Keep brief notes on AI-assisted tasks:
- What did I ask AI to help with?
- What did I accept vs. modify vs. reject?
- What did I learn?
- What surprised me?
Review these notes monthly. Patterns will emerge that you wouldn’t notice in the moment.
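If you prefer notes you can grep later, here's a hedged sketch of the same journal kept as one JSON line per task. The file name and field names are just one convenient option, not a requirement:

```python
import json
from datetime import date
from pathlib import Path

JOURNAL = Path("ai_journal.jsonl")  # hypothetical location

def log_entry(asked: str, accepted: str, modified: str, rejected: str,
              learned: str, surprised: str) -> None:
    """Append one AI-assisted task as a single JSON line, for monthly review."""
    entry = {
        "date": date.today().isoformat(),
        "asked": asked,
        "accepted": accepted,
        "modified": modified,
        "rejected": rejected,
        "learned": learned,
        "surprised": surprised,
    }
    with JOURNAL.open("a") as f:
        f.write(json.dumps(entry) + "\n")

log_entry(
    asked="draft a retry wrapper for the payments client",
    accepted="overall structure",
    modified="backoff values",
    rejected="its custom exception hierarchy",
    learned="jitter matters more than I thought",
    surprised="it missed the idempotency key entirely",
)
```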
The Unplug Experiment
Periodically, do a task without AI assistance.
Not as punishment, but as calibration. How does it feel? Are you slower but more thoughtful? Faster than you expected? Completely lost?
This reveals your true skill level versus your AI-augmented skill level. Both matter, but you should know the difference.
Peer Calibration
If you have colleagues also using AI, compare experiences:
- What’s working for them?
- What problems are they encountering?
- How do their quality metrics compare?
This isn’t about competition. It’s about expanding your sample size beyond just yourself.
Course Correction
Your assessment might reveal problems. Here’s how to address them.
Signs You’re Over-Relying on AI
- Can’t write basic code without AI assistance
- Frequently accepting code you don’t understand
- Bug rate increasing despite feeling productive
- Learning score consistently low
- Anxiety about AI tools being unavailable
Correction: Deliberately practice without AI. Start with small tasks and rebuild confidence in your fundamental skills. Use AI as a learning tool rather than a doing tool.
Signs You’re Under-Utilizing AI
- Spending long periods stuck on solvable problems
- Reinventing solutions that AI could provide instantly
- Not using AI for code review or second opinions
- Speed score consistently at baseline despite AI availability
Correction: Identify specific friction points in your workflow. Experiment with AI assistance for those specific areas. Start with Strategist mode rather than having AI write code.
Finding Your Balance
The right amount of AI assistance varies by:
- Your experience level
- The type of work
- The stakes involved
- Your learning goals
Someone learning a new language should use more Learner mode and less blind acceptance. Someone doing routine work in a familiar domain can lean more heavily on AI. Someone working on critical systems should verify more thoroughly.
There’s no universal right answer. But there’s a right answer for you, right now, on this task. The Centaur Scorecard helps you find it.
The Ongoing Practice
Becoming an effective Centaur isn’t a destination. It’s an ongoing practice.
The AI tools will evolve. Your skills will develop. The right balance will shift. What works today might not work in six months.
The developers who thrive will be those who:
- Continuously assess their effectiveness
- Adjust their collaboration patterns based on evidence
- Maintain core skills while leveraging AI amplification
- Treat AI as a powerful tool, not a replacement for thinking
This is the Centaur advantage: not just using AI, but using it well. Knowing when to lean in and when to pull back. Measuring what matters. Improving deliberately.
Series Conclusion
Over these four posts, we’ve covered the complete Centaur framework:
- The Four Collaboration Modes: Strategist, Editor, Debugger, and Learner. Different ways to work with AI depending on your goal.
- Security Applications: Applying human-in-the-loop principles to high-stakes work where AI accelerates but humans decide.
- Trust Calibration: Knowing when to verify, when to accept, and how to build intuition for AI reliability.
- Measurement and Assessment: The Centaur Scorecard and honest self-evaluation to ensure your AI collaboration is actually working.
These aren’t just techniques. They’re a mindset shift. From passive AI consumption to active AI collaboration. From feeling productive to being productive. From using AI to partnering with it.
The age of AI is here. The question isn’t whether to use these tools. It’s whether you’ll use them well.
You have the framework. Now the practice begins.
This series is based on the core principles from my book, The Centaur’s Edge: A Practical Guide to Thriving in the Age of AI. The book goes deeper into each concept, with additional exercises, case studies, and the complete Mental Gymnasium for building your AI collaboration skills. If this series resonated with you, the book is your next step.