This is Part 5 of “The Centaur’s Toolkit” series, where we explore practical strategies for human-AI collaboration in technical work.


A new AI coding tool launches every week. Actually, that’s an understatement. Multiple tools launch every week, each promising to revolutionize your workflow. Each one has a Twitter thread explaining why it’s better than what you’re using now.

If you tried to evaluate every new tool, you’d never write any code.

But ignoring the landscape entirely means missing genuine improvements. Somewhere between “chasing every shiny object” and “sticking with what I know” lies a sustainable approach to building your AI toolkit.

I’ve spent the last year evaluating dozens of AI development tools. Most didn’t stick. A few became essential. Here’s the framework I use to separate signal from noise.

The Tool Sprawl Trap

Before we talk about building a toolkit, let’s acknowledge the problem.

Tool sprawl is real. I know developers running Claude, ChatGPT, Copilot, Cursor, and three different specialized tools simultaneously. They switch between them constantly, never developing deep fluency with any single tool. They spend more time deciding which tool to use than actually using tools.

This isn’t productivity. It’s procrastination dressed as productivity.

The goal of a toolkit isn’t to have the most tools. It’s to have the right tools, configured well, with deep enough familiarity that using them is automatic.

A mechanic doesn’t carry every tool ever made. They carry the tools they actually need, maintained well, organized for quick access. The same principle applies to your AI toolkit.

The Evaluation Framework

When a new tool appears, I evaluate it across five dimensions before investing time in adoption.

1. Workflow Fit

Does this tool fit how I actually work? Not how I might work in theory, but how I work today?

Questions to ask:

  • Does it integrate with my existing editor/IDE?
  • Does it work with my languages and frameworks?
  • Can I use it without changing my core workflow?
  • Does it complement my existing tools or compete with them?

Red flag: Any tool that requires restructuring your entire workflow before you can use it. The friction of adoption should be proportional to the benefit gained.

I tried a tool last year that promised amazing code generation. But it required using their proprietary IDE, separate from my VS Code setup. The context switching cost killed any productivity gains. The tool might be excellent, but it doesn’t fit how I work.

2. Incremental Value

What does this tool provide that my current tools don’t?

This is harder to evaluate than it sounds. Many tools offer similar capabilities with minor variations. “5% better code completion” isn’t worth the switching cost. “Entirely new capability I can’t replicate otherwise” might be.

Framework for comparison:

Current capability: Claude handles my code review discussions
New capability:     New tool offers visual diff integration
Delta:              Slightly faster context, but same core capability
Switching cost:     New keybindings, different prompt patterns, 2 weeks to build fluency

In this example, the delta is small and the switching cost is high, so I’d skip it. If the delta is significant and the switching cost is low, try it.
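The comparison above can be sketched as a tiny decision helper. This is a back-of-the-envelope illustration, not a real scoring system: the axis names, 0–10 scores, and thresholds are all made up for the example.

```python
from dataclasses import dataclass

@dataclass
class ToolEvaluation:
    """One candidate tool, scored 0-10 on each axis (scores are illustrative)."""
    name: str
    delta: int           # new capability beyond current tools
    switching_cost: int  # keybindings, prompt patterns, weeks to fluency
    reliability: int     # consistency of output (trust dimension)
    viability: int       # business model, community, roadmap

def worth_adopting(tool: ToolEvaluation, margin: int = 3) -> bool:
    """Adopt only when the capability delta clearly outweighs the
    switching cost AND the tool clears minimum trust/longevity bars."""
    if tool.reliability < 7 or tool.viability < 5:
        return False  # fails the trust or viability check outright
    return tool.delta - tool.switching_cost >= margin

# The visual-diff example from the text: small delta, high switching cost.
diff_tool = ToolEvaluation("visual-diff", delta=3, switching_cost=6,
                           reliability=8, viability=6)
print(worth_adopting(diff_tool))  # prints False: skip it
```

The point of writing it down, even informally, is that it forces the delta and the switching cost into the same conversation instead of letting a shiny feature dominate the decision.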

3. Trust and Reliability

AI tools that work 90% of the time aren’t tools. They’re distractions. Every time a tool fails, you lose time debugging its output plus the context switch of going manual.

Trust indicators:

  • Consistent output quality (not amazing sometimes, terrible other times)
  • Honest about limitations (tools that never say “I don’t know” are lying)
  • Graceful degradation (works reasonably when optimal conditions aren’t met)
  • Active development and bug fixes

Red flag: Any tool where I regularly have to double-check its output against a more reliable source. At that point, why not just use the reliable source?

4. Long-term Viability

The AI tool landscape has a high mortality rate. Tools that seemed promising six months ago are abandoned or acqui-hired. Building deep familiarity with a tool that might not exist next year is a risk.

Viability signals:

  • Sustainable business model (not just burning VC money)
  • Active user community (not just Twitter hype)
  • Responsive development team (bugs get fixed, features ship)
  • Clear roadmap (direction, not just promises)

I’ve been burned twice by tools that were excellent but discontinued. Now I factor viability into adoption decisions.

5. Privacy and Security

What happens to your code when you use this tool? This matters more than many developers acknowledge.

Questions to ask:

  • Is code sent to external servers?
  • How is that code stored and used?
  • Is code used to train models?
  • What’s the data retention policy?
  • Is there an enterprise/privacy-focused tier?

For personal projects, these concerns might be minimal. For client work or anything proprietary, they’re critical.

My Current Toolkit

After a year of evaluation and refinement, here’s what I actually use daily.

Primary: Claude (via Cursor)

Claude is my primary AI assistant. I use it through Cursor, which integrates it into my editor workflow.

Why Claude:

  • Best at understanding complex codebases
  • Excellent at explaining its reasoning
  • Honest about uncertainty (crucial for trust)
  • Strong at longer-form technical writing

What I use it for:

  • Architecture discussions (Strategist mode)
  • Code generation with refinement (Editor mode)
  • Debugging conversations (Debugger mode)
  • Understanding unfamiliar code (Learner mode)

Secondary: GitHub Copilot

Copilot handles inline completions while I type. It's a different use case from conversational AI.

Why Copilot:

  • Lowest friction completions (just tab)
  • Trained specifically on code
  • Excellent for boilerplate and patterns
  • Integrates seamlessly with VS Code

What I use it for:

  • Completing routine code patterns
  • Generating test cases from function signatures
  • Writing repetitive CRUD operations
  • Suggesting variable and function names

Specialized: ChatGPT with specific GPTs

For some tasks, I use ChatGPT with custom GPTs configured for specific purposes.

My active GPTs:

  • API documentation researcher (configured with specific sources)
  • SQL query optimizer (focused on PostgreSQL patterns)
  • Technical writing editor (configured with my style preferences)

Why specialized GPTs:

  • Pre-configured context reduces prompt overhead
  • Consistent output for repeated task types
  • Can encode project-specific knowledge

What I Don’t Use

Notably absent from my toolkit:

Multiple general-purpose chatbots. I don’t maintain both Claude and ChatGPT for general use. They’re similar enough that maintaining fluency with both isn’t worth it. I picked one and went deep.

Tool-specific AI features. Many tools are adding AI features as checkboxes. GitHub has AI. Notion has AI. Slack has AI. I don’t use most of these. They’re often shallow integrations that don’t provide genuine value.

Bleeding-edge experiments. I evaluate new tools quarterly, not continuously. Between evaluations, I focus on mastering what I have rather than chasing what’s new.

Building Your Own Toolkit

Here’s my recommended approach for building a personal AI toolkit.

Start with One Tool

Pick a general-purpose AI assistant and commit to learning it well. Claude, ChatGPT, Gemini: any of the majors will work. The specific choice matters less than depth of usage.

Spend a month using only that tool. Learn its strengths and weaknesses. Develop prompts that work reliably. Build intuition for what it handles well and where it struggles.

Don’t add a second tool until you’ve hit the limits of the first.

Add Tools to Fill Gaps

After a month, you’ll have a clear sense of gaps. Maybe your primary tool is great at conversation but friction-heavy for quick completions. Maybe it struggles with a specific language or domain.

Add tools specifically to address identified gaps. Not because they seem interesting, but because you have a concrete problem they solve.

Example gaps and solutions:

  • “I want instant completions while typing” → Copilot
  • “I need to query large codebases” → Cursor/Codebase-aware tools
  • “I do repetitive tasks with specific context” → Custom GPTs
  • “I need to generate images for documentation” → Specialized image tools

Evaluate Quarterly

Set a quarterly reminder to evaluate your toolkit. Questions to ask:

  • Am I actually using all these tools?
  • Have any tools degraded in quality?
  • Is there a new tool addressing a real gap?
  • Can I consolidate any overlapping tools?

This prevents both stagnation (never evaluating new options) and sprawl (continuously adding tools).

Prune Ruthlessly

If you haven’t used a tool in the last month, uninstall it. Keep your toolkit tight.

The tools you keep should be tools you use reflexively, without deciding whether to use them. If you’re making decisions about tool usage, you either have too many tools or haven’t developed sufficient fluency.

Anti-Patterns to Avoid

The collector. Installing every new tool “just to try it” without actually integrating anything into your workflow. You end up with shallow knowledge of many tools and deep knowledge of none.

The loyalist. Refusing to evaluate new tools because your current setup works. "Works" isn't the standard. "Works better than the alternatives" is the standard. Evaluate periodically.

The optimizer. Spending more time configuring and customizing tools than using them for actual work. Configuration is procrastination in disguise.

The early adopter. Jumping on every new release, dealing with bugs and instability for the privilege of being first. Let tools stabilize before investing learning time. This connects to knowing when to trust AI output; tools that are too new haven’t proven their reliability.

The Centaur Toolkit Philosophy

Your AI toolkit should amplify your capabilities, not replace your judgment about what tools to use.

The Centaur approach to tooling:

Depth over breadth. Know your primary tools deeply. Quick familiarity with many tools is less valuable than automatic fluency with few.

Fit over features. A tool that fits your workflow beats a tool with more features that requires workflow changes.

Sustainability over novelty. Tools you’ll use for years beat tools that are exciting for weeks.

Judgment over automation. You decide what tools to use, when to use them, and when to work without them. The tools don’t decide.

Your toolkit will evolve. New tools will emerge that genuinely improve on current options. But evolution should be deliberate, not reactive. Evaluate from a position of competence with your current tools, not dissatisfaction with them.


What does your AI toolkit look like? Are you deep on one tool or spread across many? I’d like to hear what’s actually working in your day-to-day development. Find me on X or LinkedIn.


What’s Next

With a solid toolkit in place, how do you apply it to specific challenges? One of the most impactful applications is documentation.

Writing and maintaining documentation is notoriously difficult. AI changes the economics entirely. But it also introduces new pitfalls. Can you trust AI-generated documentation? How do you keep it current? What’s the human role?

Next in the series: The Documentation Problem: How AI Changes Technical Writing →


The toolkit framework in this post is part of a larger system I’ve developed for working effectively with AI. For the complete methodology, including evaluation templates and integration strategies, check out The Centaur’s Edge: A Practical Guide to Thriving in the Age of AI.