Tools, not prompts: how I build AI agents that don't fall over

Most “AI agent” demos work beautifully on stage and collapse in production. The reason is almost always the same: the team poured its effort into one enormous prompt and hoped the model would behave. It won’t, not consistently and not at scale.

The fix isn’t a better prompt. It’s better tools.

What “tool-based” actually means

When I build an agent on Anthropic’s Claude, every capability is a small, typed tool with its own input schema, such as create_campaign, fetch_analytics, or book_appointment. The model’s job shrinks to choosing which tool to call and with what arguments. Your code does the work, and your validation runs before anything happens.
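
To make that concrete, here is a minimal sketch of one tool. The definition uses the name / description / input_schema shape that the Anthropic Messages API accepts for tools. Everything else is an assumption for illustration: the fields on book_appointment, the stub handler, and the dispatch helper are hypothetical, and the API call itself is left out, so the tool_use block is written as a plain dict.

```python
from typing import Any, Callable

# Tool definition: a name, a description, and a JSON Schema for the inputs.
# The specific fields below are hypothetical examples.
BOOK_APPOINTMENT_TOOL = {
    "name": "book_appointment",
    "description": "Book an appointment for a customer at a given time.",
    "input_schema": {
        "type": "object",
        "properties": {
            "customer_id": {"type": "string"},
            "start_time": {"type": "string", "description": "ISO 8601 datetime"},
            "duration_minutes": {"type": "integer"},
        },
        "required": ["customer_id", "start_time"],
    },
}


def book_appointment(
    customer_id: str, start_time: str, duration_minutes: int = 30
) -> dict[str, Any]:
    """Hypothetical handler; a real one would write to a calendar system."""
    return {
        "status": "booked",
        "customer_id": customer_id,
        "start_time": start_time,
        "duration_minutes": duration_minutes,
    }


# Registry mapping tool names to the code that actually does the work.
HANDLERS: dict[str, Callable[..., dict[str, Any]]] = {
    "book_appointment": book_appointment,
}


def dispatch(tool_use: dict[str, Any]) -> dict[str, Any]:
    """Route the model's choice (tool name + input) to our own code."""
    handler = HANDLERS.get(tool_use["name"])
    if handler is None:
        return {"status": "error", "reason": f"unknown tool: {tool_use['name']}"}
    return handler(**tool_use["input"])
```

The model only ever produces the name and the input. Whether anything gets booked is decided by the handler, which is ordinary code you can test without a model in the loop.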

This sounds like a minor architectural choice. It’s the whole game.

Why it works

Three things fall out of the tool-based design for free:

Validation lives in your code. Each tool checks its own inputs before it acts. The model can’t talk its way past a guard that runs outside the prompt, whether or not it knows the guard exists.
Irreversible actions get a gate. Anything destructive or costly gets a confirmation step before the tool runs. The agent proposes; a human (or a rule) disposes.
Everything is logged. Tool calls are structured events, so you get an audit trail and the raw material for evals without extra plumbing.
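
The three properties above can be sketched as one small wrapper around every tool call. This is an illustration under stated assumptions, not a prescribed implementation: the Tool dataclass, the call_tool helper, the confirm callback, and the 10,000 USD budget cap are all hypothetical names and values.

```python
import json
import logging
from dataclasses import dataclass
from typing import Any, Callable

logger = logging.getLogger("agent.tools")


@dataclass
class Tool:
    name: str
    validate: Callable[[dict[str, Any]], list[str]]  # returns a list of problems
    run: Callable[[dict[str, Any]], Any]
    needs_confirmation: bool = False  # True for destructive or costly actions


def validate_campaign(args: dict[str, Any]) -> list[str]:
    """Hypothetical input checks for create_campaign."""
    problems = []
    name = args.get("name")
    budget = args.get("budget_usd")
    if not isinstance(name, str) or not name.strip():
        problems.append("name must be a non-empty string")
    if (
        isinstance(budget, bool)
        or not isinstance(budget, (int, float))
        or not 0 < budget <= 10_000
    ):
        problems.append("budget_usd must be a number in (0, 10000]")
    return problems


def run_campaign(args: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical stub; a real handler would call the ads backend."""
    return {"campaign_id": "cmp_demo", "name": args["name"]}


CREATE_CAMPAIGN = Tool(
    name="create_campaign",
    validate=validate_campaign,
    run=run_campaign,
    needs_confirmation=True,  # spends money, so it gets a gate
)


def call_tool(
    tool: Tool,
    args: dict[str, Any],
    confirm: Callable[[str, dict[str, Any]], bool],
) -> dict[str, Any]:
    """Validate, gate, run, and log a single tool call."""
    problems = tool.validate(args)
    if problems:
        outcome = {"status": "rejected", "problems": problems}
    elif tool.needs_confirmation and not confirm(tool.name, args):
        outcome = {"status": "declined"}
    else:
        outcome = {"status": "ok", "result": tool.run(args)}
    # One structured event per call: the audit trail and the eval dataset.
    logger.info(
        json.dumps(
            {"tool": tool.name, "args": args, "status": outcome["status"]},
            default=str,
        )
    )
    return outcome
```

The confirm callback is where the human (or the rule) sits. In a UI it might show a dialog; in a batch job it might be a function that approves anything under a spending threshold.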

MCP makes it scale

The Model Context Protocol (MCP) takes this one step further: it standardizes how tools are exposed to a model. Instead of bespoke glue for every integration, you expose your systems through one consistent interface. Adding the tenth tool is as clean as the first, and access stays auditable.
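
To give a feel for what that consistent interface looks like, here is a deliberately simplified sketch. MCP is built on JSON-RPC 2.0, and a server answers tools/list (what tools exist, with their schemas) and tools/call (run one by name). The code below handles only those two methods with the standard library. It skips the initialization handshake, the transports, and everything else in the protocol, so in practice you would reach for an official MCP SDK instead. The register helper and the fetch_analytics stub are hypothetical.

```python
from typing import Any, Callable

# Hypothetical in-memory registry: tool name -> metadata and handler.
TOOLS: dict[str, dict[str, Any]] = {}


def register(
    name: str,
    description: str,
    input_schema: dict[str, Any],
    handler: Callable[..., str],
) -> None:
    TOOLS[name] = {
        "description": description,
        "input_schema": input_schema,
        "handler": handler,
    }


def _error(request_id: Any, code: int, message: str) -> dict[str, Any]:
    return {"jsonrpc": "2.0", "id": request_id, "error": {"code": code, "message": message}}


def handle_request(request: dict[str, Any]) -> dict[str, Any]:
    """Answer a JSON-RPC request for tools/list or tools/call (simplified)."""
    request_id = request.get("id")
    params = request.get("params") or {}
    method = request.get("method")

    if method == "tools/list":
        result: dict[str, Any] = {
            "tools": [
                {
                    "name": name,
                    "description": tool["description"],
                    "inputSchema": tool["input_schema"],
                }
                for name, tool in TOOLS.items()
            ]
        }
    elif method == "tools/call":
        tool = TOOLS.get(params.get("name"))
        if tool is None:
            return _error(request_id, -32602, f"Unknown tool: {params.get('name')}")
        text = tool["handler"](**params.get("arguments", {}))
        result = {"content": [{"type": "text", "text": text}], "isError": False}
    else:
        return _error(request_id, -32601, "Method not found")

    return {"jsonrpc": "2.0", "id": request_id, "result": result}


# Adding a tool is one registration; the request handling never changes.
register(
    name="fetch_analytics",
    description="Fetch analytics for a campaign (hypothetical stub).",
    input_schema={
        "type": "object",
        "properties": {"campaign_id": {"type": "string"}},
        "required": ["campaign_id"],
    },
    handler=lambda campaign_id: f"analytics for {campaign_id} (stub)",
)
```

The point of the sketch is the shape: the tenth tool is another register call, and every request passes through one function you can log and audit.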

This is how Scalify’s in-app agent serves 23,000+ users: not a clever prompt, but a set of well-bounded tools the model can compose.

The takeaway

If you’re building an AI feature and you find yourself editing a 600-line prompt to fix behavior, stop. The behavior you want is probably a tool you haven’t defined yet. Design the tools, validate their inputs, gate the dangerous ones, and let the model do the one thing it’s genuinely good at: choosing.