Tool Calling: How AI Agents Act on the World

How AI agents use tools — function calling, tool schemas, and the call-and-observe loop. A plain-English guide to how a language model turns words into real actions like searches, API calls, and code.

We've built an agent that can plan and remember. But so far it can only talk about doing things. The chapter on what an agent is promised it could act — search, book, refund, deploy. Tool calling is the mechanism that makes that real, and it's simpler than it sounds.

A model can only produce text

Start from the limit. An LLM's only output is text. It cannot click a button, hit an API, or read a database. So how does an agent search the web?

The trick: the model outputs structured text describing an action, and your surrounding code interprets that text and performs the action. The model says "call web_search with query 'Tokyo flights'" — and your code actually runs the search and hands back the results.

Model emits a tool call

Your code validates it

Your code runs the real action

Result returned to model

The model never executes anything — it requests; your code runs

This is usually called function calling or tool use, and it's the same idea wherever you meet it. The model proposes; your code disposes.

What a tool call looks like

Concretely, when an agent decides to act, it emits something like this instead of a normal reply:

{
  "tool": "get_weather",
  "arguments": { "city": "Tokyo", "unit": "celsius" }
}

Your code reads that, calls the real weather API, and returns the result as the next observation:

{ "temp": 22, "condition": "clear" }

That observation goes back into the context, the loop continues, and the model decides what to do next. The model is the brain; tools are the hands.

Tool schemas: teaching the model what exists

For the model to call a tool correctly, it has to know the tool exists and how to use it. That's done with a schema — a description given to the model alongside the prompt:

Field	Purpose	Example
Name	The identifier the model emits	`issue_refund`
Description	When and why to use it	"Refund a completed order."
Parameters	Typed arguments it accepts	`order_id: string`

The description matters enormously — it's how the model decides whether this tool fits the situation. A vague description ("does refund stuff") gets the tool called at the wrong times; a precise one ("Refund a shipped order; do not use for cancellations") guides the model to the right choice. Tool schemas are prompt engineering by another name.

Never trust a tool call

Here's the part beginners skip and regret. The model's tool calls are predictions, and predictions can be wrong. A model will cheerfully:

invent an argument that doesn't exist;
call delete_user when you meant it to call get_user;
pass a string where a number belongs.

This is the single most important safety boundary in an agent. The model's freedom to choose actions is exactly why agents are useful and exactly why an unvalidated tool layer is dangerous.

Fewer, sharper tools

It's tempting to give an agent dozens of tools. Resist it. The more tools you expose, the harder the model's choice becomes, and the more often it picks the wrong one or fumbles the arguments.

3 tools

95%

8 tools

88%

20 tools

72%

50 tools

55%

Tool-selection accuracy tends to fall as the toolset grows

(Illustrative numbers — the exact figures depend on the model and tools, but the direction is real.) The fixes are to keep tools few, distinct, and clearly named, and — when you genuinely need many — to retrieve only the relevant subset for each task, the same way you retrieve memories.

Tools are the boundary of the agent

Step back and notice what this means. An agent's planning and memory are only useful if it can act, and it can only act through its tools. The set of tools you give an agent defines the entire universe of what it can do. Want it to manage cloud infrastructure? Give it infra tools. Want it kept safe? Withhold the dangerous ones. Capability and safety are both decided here.

Recap

A model can only output text; a tool turns a chosen piece of that text into a real action.
The model emits a structured call; your code validates and executes it, then returns the result.
Tool schemas (name, description, typed parameters) teach the model what's available — descriptions especially guide the choice.
Validate every call — models hallucinate arguments and pick wrong tools; your code is the safety boundary.
Keep tools few and distinct; selection accuracy falls as the toolset grows.

A single agent with tools is powerful. But some problems are better solved by several agents working together. Continue to Multi-Agent Systems.