How AI Agents Plan: Turning a Goal Into Steps

How AI agents break a goal into actions — ReAct, plan-and-execute, reflection, and chain-of-thought. A plain-English guide to the reasoning strategies that decide what an agent does next.

In the last chapter we said the hard part of the agent loop is the "think" step. A user gives a goal — "organize a launch for our new feature" — and somehow the agent has to convert that into concrete actions like create_calendar_event and send_email. That conversion is planning, and how an agent does it shapes everything about its behavior.

Why planning is hard

The gap is between two very different things:

A goal is abstract, often underspecified, and stated in human terms.
An action is concrete, exact, and machine-shaped — a specific tool with specific arguments.

Bridging that gap is what a model is doing whenever it "decides what to do next." There's no single right way to do it, so several planning strategies have emerged, each with a different trade-off between adapting to surprises and staying predictable.
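To make the contrast concrete, here is a minimal sketch of what "machine-shaped" can mean. The ToolCall class and the argument names (title, to, subject) are assumptions made for illustration; only the tool names create_calendar_event and send_email come from the example above.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ToolCall:
    """A concrete action: one named tool plus exact arguments."""
    tool: str
    arguments: dict[str, Any] = field(default_factory=dict)


# The goal is free text; nothing in it says which tools to call.
goal = "organize a launch for our new feature"

# One possible translation into actions. The argument names and values
# are hypothetical placeholders, not a real API.
actions = [
    ToolCall("create_calendar_event", {"title": "Feature launch"}),
    ToolCall("send_email", {"to": "team@example.com", "subject": "Launch plan"}),
]

for call in actions:
    print(call.tool, call.arguments)
```

Everything the planner does amounts to producing a list like actions from a string like goal.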

ReAct: reason and act, interleaved

The most common pattern is ReAct — short for Reason + Act. Instead of planning everything upfront, the agent alternates between thinking and doing, one step at a time:

Thought -> Action -> Observation -> Thought -> Action -> ...

Figure: ReAct. Each action is chosen using the result of the last one.

A single ReAct step looks like this in the model's output:

Thought: I need the user's most recent order before I can refund it.
Action: get_orders(user_id="u_8812", limit=1)
Observation: { "order_id": "o_5520", "total": 49.0, "status": "shipped" }
Thought: Found it. The order is eligible. I'll issue the refund.
Action: issue_refund(order_id="o_5520")

The power of ReAct is adaptivity: because each action is chosen after seeing the previous result, the agent can react to whatever actually happened — an empty search, an error, an unexpected value. The cost is that it can wander, take needless steps, or loop.
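The loop behind that trace can be sketched in a few lines. This is an illustration under stated assumptions, not a reference implementation: scripted_policy is a hard-coded stand-in for the model, and the two tool functions are hypothetical stubs that return canned data.

```python
from typing import Callable


# Hypothetical tools standing in for real APIs.
def get_orders(user_id: str, limit: int) -> dict:
    return {"order_id": "o_5520", "total": 49.0, "status": "shipped"}


def issue_refund(order_id: str) -> dict:
    return {"refunded": order_id}


TOOLS: dict[str, Callable[..., dict]] = {
    "get_orders": get_orders,
    "issue_refund": issue_refund,
}


def scripted_policy(history: list[dict]) -> dict:
    """Stand-in for the model: picks the next step from what was observed so far."""
    if not history:
        return {"thought": "I need the user's most recent order.",
                "tool": "get_orders",
                "args": {"user_id": "u_8812", "limit": 1}}
    last = history[-1]
    if last["tool"] == "get_orders":
        # The next action is built from the previous observation.
        return {"thought": "Found it. I'll issue the refund.",
                "tool": "issue_refund",
                "args": {"order_id": last["observation"]["order_id"]}}
    return {"thought": "Refund issued. Done.", "tool": None, "args": {}}


def react_loop(policy: Callable[[list[dict]], dict], max_steps: int = 5) -> list[dict]:
    history: list[dict] = []
    for _ in range(max_steps):  # the step cap guards against wandering or looping
        step = policy(history)
        if step["tool"] is None:  # the policy signals that it is finished
            break
        observation = TOOLS[step["tool"]](**step["args"])
        history.append({**step, "observation": observation})
    return history


trace = react_loop(scripted_policy)
for step in trace:
    print(step["thought"], "->", step["tool"], "->", step["observation"])
```

The max_steps cap is one simple answer to the looping risk: the agent stops after a fixed budget even if the policy never declares itself done.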

Plan-and-execute: think first, then do

The opposite approach is plan-and-execute. The agent writes the entire plan up front, then runs the steps in order:

Plan:
1. Look up the user's most recent order
2. Check refund eligibility
3. Issue the refund
4. Email a confirmation

Aspect                    ReAct                            Plan-and-execute
When the plan is formed   Step by step                     All at once, up front
Adapts to surprises       Strongly                         Weakly
Predictability            Lower                            Higher
Token cost                Higher (re-reasons each step)    Lower
Best for                  Messy, uncertain tasks           Clear, stable tasks

Plan-and-execute is cheaper and easier to audit — you can read the plan before anything runs. But if step 2 returns something unexpected, a rigid plan has no way to cope. In practice, many agents use a hybrid: draft a plan, but allow re-planning when reality diverges from it.
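A rough sketch of that hybrid, using the four-step refund plan: the steps run in order, and a failure triggers a single re-plan. All the step functions, the eligibility rule, and the replan helper are hypothetical; in a real agent the re-planner would most likely be another model call.

```python
from typing import Callable

Step = Callable[[dict], dict]  # takes the shared state, returns updates to it


def look_up_order(state: dict) -> dict:
    return {"order": {"order_id": "o_5520", "status": "shipped"}}


def check_eligibility(state: dict) -> dict:
    # Assumed rule for illustration: anything not already refunded is eligible.
    return {"eligible": state["order"]["status"] != "refunded"}


def issue_refund(state: dict) -> dict:
    if not state["eligible"]:
        raise RuntimeError("order is not eligible for a refund")
    return {"refunded": True}


def email_confirmation(state: dict) -> dict:
    return {"emailed": True}


def escalate_to_human(state: dict) -> dict:
    return {"escalated": True}


def replan(failed_step: Step, state: dict) -> list[Step]:
    """Hypothetical re-planner: always falls back to escalation."""
    return [escalate_to_human]


def run_plan(plan: list[Step], max_replans: int = 1) -> dict:
    state: dict = {}
    queue = list(plan)
    replans = 0
    while queue:
        step = queue.pop(0)
        try:
            state.update(step(state))
        except RuntimeError:
            if replans >= max_replans:
                raise  # out of re-plan budget: surface the failure
            replans += 1
            queue = replan(step, state)  # discard the rest of the old plan
    return state


plan = [look_up_order, check_eligibility, issue_refund, email_confirmation]
result = run_plan(plan)
print(result)
```

Because plan is an ordinary list, it can be printed or reviewed before run_plan is called, which is the auditability benefit described above.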

Reflection: checking your own work

Both approaches become more reliable with one addition: reflection. After taking an action, the agent evaluates the result against the goal and decides whether to accept it, retry, or change course.

Reflection is why an agent can recover from a failed API call or a bad search instead of confidently building on a broken result. It's also where a lot of the cost lives — every critique is another model call — so it's usually applied at key checkpoints, not after every tiny step.
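As a minimal sketch, reflection can be modeled as a wrapper that runs an action, critiques the result, and retries within a fixed budget. The names with_reflection, search, and critique are made up for this example, and the critique here is a plain function where a real agent would typically make another model call.

```python
from typing import Callable, Optional


def with_reflection(
    act: Callable[[int], dict],
    critique: Callable[[dict], Optional[str]],
    max_attempts: int = 3,
) -> dict:
    """Run act, check the result, and retry while the critique finds a problem."""
    problems: list[str] = []
    for attempt in range(1, max_attempts + 1):
        result = act(attempt)
        problem = critique(result)  # None means the result is accepted
        if problem is None:
            return {"result": result, "attempts": attempt, "problems": problems}
        problems.append(problem)
    raise RuntimeError(f"gave up after {max_attempts} attempts: {problems}")


# Hypothetical action: a search that comes back empty on the first try.
def search(attempt: int) -> dict:
    return {"hits": []} if attempt == 1 else {"hits": ["refund-policy.md"]}


def critique(result: dict) -> Optional[str]:
    return None if result["hits"] else "search returned no results"


outcome = with_reflection(search, critique)
print(outcome)
```

Each pass through the loop costs one extra critique, so wrapping only the important steps keeps that overhead at the checkpoints.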

Decomposition: big goals into sub-goals

For genuinely large tasks, agents lean on decomposition: breaking one hard goal into several smaller ones, each simple enough to solve reliably.

Launch the feature -> Write copy + Update docs + Notify users -> Combine results

Figure: Decomposition turns one hard task into a tree of easy ones.

This is the same instinct as chain-of-thought reasoning, scaled up to whole tasks. Each sub-goal may itself be handled by a fresh agent loop. The risk is that errors compound: a small mistake in an early sub-goal can poison everything built on top of it, which is exactly why reflection and good memory matter so much.
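The launch example can be sketched as a small goal tree. The Goal class and solve function are illustrative assumptions; the leaf branch simply returns a string where a real system might start a fresh agent loop.

```python
from dataclasses import dataclass, field


@dataclass
class Goal:
    name: str
    subgoals: list["Goal"] = field(default_factory=list)


def solve(goal: Goal) -> str:
    """Solve leaves directly; solve a parent by combining its children's results."""
    if not goal.subgoals:
        return f"done: {goal.name}"  # stand-in for a fresh agent loop on a small task
    results = [solve(sub) for sub in goal.subgoals]
    return f"{goal.name} [" + "; ".join(results) + "]"


launch = Goal("Launch the feature", [
    Goal("Write copy"),
    Goal("Update docs"),
    Goal("Notify users"),
])

summary = solve(launch)
print(summary)
```

The combining step only ever sees its children's outputs, which is how a mistake in one sub-goal ends up baked into the final result.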

There's no universal best plan

It's tempting to ask "which planning strategy should I use?" The honest answer is that it depends on the task's uncertainty:

Low uncertainty (steps are knowable up front): plan-and-execute is cheaper and safer.
High uncertainty (you'll learn as you go): ReAct with reflection adapts better.
Large scope: decompose, then apply one of the above to each piece.
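Those three rules of thumb are simple enough to write down directly. The function below is only a restatement of the list in code; the name choose_strategy and the "low"/"high" labels are assumptions, and real tasks rarely sort this cleanly.

```python
def choose_strategy(uncertainty: str, large_scope: bool = False) -> list[str]:
    """Map the rules of thumb above onto an ordered list of strategies."""
    if uncertainty not in ("low", "high"):
        raise ValueError("uncertainty must be 'low' or 'high'")
    base = "plan-and-execute" if uncertainty == "low" else "ReAct with reflection"
    # Large tasks are decomposed first, then each piece gets the base strategy.
    return ["decompose", base] if large_scope else [base]


print(choose_strategy("low"))
print(choose_strategy("high", large_scope=True))
```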

Many production agents combine these strategies: decompose the goal, plan each part, and run each part with ReAct-style steps and reflection at the checkpoints.

Recap

Planning bridges the gap between a fuzzy goal and concrete, executable actions.
ReAct interleaves reasoning and acting one step at a time — highly adaptive, but can wander.
Plan-and-execute writes the full plan up front — predictable and cheap, but brittle to surprises.
Reflection (self-critique and retry) is one of the biggest levers on reliability.
Decomposition splits big goals into solvable sub-goals, at the risk of compounding errors.

Planning decides what to do. But to plan across many steps, an agent has to remember what it already did and learned. That's the job of memory. Continue to AI Agent Memory.