Chain of Thought Prompting: Making LLMs Reason Step by Step

What is chain-of-thought prompting? A clear guide to getting LLMs to reason step by step — why it improves accuracy on hard problems, zero-shot vs few-shot CoT, when to use it, and its real costs and limits.

In few-shot prompting we shaped what the model produces by example. Now we shape how it thinks. Chain-of-thought (CoT) prompting — asking the model to reason step by step before answering — is one of the most reliable ways to boost accuracy on hard problems, and it's astonishingly simple to use.

The problem: one-leap answers

Recall from the inference chapter that an LLM generates one token at a time, and each token it produces becomes part of the input for the next. Now consider asking a hard question:

"A shop had 23 apples. It sold 17, then a delivery of 30 arrived, and 12 were rotten and thrown out. How many are left?"

If you demand the answer immediately, the model has to compress the entire multi-step calculation into the first few tokens it emits. That's like being forced to blurt out the answer to a word problem before working anything out. The model often gets it wrong, not for lack of ability, but for lack of room to compute.

The fix: think out loud

Chain-of-thought prompting asks the model to work through the steps before committing to an answer:

"...How many are left? Let's work through it step by step."

Now the model generates: "Start with 23. Sold 17, leaving 6. A delivery of 30 arrives: 6 + 30 = 36. Remove 12 rotten: 36 − 12 = 24. Answer: 24." Each step is conditioned on the previous one, so the final answer is built on a scaffold rather than guessed.
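To make the "scaffold" idea concrete, here is a minimal sketch of the same apple problem as a chain of small steps, where each step reads the running total left by the one before it. The names `steps` and `run_chain` are made up for this illustration. This shows the arithmetic only, not how a model works internally.

```python
# Each step takes the running total from the previous step and returns a new one,
# mirroring how each reasoning step is conditioned on the steps before it.
steps = [
    ("Start with 23", lambda total: 23),
    ("Sold 17", lambda total: total - 17),
    ("Delivery of 30 arrives", lambda total: total + 30),
    ("Remove 12 rotten", lambda total: total - 12),
]


def run_chain(steps):
    """Apply the steps in order and record the running total after each one."""
    total = 0
    trace = []
    for label, apply_step in steps:
        total = apply_step(total)
        trace.append(f"{label}: {total}")
    return total, trace


final_total, trace = run_chain(steps)
for line in trace:
    print(line)
print(f"Answer: {final_total}")
```

The intermediate totals (23, 6, 36, 24) are the values the model writes out in its reasoning. Skipping straight to the last line is the "one-leap" version.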

[Figure: Chain-of-thought builds the answer on intermediate steps. Flow: pose problem, generate reasoning steps, each step conditions the next, sound final answer.]

Zero-shot CoT: the magic phrase

The simplest version needs no examples at all. Just add a line like:

"Let's think step by step."

This is zero-shot chain-of-thought, and it's remarkable how much it helps. That single instruction flips the model from "answer now" to "reason first," and accuracy on math, logic, and multi-step questions often improves noticeably. We previewed it back in the zero-shot chapter. Here's why it works: it changes which tokens get generated first, so the reasoning comes before the answer instead of being skipped.
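In code, zero-shot CoT is just string assembly. The sketch below uses a hypothetical helper, `make_zero_shot_cot_prompt`, which is not part of any library. It only builds the prompt text. How you send it to a model depends on your client, which is deliberately left out.

```python
COT_TRIGGER = "Let's think step by step."


def make_zero_shot_cot_prompt(question: str, trigger: str = COT_TRIGGER) -> str:
    """Return the question followed by a cue that asks for reasoning first."""
    return f"{question.strip()}\n\n{trigger}"


question = (
    "A shop had 23 apples. It sold 17, then a delivery of 30 arrived, "
    "and 12 were rotten and thrown out. How many are left?"
)
prompt = make_zero_shot_cot_prompt(question)
print(prompt)
```

Keeping the trigger as a parameter makes it easy to try variants such as "Let's work through it step by step." and compare results on your own tasks.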

[Figure: Illustrative accuracy lift on multi-step reasoning tasks, comparing direct answer, + "step by step", and few-shot CoT.]

Few-shot CoT: show the reasoning

For harder or domain-specific problems, combine CoT with few-shot examples — but make your examples show the reasoning, not just the answer:

Q: Roger has 5 balls. He buys 2 cans of 3 balls each. How many now?
A: He starts with 5. Two cans of 3 is 6. 5 + 6 = 11. Answer: 11.

Q: [your real problem]
A:

By demonstrating the style of reasoning, you teach the model how to think about your kind of problem, not just that it should think. This is few-shot CoT, and it's the heavier-duty version.
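A few-shot CoT prompt can be assembled the same way. This is a minimal sketch under stated assumptions: `WorkedExample` and `make_few_shot_cot_prompt` are illustrative names, and the Q/A layout simply copies the format shown above rather than any required standard.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class WorkedExample:
    question: str
    reasoning: str  # the worked steps, written the way you want the model to reason
    answer: str


def make_few_shot_cot_prompt(examples: List[WorkedExample], question: str) -> str:
    """Lay out worked examples, then the real question with an open 'A:' to complete."""
    blocks = [
        f"Q: {ex.question}\nA: {ex.reasoning} Answer: {ex.answer}."
        for ex in examples
    ]
    blocks.append(f"Q: {question}\nA:")
    return "\n\n".join(blocks)


examples = [
    WorkedExample(
        question="Roger has 5 balls. He buys 2 cans of 3 balls each. How many now?",
        reasoning="He starts with 5. Two cans of 3 is 6. 5 + 6 = 11.",
        answer="11",
    )
]
prompt = make_few_shot_cot_prompt(
    examples,
    "A shop had 23 apples. It sold 17, then 30 arrived, and 12 were thrown out. "
    "How many are left?",
)
print(prompt)
```

Because the prompt ends on an open "A:", the model's most natural continuation is a worked answer in the same style as the examples.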

When to use chain of thought

CoT pays off when the answer depends on several dependent steps:

Great fit	Poor fit
Math and arithmetic	Simple factual lookups
Logic puzzles	One-word classifications
Multi-step planning	Trivial rewrites
Analysis with several factors	Tasks where speed matters more than depth
Debugging reasoning	Cases where you only want the answer, fast

For a simple lookup like "What's the capital of Japan?", CoT just adds cost. Match the technique to the task.

The costs and limits — be honest

CoT is powerful, not free:

It costs tokens and time. All that reasoning is generated output — more tokens, slower responses, higher bills.
Plausible reasoning can still be wrong. The model can produce confident, well-structured steps that contain an error and still land on a wrong answer. The reasoning looks like a proof but isn't guaranteed to be one.
It can rationalise. Sometimes the model picks an answer and then writes reasoning to justify it, rather than reasoning its way there.
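Because a well-formatted chain can still contain a slip, it can be worth checking the parts that are mechanically checkable. The sketch below rechecks simple integer claims of the form "a + b = c" found in a reasoning string. `find_arithmetic_errors` is a hypothetical helper written for this example. It catches arithmetic slips only, not flawed logic or rationalisation.

```python
import re

# Matches simple integer claims such as "6 + 30 = 36" or "36 − 12 = 24".
STEP_PATTERN = re.compile(r"(\d+)\s*([+\-−*×])\s*(\d+)\s*=\s*(\d+)")

OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "−": lambda a, b: a - b,  # Unicode minus sign, as used in the example text
    "*": lambda a, b: a * b,
    "×": lambda a, b: a * b,
}


def find_arithmetic_errors(reasoning: str) -> list:
    """Return a description of every 'a op b = c' claim that does not hold."""
    errors = []
    for left, op, right, claimed in STEP_PATTERN.findall(reasoning):
        actual = OPS[op](int(left), int(right))
        if actual != int(claimed):
            errors.append(f"{left} {op} {right} = {claimed} (should be {actual})")
    return errors


good = "A delivery of 30 arrives: 6 + 30 = 36. Remove 12 rotten: 36 − 12 = 24. Answer: 24."
bad = "A delivery of 30 arrives: 6 + 30 = 36. Remove 12 rotten: 36 - 12 = 26. Answer: 26."

print(find_arithmetic_errors(good))
print(find_arithmetic_errors(bad))
```

A check like this is cheap insurance for numeric tasks. For anything it cannot parse, the reasoning still needs a human or a second pass to verify.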

A practical note on modern models

Newer "reasoning" models do chain-of-thought internally by default — they're trained to think before answering, so you may not need to ask explicitly. But the principle still matters: for any model, giving room to reason produces better answers than demanding an instant one. When in doubt, invite the steps.

If you want the reasoning but not the clutter, ask the model to reason first and then give the final answer on its own in a clean, clearly marked format, so the reasoning is easy to strip away. That leads neatly into the next chapter.
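One way to keep the clutter away from downstream code is to pull out only the marked final line. The sketch below assumes the reply ends with a numeric "Answer: ..." marker like the apple example earlier. That marker is a convention you would have to request in your prompt, and `extract_final_answer` is an illustrative helper, not a library function.

```python
import re
from typing import Optional

# Assumed convention: the reply contains "Answer: <number>" after the reasoning.
ANSWER_PATTERN = re.compile(r"Answer:\s*(-?\d+(?:\.\d+)?)")


def extract_final_answer(response: str) -> Optional[str]:
    """Return the last 'Answer: <number>' value in the response, or None if absent."""
    matches = ANSWER_PATTERN.findall(response)
    return matches[-1] if matches else None


response = (
    "Start with 23. Sold 17, leaving 6. A delivery of 30 arrives: 6 + 30 = 36. "
    "Remove 12 rotten: 36 − 12 = 24. Answer: 24."
)
print(extract_final_answer(response))
print(extract_final_answer("I am not sure."))
```

Taking the last match rather than the first is a deliberate choice here, since a model may mention a tentative answer partway through and correct it later. Returning `None` makes a missing marker explicit, so the caller can retry or flag the reply.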

Recap

Chain-of-thought prompting asks the model to reason step by step before answering.
It works because the model thinks by generating tokens — steps give it room to compute instead of guessing in one leap.
Zero-shot CoT ("Let's think step by step") boosts accuracy with no examples; few-shot CoT shows worked reasoning for harder tasks.
Use it for multi-step problems (math, logic, planning, analysis); skip it for simple lookups.
It costs tokens and time, and plausible reasoning can still be wrong — verify what matters.

CoT often produces a great answer wrapped in a lot of reasoning text. When you need to use that answer in software, you need it in a clean, predictable shape. Continue to Structured output prompting.