Chapter 26 · Intermediate · 11 min read
Structured Output: Getting JSON and Clean Formats From LLMs
How do you get reliable JSON or structured output from an LLM? A practical guide to structured output prompting — defining a schema, using examples, why models drift, and the modern tools (JSON mode, function calling) that guarantee valid format.
June 29, 2026
So far we've made the model produce good answers. But if you're building software with an LLM, "good" isn't enough — you need answers in a predictable shape your code can parse: JSON, a fixed list of fields, a table. This is structured output prompting, and it's the bridge between an LLM and a real application.
Why structured output matters
A human can read free-flowing prose. Code can't. If your app sends "Extract the name, email, and order total from this message" and gets back a friendly paragraph, you can't reliably pull the fields out. You need:
{ "name": "Ada Lovelace", "email": "ada@example.com", "order_total": 42.50 }
Every time. The same shape, the same keys, parseable without guesswork. Getting that consistency is the whole challenge, because, as we know from inference, the model is a probabilistic text generator, not a function that returns a fixed type.
Technique 1: define the schema explicitly
The foundation is telling the model exactly what shape you want — keys, types, and structure, with no ambiguity:
Extract the details and return JSON with exactly these keys:
- name (string)
- order_total (number, no currency symbol)
Return only the JSON object.
The more precisely you specify the schema, the less the model has to guess. Vague requests ("give me the data as JSON") produce inconsistent keys and nesting; precise schemas produce consistent ones. This is specificity applied to format.
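As a rough sketch of this idea, the schema can live in one place and be rendered into the prompt, so the keys and types stay identical on every call. The `SCHEMA` dict, the `build_extraction_prompt` helper, and the sample message are illustrative assumptions, not part of any library.

```python
# Hypothetical schema: key -> plain-language type description.
SCHEMA = {
    "name": "string",
    "order_total": "number, no currency symbol",
}


def build_extraction_prompt(message: str, schema: dict[str, str]) -> str:
    """Render a prompt that spells out every key and its type."""
    key_lines = "\n".join(f"- {key} ({desc})" for key, desc in schema.items())
    return (
        "Extract the details and return JSON with exactly these keys:\n"
        f"{key_lines}\n"
        "Return only the JSON object.\n\n"
        f"Message:\n{message}"
    )


prompt = build_extraction_prompt("Hi, I'm Ada. My order came to $42.50.", SCHEMA)
print(prompt)
```

Keeping the schema in a single dict means the same definition can later drive validation as well as the prompt text.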
Technique 2: show an example
Combine the schema with a few-shot example of the exact output. One concrete example pins the format more firmly than any description.
For nested or unusual structures especially, an example resolves questions the schema description leaves open (How are dates formatted? What goes in an empty field?).
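One way this might look in practice is to keep a single worked input/output pair and splice it into the prompt ahead of the real input. The example message, the `EXAMPLE_OUTPUT` record, and the `build_few_shot_prompt` helper below are assumptions made for illustration; the example output is serialized with `json.dumps` so the model only ever sees valid JSON.

```python
import json

# Hypothetical worked example: one input and the exact output we want back.
EXAMPLE_INPUT = "Hi, this is Ada Lovelace (ada@example.com). My order total was $42.50."
EXAMPLE_OUTPUT = {"name": "Ada Lovelace", "email": "ada@example.com", "order_total": 42.5}


def build_few_shot_prompt(message: str) -> str:
    """Place one solved example before the real input so the format is unambiguous."""
    example_json = json.dumps(EXAMPLE_OUTPUT)
    return (
        "Extract name, email, and order_total as JSON.\n\n"
        f"Input: {EXAMPLE_INPUT}\n"
        f"Output: {example_json}\n\n"
        f"Input: {message}\n"
        "Output:"
    )


print(build_few_shot_prompt("Grace Hopper here, grace@example.com, total 19.99"))
```

Ending the prompt on `Output:` nudges the model to continue with the JSON object rather than an introduction.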
Technique 3: suppress the chatter
The most common reason structured output fails to parse isn't wrong structure; it's extra text around it. Models love to wrap output in pleasantries:
"Sure! Here's the JSON you asked for:
{ ... }
Let me know if you need anything else!"
That preamble and trailing comment break a strict parser. So instruct explicitly:
"Return only valid JSON. Do not include any explanation, markdown, or text before or after the JSON."
| Without the constraint | With it |
|---|---|
| "Here's your data: { ... } Hope that helps!" | { ... } |
| Wrapped in code fences | Bare, parseable JSON |
| Occasional commentary | Just the structure |
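Even with that instruction in place, a small fallback on the parsing side can be worth having. The sketch below is one possible approach, not a standard recipe: the hypothetical `extract_json_object` helper first tries a strict parse, then falls back to the text between the first `{` and the last `}`. It assumes the reply contains a single JSON object and that any trailing chatter contains no braces.

```python
import json


def extract_json_object(raw: str) -> dict:
    """Parse a JSON object from a model reply, tolerating text around it."""
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        # Fallback: keep only the span from the first "{" to the last "}".
        start, end = raw.find("{"), raw.rfind("}")
        if start == -1 or end < start:
            raise ValueError("no JSON object found in reply")
        parsed = json.loads(raw[start : end + 1])
    if not isinstance(parsed, dict):
        raise ValueError("reply is valid JSON but not an object")
    return parsed


chatty = 'Sure! Here is the JSON you asked for:\n{"name": "Ada"}\nLet me know if you need anything else!'
print(extract_json_object(chatty))  # {'name': 'Ada'}
```

A fallback like this hides the symptom rather than fixing the prompt, so it may be worth logging whenever it triggers.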
Technique 4: handle the empty and the edge cases
Tell the model what to do when data is missing, so it doesn't improvise or hallucinate:
"If a field isn't present in the input, set it to null. Never invent a value."
Without this, a model asked to extract a missing phone number may make one up to fill the field — a classic structured-output failure. Define the edge cases as part of the schema.
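The same rule can be backed up in code. The following is a minimal sketch under stated assumptions: `EXPECTED_KEYS`, `normalize_missing`, and `ungrounded_fields` are hypothetical names, and the grounding check is a deliberately crude heuristic that only flags string values not found verbatim in the input.

```python
# Hypothetical field list for this illustration.
EXPECTED_KEYS = ("name", "email", "phone")


def normalize_missing(record: dict) -> dict:
    """Return a record with every expected key; absent or blank values become None."""
    cleaned = {}
    for key in EXPECTED_KEYS:
        value = record.get(key)
        if isinstance(value, str) and not value.strip():
            value = None
        cleaned[key] = value
    return cleaned


def ungrounded_fields(record: dict, source_text: str) -> list[str]:
    """List string fields whose value does not appear verbatim in the source text."""
    return [
        key
        for key, value in record.items()
        if isinstance(value, str) and value not in source_text
    ]


source = "Hi, I'm Ada Lovelace (ada@example.com)."
reply = {"name": "Ada Lovelace", "email": "ada@example.com", "phone": "555-0100"}
print(ungrounded_fields(reply, source))  # ['phone']
```

A verbatim check will also flag values the model legitimately reformatted (a normalized date, for instance), so treat a hit as a prompt for review rather than proof of invention.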
The reliable way: native structured-output features
Prompt-only formatting works, but it is never 100% reliable: the model can always drift. For much stronger guarantees, many modern APIs offer features that constrain the structure of the output:
- JSON mode: the API constrains generation so that a complete response is syntactically valid JSON. A response cut off at the token limit can still be incomplete.
- Schema-enforced / structured outputs: you supply a schema and the response is forced to conform to it, keys and types included.
- Function (tool) calling: you define a function with typed parameters; the model returns a call to it with arguments matching your schema. This is also the foundation of how tools and agents work.
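To make the schema and function-calling options more concrete, here is a hedged sketch of the data you would typically hand to such a feature. `ORDER_SCHEMA` is ordinary JSON Schema; the `EXTRACT_ORDER_TOOL` wrapper and its field names (`name`, `description`, `parameters`) are assumptions for illustration, since the exact request shape and the supported schema keywords differ between providers.

```python
import json

# JSON Schema for the extraction result; null is allowed for missing values.
ORDER_SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": ["string", "null"]},
        "email": {"type": ["string", "null"]},
        "order_total": {"type": ["number", "null"]},
    },
    "required": ["name", "email", "order_total"],
    "additionalProperties": False,
}

# Hypothetical tool definition; check your provider's documentation
# for the real wrapper format before sending anything like this.
EXTRACT_ORDER_TOOL = {
    "name": "extract_order",
    "description": "Record the customer details found in a message.",
    "parameters": ORDER_SCHEMA,
}

print(json.dumps(EXTRACT_ORDER_TOOL, indent=2))
```

Listing every property under `required` while allowing `null` keeps the shape fixed and still lets the model report that a value was absent.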
Always validate
Even with native features, treat the output defensively. Build your code to:
- Parse with error handling — never assume it'll succeed.
- Validate against your schema — check keys, types, and ranges.
- Retry or repair on failure — re-prompt, or ask the model to fix its own malformed output.
This defensive posture reflects a core lesson from LLM limitations: the model is reliable enough to build on, but never perfect, so design for the occasional miss.
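The three steps above can be sketched as a single loop. This is an illustration under assumptions, not a definitive implementation: `call_model` is a hypothetical callable standing in for whatever client you use (it takes a prompt and returns the reply text), and `validate_order` checks only the three fields used earlier in this chapter.

```python
import json
from typing import Callable


def validate_order(data: object) -> list[str]:
    """Return a list of problems; an empty list means the record is acceptable."""
    if not isinstance(data, dict):
        return ["top-level value is not an object"]
    problems = []
    for key in ("name", "email"):
        if key not in data:
            problems.append(f"missing key: {key}")
        elif data[key] is not None and not isinstance(data[key], str):
            problems.append(f"{key} must be a string or null")
    if "order_total" not in data:
        problems.append("missing key: order_total")
    elif data["order_total"] is not None:
        total = data["order_total"]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(total, bool) or not isinstance(total, (int, float)):
            problems.append("order_total must be a number or null")
        elif total < 0:
            problems.append("order_total must not be negative")
    return problems


def extract_with_retry(
    prompt: str, call_model: Callable[[str], str], max_attempts: int = 3
) -> dict:
    """Parse and validate the reply; on failure, re-prompt with the reasons."""
    current_prompt = prompt
    for _ in range(max_attempts):
        reply = call_model(current_prompt)
        try:
            data = json.loads(reply)
            problems = validate_order(data)
        except json.JSONDecodeError as exc:
            problems = [f"invalid JSON: {exc.msg}"]
        if not problems:
            return data
        reasons = "; ".join(problems)
        current_prompt = (
            f"{prompt}\n\nYour previous reply was rejected: {reasons}.\n"
            f"Previous reply:\n{reply}\n"
            "Return only the corrected JSON object."
        )
    raise ValueError(f"no valid output after {max_attempts} attempts")
```

Capping the attempts matters: a prompt that keeps failing is better surfaced as an error than retried indefinitely.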
Recap
- Structured output gives you parseable shapes (JSON, fields) so software can use the model's answers.
- Define the schema explicitly — keys, types, structure — to minimise the model's guessing.
- Show an example of the exact output, and suppress chatter ("return only valid JSON").
- Handle missing data explicitly so the model doesn't invent values.
- For reliability, prefer native features — JSON mode, schema enforcement, function calling — over prompt-only formatting.
- Always validate and handle failure; the model is reliable, not perfect.
When you find yourself writing the same schema-and-instructions scaffolding over and over, it's time to make it reusable. Continue to Prompt templates.