Chapter 18 · Intermediate · 11 min read
Fine-Tuning LLMs: Adapting a Model to Your Task
What is fine-tuning in AI? A clear guide to fine-tuning large language models — what it does, how it differs from prompting and RAG, when to use LoRA and parameter-efficient methods, and when fine-tuning is the wrong tool.
June 29, 2026
In the previous chapter we saw how a model is trained into a general-purpose assistant. But "general" isn't always what you want. Maybe you need a model that writes in your company's exact voice, classifies your support tickets, or speaks fluent legalese. Fine-tuning is how you specialise a model — and knowing when not to use it is just as valuable.
What fine-tuning actually is
Fine-tuning means taking an already-trained model and continuing its training on a focused set of your own examples. You're not starting from scratch — you're nudging a model that already knows language toward your specific task, domain, or style.
Mechanically it's the same loop as training — predict, compare, nudge parameters — just on a smaller, targeted dataset. The result is a model whose parameters have shifted toward your use case.
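As a rough illustration of that predict, compare, nudge loop, here is a minimal sketch on a toy linear model rather than a real LLM. The "pretrained" starting values, the dataset, the learning rate, and the function name `fine_tune` are all assumptions made up for this example; the point is only that training resumes from existing parameters instead of from scratch.

```python
# Toy illustration: "fine-tuning" a tiny linear model y = w*x + b.
# The pretrained values and the dataset are invented for this sketch.

def fine_tune(w, b, examples, lr=0.05, epochs=200):
    """Continue training (w, b) on a small dataset with plain gradient descent."""
    for _ in range(epochs):
        for x, target in examples:
            pred = w * x + b        # predict
            error = pred - target   # compare (gradient of 0.5 * error**2 w.r.t. pred)
            w -= lr * error * x     # nudge parameters
            b -= lr * error
    return w, b

# Hypothetical "pretrained" parameters: the model starts out as y = 1.0*x + 0.0.
pretrained_w, pretrained_b = 1.0, 0.0

# Small task-specific dataset that follows y = 2x + 1.
task_examples = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]

tuned_w, tuned_b = fine_tune(pretrained_w, pretrained_b, task_examples)
print(f"before: w={pretrained_w:.2f}, b={pretrained_b:.2f}")
print(f"after:  w={tuned_w:.2f}, b={tuned_b:.2f}")
```

After the loop, the parameters should have moved from their starting values toward the ones that fit the task data (roughly w = 2, b = 1 here). A real fine-tune does the same thing across billions of parameters with a library handling the gradients.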
The crucial distinction: prompting vs fine-tuning vs RAG
This is where people get confused and waste money. There are three ways to make a model do what you want, and they operate at different levels:
| Approach | What it changes | Best for |
|---|---|---|
| Prompting | The input (instructions/examples) | Most tasks; fast and cheap to iterate |
| RAG | The input (retrieved knowledge) | Giving the model fresh or private facts |
| Fine-tuning | The model's parameters | Baking in behaviour: style, format, task patterns |
If you find yourself pasting the same long instructions into every prompt, fine-tuning can bake that behaviour in so you stop repeating it. But if you need the model to know new facts, fine-tuning is the wrong tool — see below.
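The difference between the three approaches can be sketched in a few lines. This is an illustration only: the instructions, documents, keyword-overlap retriever, and function names are hypothetical, and a real RAG system would typically use embedding search rather than word matching. Prompting and RAG each produce a different input string, while fine-tuning needs a dataset to train on.

```python
import re

# Hypothetical instructions and knowledge base for illustration.
INSTRUCTIONS = "Answer in a friendly, concise tone. Reply in one paragraph."
DOCS = [
    "Refunds are available within 30 days of purchase.",
    "Support is open Monday to Friday, 9am to 5pm.",
]

def tokens(text):
    """Lowercased set of alphanumeric words in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def build_prompt(question):
    """Prompting: the behaviour lives in instructions placed in the input."""
    return f"{INSTRUCTIONS}\n\nQuestion: {question}"

def retrieve(question, docs):
    """Toy retriever: keep documents that share at least one word with the question."""
    return [d for d in docs if tokens(question) & tokens(d)]

def build_rag_prompt(question, docs):
    """RAG: retrieved text is added to the input alongside the instructions."""
    context = "\n".join(retrieve(question, docs))
    return f"Context:\n{context}\n\n{build_prompt(question)}"

# Fine-tuning: the input stays short; behaviour is taught through training pairs.
fine_tune_examples = [
    {"input": "Where is my order?",
     "output": "Thanks for reaching out! Let me check that for you."},
]

print(build_rag_prompt("When are refunds available?", DOCS))
```

Note that the fine-tuning example teaches a tone, not a fact: the refund policy stays in the retrievable documents.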
What fine-tuning is good at (and bad at)
Fine-tuning shines for behaviour and form, and disappoints for facts.
Good fits:
- A consistent tone or voice (your brand, a persona).
- A specific output format every time (always valid JSON, a fixed report structure).
- A narrow, repetitive task (classify tickets into 8 categories, extract fields from invoices).
- A domain style (medical notes, legal drafting) where general phrasing isn't enough.
Poor fits:
- Teaching new facts. Models are bad at reliably memorising specifics from a small fine-tune, and they'll happily hallucinate around the gaps. Use RAG for knowledge.
- Fast-changing information. Retraining every time the facts change is slow and costly; retrieve them instead.
LoRA: why fine-tuning got affordable
Fine-tuning every one of a model's billions of parameters is expensive and storage-heavy. Parameter-efficient fine-tuning fixes that — and LoRA (Low-Rank Adaptation) is the most popular flavour.
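The idea: freeze the original model and train only a small set of add-on weights (adapters) that adjust its behaviour. In LoRA, each adapter is a pair of small low-rank matrices whose product is added to a frozen weight matrix, so you're tuning a tiny fraction of the parameters instead of all of them.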
The payoff is large:
- Far cheaper and faster to train.
- Tiny to store: adapters are typically megabytes, while a full copy of the model is gigabytes.
- Swappable — keep multiple adapters for different tasks and load the one you need.
LoRA is why fine-tuning, once the preserve of big labs, is now within reach of small teams.
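To make the savings concrete, here is a small sketch of the usual LoRA formulation for a single weight matrix: the frozen weight W is used as W + (alpha / r) · B·A, where B and A are the small trainable matrices and r is the rank. The matrix sizes, rank, and alpha below are illustrative assumptions, and plain Python lists stand in for real tensors.

```python
# Sketch of the LoRA idea for one weight matrix, using plain Python lists.
# W is frozen; only the small matrices A and B would be trained.

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def lora_effective_weight(W, A, B, alpha, rank):
    """Return W + (alpha / rank) * (B @ A), the weight the adapted layer uses."""
    delta = matmul(B, A)
    scale = alpha / rank
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]

def trainable_params(d_out, d_in, rank):
    """Parameter counts for full fine-tuning vs. a rank-`rank` adapter."""
    full = d_out * d_in
    lora = rank * (d_in + d_out)
    return full, lora

# Tiny worked example: a 2x3 frozen weight with a rank-1 adapter.
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]
B = [[1.0], [2.0]]        # shape: d_out x rank
A = [[0.5, 0.0, 0.5]]     # shape: rank x d_in
print(lora_effective_weight(W, A, B, alpha=1.0, rank=1))

# Illustrative layer size: a 4096 x 4096 matrix with a rank-8 adapter.
full, lora = trainable_params(d_out=4096, d_in=4096, rank=8)
print(f"full: {full:,}  lora: {lora:,}  share: {lora / full:.4%}")
```

For that illustrative 4096 × 4096 layer, the adapter holds 65,536 trainable values against 16,777,216 in the full matrix, which is why adapters are cheap to train and small to store. Swapping adapters amounts to loading a different A and B on top of the same frozen W.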
Data quality is everything
A fine-tune is only as good as its examples. The most common failure isn't the technique — it's the data.
- Consistency beats volume. A few hundred clean, consistent examples usually beat thousands of noisy ones. The model will faithfully learn whatever patterns are in your data, including the mistakes.
- Cover the real distribution. Include the edge cases and formats you actually expect at runtime.
- Mind contradictions. If two examples answer the same input differently, you're teaching the model to be inconsistent.
This echoes a theme from the Machine Learning guide: data quality decides everything.
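Some of these checks are easy to automate before any training run. The sketch below assumes a hypothetical dataset shape (a list of dicts with `input` and `output` keys) and made-up ticket examples; it flags inputs that appear with more than one answer and counts how often each label occurs, which gives a quick view of coverage.

```python
from collections import Counter, defaultdict

def normalise(text):
    """Lowercase and collapse whitespace so trivially different inputs compare equal."""
    return " ".join(text.lower().split())

def find_contradictions(examples):
    """Return inputs that appear with more than one distinct output."""
    outputs_by_input = defaultdict(set)
    for ex in examples:
        outputs_by_input[normalise(ex["input"])].add(ex["output"])
    return {inp: sorted(outs)
            for inp, outs in outputs_by_input.items() if len(outs) > 1}

def label_counts(examples):
    """Count how many examples carry each output label."""
    return Counter(ex["output"] for ex in examples)

# Hypothetical ticket-classification examples; two of them contradict each other.
examples = [
    {"input": "My card was charged twice", "output": "billing"},
    {"input": "my card was  charged twice", "output": "account"},
    {"input": "I can't log in", "output": "account"},
]

print(find_contradictions(examples))
print(label_counts(examples))
```

Exact-match checks like this only catch the obvious cases. Near-duplicate inputs with conflicting answers still need a manual read or a fuzzier comparison.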
A decision checklist
Before fine-tuning, ask:
- Have I exhausted prompting? Better instructions and few-shot examples solve more than people expect.
- Is this a knowledge problem? If so, use RAG, not fine-tuning.
- Is the behaviour stable and repetitive? Fine-tuning rewards consistency; it's wasted on one-offs.
- Do I have clean, representative data? If not, fix that first — it's the ceiling on results.
If you answer "prompting's not enough, it's about behaviour not facts, the task is stable, and my data is clean," then fine-tuning is the right call.
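The checklist can be read as a short chain of early exits. The function below is one way to write it down; the name `recommend`, its parameters, and the wording of the returned advice are hypothetical and simply mirror the four questions above.

```python
def recommend(prompting_exhausted, needs_new_facts, task_is_stable, data_is_clean):
    """Walk the four checklist questions in order and return a suggestion."""
    if not prompting_exhausted:
        return "Improve the prompt first"
    if needs_new_facts:
        return "Use RAG"
    if not task_is_stable:
        return "Stay with prompting"
    if not data_is_clean:
        return "Fix the data first"
    return "Fine-tune"

# A stable formatting task with clean data, where prompting has hit its limit.
print(recommend(prompting_exhausted=True, needs_new_facts=False,
                task_is_stable=True, data_is_clean=True))
```

The order of the checks matters: the cheaper options are ruled out before fine-tuning is ever suggested.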
Recap
- Fine-tuning continues training a pretrained model on your examples to specialise it.
- It changes the model's parameters — unlike prompting and RAG, which change the input.
- It's best for behaviour (tone, format, task patterns) and poor for facts — use RAG for knowledge.
- LoRA and other parameter-efficient methods make it cheap by tuning a small set of add-on weights.
- Data quality is the ceiling — clean, consistent, representative examples matter more than volume.
- Try prompting first, then RAG if the gap is knowledge, and fine-tuning only if the gap is behaviour.
We've trained and specialised the model. Now, what actually happens in the moment you hit "send" and watch words stream back? That's inference. Continue to LLM inference: how a model generates text.