Fine-Tuning LLMs: Adapting a Model to Your Task

What is fine-tuning in AI? A clear guide to fine-tuning large language models — what it does, how it differs from prompting and RAG, when to use LoRA and parameter-efficient methods, and when fine-tuning is the wrong tool.

In the previous chapter we saw how a model is trained into a general-purpose assistant. But "general" isn't always what you want. Maybe you need a model that writes in your company's exact voice, classifies your support tickets, or speaks fluent legalese. Fine-tuning is how you specialise a model — and knowing when not to use it is just as valuable.

What fine-tuning actually is

Fine-tuning means taking an already-trained model and continuing its training on a focused set of your own examples. You're not starting from scratch — you're nudging a model that already knows language toward your specific task, domain, or style.

[Figure: Pretrained model + your examples → Continue training → Specialised model. Fine-tuning builds on a pretrained model rather than starting over.]

Mechanically it's the same loop as training — predict, compare, nudge parameters — just on a smaller, targeted dataset. The result is a model whose parameters have shifted toward your use case.
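The loop above can be sketched on a toy scale. This is a minimal illustration, not how a real LLM is trained: the "model" is a single line y = w*x + b, and the starting parameters, examples, learning rate, and function names are all assumptions made up for the example. What it shows is the shape of the process: start from existing parameters, then predict, compare, and nudge on a small targeted dataset.

```python
def predict(params, x):
    """Toy 'model': a straight line with two parameters."""
    w, b = params
    return w * x + b


def mean_squared_error(params, examples):
    """Average squared gap between predictions and targets."""
    return sum((predict(params, x) - y) ** 2 for x, y in examples) / len(examples)


def fine_tune(params, examples, lr=0.05, epochs=200):
    """Continue training from existing parameters on a small dataset."""
    w, b = params
    for _ in range(epochs):
        for x, y in examples:
            error = (w * x + b) - y  # predict, then compare with the target
            w -= lr * error * x      # nudge each parameter against the error
            b -= lr * error
    return (w, b)


# Hypothetical "pretrained" parameters: already close, but not fitted to our task.
pretrained = (2.0, 0.0)

# A small, focused dataset for the new task (here it follows y = 2.5x + 1).
task_examples = [(0.0, 1.0), (1.0, 3.5), (2.0, 6.0)]

specialised = fine_tune(pretrained, task_examples)

print("error before:", mean_squared_error(pretrained, task_examples))
print("error after: ", mean_squared_error(specialised, task_examples))
```

The parameters end up shifted toward the task data, which is all "specialised" means here. A real fine-tune does the same thing across billions of parameters with a training framework.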

The crucial distinction: prompting vs fine-tuning vs RAG

This is where people get confused and waste money. There are three ways to make a model do what you want, and they operate at different levels:

Approach	What it changes	Best for
Prompting	The input (instructions/examples)	Most tasks; fast and cheap to iterate, with no training step
RAG	The input (retrieved knowledge)	Giving the model fresh or private facts
Fine-tuning	The model's parameters	Baking in behaviour: style, format, task patterns

If you find yourself pasting the same long instructions into every prompt, fine-tuning can bake that behaviour in so you stop repeating it. But if you need the model to know new facts, fine-tuning is the wrong tool — see below.
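To make the "different levels" concrete, here is a rough sketch of what prompting and RAG do to the input. The function names, the prompt layout, and the word-overlap retrieval are assumptions chosen for illustration; real RAG systems typically retrieve with embeddings and a vector index. Fine-tuning does not appear as a function because it changes nothing in the input: the behaviour lives in the model's parameters, so the prompt can stay short.

```python
def prompt_only(question, instructions):
    """Prompting: steer the model purely through the text you send it."""
    return f"{instructions}\n\nQuestion: {question}"


def rag_prompt(question, instructions, documents, top_k=1):
    """RAG: retrieve relevant text and add it to the input.

    Retrieval here is a toy word-overlap ranking (an assumption for the sketch).
    """
    q_words = set(question.lower().split())
    ranked = sorted(
        documents,
        key=lambda doc: len(q_words & set(doc.lower().split())),
        reverse=True,
    )
    context = "\n".join(ranked[:top_k])
    return f"{instructions}\n\nContext:\n{context}\n\nQuestion: {question}"


# Hypothetical private knowledge the base model has never seen.
documents = [
    "Refunds are processed within 14 days.",
    "Our office is closed on public holidays.",
]

print(prompt_only("How long do refunds take?", "Answer briefly."))
print("---")
print(rag_prompt("How long do refunds take?", "Answer briefly.", documents))
```

In both cases the model itself is untouched; only the text going in differs.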

What fine-tuning is good at (and bad at)

Fine-tuning shines for behaviour and form, and disappoints for facts.

Good fits:

A consistent tone or voice (your brand, a persona).
A specific output format every time (always valid JSON, a fixed report structure).
A narrow, repetitive task (classify tickets into 8 categories, extract fields from invoices).
A domain style (medical notes, legal drafting) where general phrasing isn't enough.

Poor fits:

Teaching new facts. Models are bad at reliably memorising specifics from a small fine-tune, and they'll happily hallucinate around the gaps. Use RAG for knowledge.
Fast-changing information. Re-training every time the facts change is slow and costly; retrieve them instead.

LoRA: why fine-tuning got affordable

Fine-tuning every one of a model's billions of parameters is expensive and storage-heavy. Parameter-efficient fine-tuning fixes that — and LoRA (Low-Rank Adaptation) is the most popular flavour.

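The idea: freeze the original model and train only a small set of add-on weights (adapters) that adjust its behaviour. In LoRA's case the adapters are pairs of small low-rank matrices whose product is added to the frozen weights. You're tuning a tiny fraction of the parameters instead of all of them.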

[Figure: share of parameters trained. Full fine-tune: 100%. LoRA: ~1%. LoRA trains a tiny fraction of the parameters (illustrative).]
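Where the small fraction comes from can be checked with simple arithmetic. In LoRA, a frozen weight matrix of shape d_out x d_in gets an update formed by multiplying two small matrices, B (d_out x r) and A (r x d_in), where the rank r is much smaller than either dimension. The sketch below counts parameters for one such matrix; the 4096 x 4096 size and rank 8 are illustrative assumptions, not figures from any particular model.

```python
def lora_trainable_fraction(d_out, d_in, rank):
    """Trainable share for one weight matrix when using a rank-r LoRA adapter."""
    full_params = d_out * d_in            # what a full fine-tune would update
    lora_params = rank * (d_out + d_in)   # B is d_out x r, A is r x d_in
    return lora_params / full_params


# Illustrative sizes (assumptions): one 4096 x 4096 matrix with a rank-8 adapter.
fraction = lora_trainable_fraction(4096, 4096, rank=8)
print(f"{fraction:.4%} of this matrix's parameters are trained")
```

For these numbers the adapter has 65,536 trainable values against 16,777,216 in the frozen matrix, roughly 0.39%. The overall share for a whole model depends on the rank chosen and on which layers receive adapters, so the ~1% in the figure is best read as an order of magnitude.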

The payoff is large:

Far cheaper and faster to train.
Tiny to store: adapters are typically megabytes, where a full copy of the model runs to gigabytes.
Swappable — keep multiple adapters for different tasks and load the one you need.
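The "swappable" property follows from the base weights never changing. Here is a minimal sketch with tiny hand-written matrices: one frozen matrix W is shared, and each task supplies its own (A, B) pair. The adapter names and all the numbers are hypothetical, and the scaling factor that LoRA implementations usually apply to the update is left out for brevity.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def lora_forward(W, x, adapter=None):
    """Compute W x, plus the low-rank update B (A x) if an adapter is loaded."""
    y = matvec(W, x)
    if adapter is None:
        return y
    A, B = adapter
    delta = matvec(B, matvec(A, x))  # project down to rank r, then back up
    return [yi + di for yi, di in zip(y, delta)]


# Frozen base weights, shared by every task (toy 2 x 2 example).
W = [[1.0, 0.0],
     [0.0, 1.0]]

# Hypothetical rank-1 adapters: A is 1 x 2, B is 2 x 1.
adapters = {
    "support": ([[1.0, 1.0]], [[0.5], [0.0]]),
    "legal":   ([[1.0, -1.0]], [[0.0], [2.0]]),
}

x = [2.0, 3.0]
print("base:   ", lora_forward(W, x))
print("support:", lora_forward(W, x, adapters["support"]))
print("legal:  ", lora_forward(W, x, adapters["legal"]))
```

Switching tasks means picking a different entry from `adapters`; W is loaded once and left alone.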

LoRA is why fine-tuning, once the preserve of big labs, is now within reach of small teams.

Data quality is everything

A fine-tune is only as good as its examples. The most common failure isn't the technique — it's the data.

Consistency beats volume. A few hundred clean, consistent examples usually beat thousands of noisy ones. The model will faithfully learn whatever patterns are in your data, including the mistakes.
Cover the real distribution. Include the edge cases and formats you actually expect at runtime.
Mind contradictions. If two examples answer the same input differently, you're teaching the model to be inconsistent.
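Contradictions are one of the few data problems that are easy to check mechanically. The sketch below groups examples by their input and flags any input that maps to more than one output. The "input" and "output" field names and the whitespace and case normalisation are assumptions for the example; adapt them to however your dataset is stored.

```python
from collections import defaultdict


def normalise(text):
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(text.lower().split())


def find_contradictions(examples):
    """Return inputs that appear with more than one distinct output."""
    outputs_by_input = defaultdict(set)
    for example in examples:
        key = normalise(example["input"])
        outputs_by_input[key].add(normalise(example["output"]))
    return {
        key: sorted(outputs)
        for key, outputs in outputs_by_input.items()
        if len(outputs) > 1
    }


# Hypothetical ticket-classification examples.
examples = [
    {"input": "Where is my order?", "output": "shipping"},
    {"input": "where is my  order?", "output": "billing"},
    {"input": "Cancel my plan", "output": "billing"},
]

for text, labels in find_contradictions(examples).items():
    print(f"{text!r} is labelled inconsistently: {labels}")
```

This only catches exact matches after normalisation. Near-duplicate inputs with conflicting answers need fuzzier matching or a manual review.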

This echoes a theme from the Machine Learning guide: data quality decides everything.

A decision checklist

Before fine-tuning, ask:

Have I exhausted prompting? Better instructions and few-shot examples solve more than people expect.
Is this a knowledge problem? If so, use RAG, not fine-tuning.
Is the behaviour stable and repetitive? Fine-tuning rewards consistency; it's wasted on one-offs.
Do I have clean, representative data? If not, fix that first — it's the ceiling on results.

If you answer "prompting's not enough, it's about behaviour not facts, the task is stable, and my data is clean," then fine-tuning is the right call.
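The checklist can be written down as a short decision function, which makes the order of the questions explicit. The function name, argument names, and returned strings are assumptions for illustration; the logic simply mirrors the four questions above.

```python
def recommend(prompting_exhausted, is_knowledge_problem, task_is_stable, data_is_clean):
    """Walk the checklist in order and return the first applicable advice."""
    if not prompting_exhausted:
        return "improve the prompt"
    if is_knowledge_problem:
        return "use RAG"
    if not task_is_stable:
        return "keep prompting"
    if not data_is_clean:
        return "fix the data first"
    return "fine-tune"


print(recommend(
    prompting_exhausted=True,
    is_knowledge_problem=False,
    task_is_stable=True,
    data_is_clean=True,
))
```

Fine-tuning is the answer only when every earlier exit has been ruled out.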

Recap

Fine-tuning continues training a pretrained model on your examples to specialise it.
It changes the model's parameters — unlike prompting and RAG, which change the input.
It's best for behaviour (tone, format, task patterns) and poor for facts — use RAG for knowledge.
LoRA and other parameter-efficient methods make it cheap by tuning a small set of add-on weights.
Data quality is the ceiling — clean, consistent, representative examples matter more than volume.
Try prompting → RAG → fine-tuning, in that order.

We've trained and specialised the model. Now, what actually happens in the moment you hit "send" and watch words stream back? That's inference. Continue to LLM inference: how a model generates text.