What Is an LLM? Large Language Models Explained Simply

What is a large language model, really? A plain-English explanation of what an LLM is, what it does, how it differs from older AI, and why 'predict the next token' is the whole trick behind ChatGPT, Claude, and Gemini.

You've used a large language model — ChatGPT, Claude, Gemini, Copilot. You've seen it write emails, fix code, explain quantum physics, and argue both sides of a debate. So it's reasonable to assume something complex is going on under the hood.

It is and it isn't. The machinery is genuinely vast, but the core idea is almost embarrassingly simple. This guide unpacks how a large language model actually works, chapter by chapter, no math required. We start at the foundation: what an LLM is.

The simplest accurate definition

A large language model is a system that, given some text, predicts what text is likely to come next — and does this so well that useful behaviour falls out of it.

That's the whole engine. You give it "The capital of France is" and it predicts "Paris." You give it "Write a haiku about autumn" and it predicts a plausible haiku one token (roughly, one word or word fragment) at a time. It is, at heart, a spectacularly good autocomplete.

If that sounds like an oversimplification, it isn't. Everything else in this guide — tokens, embeddings, Transformers, attention — exists to make that one prediction as good as possible.
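To make "predict what comes next" concrete, here is a minimal sketch of a toy predictor. It is not how a real LLM works internally: it only counts which word followed which in a tiny made-up corpus and looks at the last word of the prompt, whereas an LLM weighs the whole prompt. The corpus and the names train_bigrams and predict_next are assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Tiny made-up corpus; a real LLM trains on trillions of words.
CORPUS = (
    "the capital of france is paris . "
    "the capital of italy is rome . "
    "paris is the capital of france ."
)

def train_bigrams(text):
    """Count, for every word, which words follow it and how often."""
    follows = defaultdict(Counter)
    words = text.split()
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows

def predict_next(follows, prompt):
    """Return the most frequent follower of the prompt's last word."""
    last = prompt.lower().split()[-1]
    candidates = follows.get(last)
    if not candidates:
        return None  # this word was never seen, so there is no prediction
    return candidates.most_common(1)[0][0]

model = train_bigrams(CORPUS)
print(predict_next(model, "The capital of"))  # prints: france
```

Even this crude version shows the shape of the idea: training produces statistics about text, and prediction reads them back. Everything later in the guide is about doing that same job far better.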

Breaking down the name

The term large language model is descriptive. Each word earns its place.

Word	What it means
Large	Billions of internal parameters, trained on trillions of words. Scale is the point.
Language	The medium is text (and increasingly images and audio), modelled as sequences of tokens.
Model	A mathematical function that maps an input to an output — here, text-so-far to a prediction of what's next.

The large part is not marketing. A small version of this exact architecture is a curiosity; scaled up by a few orders of magnitude, it becomes capable of things nobody explicitly designed. We trace that scaling story in the history of LLMs.
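A quick back-of-the-envelope calculation gives a feel for "billions of parameters." The sketch below assumes each parameter is stored as a 16-bit number (2 bytes), which is a common choice but not the only one, and the model sizes are hypothetical round figures rather than the sizes of any particular product.

```python
def weights_size_gb(num_parameters, bytes_per_parameter=2):
    """Approximate storage for the weights alone, in gigabytes (10**9 bytes).

    bytes_per_parameter=2 assumes 16-bit numbers; this is an assumption.
    """
    return num_parameters * bytes_per_parameter / 1e9

# Hypothetical model sizes, chosen only for illustration.
for billions in (1, 7, 70):
    size = weights_size_gb(billions * 1_000_000_000)
    print(f"{billions}B parameters -> about {size:.0f} GB of weights")
```

Under those assumptions, a 7-billion-parameter model needs about 14 GB just to hold its numbers, before doing any work with them.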

How it differs from older AI

For decades, AI meant building a separate, narrow system for each task: one for spam detection, one for translation, one for chess. Each was hand-crafted, trained on task-specific data, and useless outside its lane.

LLMs broke that pattern. You train one general model on a huge sweep of human text, and it turns out to be able to do thousands of tasks it was never specifically built for.

Figure: Huge general text → Train one LLM → Translate · summarise · code · explain. Old AI: one model per task. LLMs: one model, many tasks.

This is the shift from narrow AI to general-purpose models. It's why a single chatbot can help you debug code in the morning and draft a wedding toast at night, with no retraining in between.

What's actually inside

An LLM is, concretely, two things:

An architecture — a specific wiring of mathematical operations called a Transformer (we devote a whole chapter to it). Think of it as the shape of the brain.
Parameters — billions of numbers (also called weights) that were tuned during training. Think of them as everything the model "learned," stored as dials.

When you send a prompt, your text flows through the architecture, the parameters shape it at every step, and out the other end comes a prediction for the next token. That's inference, and we cover it in detail later. For a visual tour of the insides, see what's inside an AI model.
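The split between architecture and parameters can be sketched in a few lines. This is a deliberately tiny stand-in, not a Transformer: the "architecture" is a fixed recipe (multiply, add, softmax), and the "parameters" are the numbers plugged into it. The three-word vocabulary, the two-number encoding of the prompt, and both sets of weights are invented for illustration.

```python
import math

VOCAB = ["paris", "rome", "banana"]  # hypothetical three-token vocabulary

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def next_token_probabilities(features, weights):
    """The 'architecture': the same fixed steps, whatever the weights are.

    features: numbers describing the text so far (made-up encoding).
    weights:  one row of parameters per vocabulary entry.
    """
    scores = [sum(w * x for w, x in zip(row, features)) for row in weights]
    return dict(zip(VOCAB, softmax(scores)))

# Made-up encoding of "The capital of France is":
# [mentions France, mentions Italy]
features = [1.0, 0.0]

# Two parameter sets for the same architecture.
untrained = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
trained = [[3.0, 0.0], [0.0, 3.0], [-1.0, -1.0]]

for name, weights in (("untrained", untrained), ("trained", trained)):
    probs = next_token_probabilities(features, weights)
    print(name, {token: round(p, 3) for token, p in probs.items()})
```

With all-zero weights, every token is equally likely. With the second set of weights, the same code puts most of the probability on "paris". Nothing about the recipe changed; only the dials did.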

Why it feels intelligent

If it's "just" prediction, why does it feel like it understands you? Because predicting the next word well turns out to require modelling an enormous amount about the world.

To reliably continue "The detective realised the killer had to be the —", the model has to track who's in the story, what they did, and what makes narrative sense. To continue a half-written function correctly, it has to track variables, types, and intent. Good prediction forces the model to absorb grammar, facts, reasoning patterns, and style.

So the intelligence is real in its effects, even though the mechanism is prediction. The catch — and it's a big one — is that the model optimises for plausible, not true.

It is great at...	It struggles with...
Fluent, well-structured writing	Reliable factual accuracy
Pattern-matching and transformation	Knowing what it doesn't know
Following the style of your prompt	Precise counting and arithmetic
Plausible reasoning	Anything outside its training data

That gap between fluent and correct is the root of AI hallucination, and we return to it throughout this guide.
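A toy example shows how "plausible, not true" can arise. In the sketch below, the training sentences are made up so that a common mistake (Sydney) outnumbers the correct answer (Canberra). The predictor simply picks the most frequent continuation. Real models are far more sophisticated, but they share the property that nothing in the prediction step checks facts. The function name most_plausible_ending is an assumption for illustration.

```python
from collections import Counter

# Made-up training sentences: the wrong claim appears more often than the right one.
TRAINING_TEXT = [
    "the capital of australia is sydney",
    "the capital of australia is sydney",
    "the capital of australia is sydney",
    "the capital of australia is canberra",
]

def most_plausible_ending(prompt, sentences):
    """Pick the word seen most often immediately after the prompt."""
    prompt_words = prompt.lower().split()
    n = len(prompt_words)
    endings = Counter()
    for sentence in sentences:
        words = sentence.split()
        if words[:n] == prompt_words and len(words) > n:
            endings[words[n]] += 1
    return endings.most_common(1)[0][0] if endings else None

# The most common continuation wins, even though Canberra is the capital.
print(most_plausible_ending("The capital of Australia is", TRAINING_TEXT))  # prints: sydney
```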

You steer it with the prompt

Here's the practical consequence of "it continues your text": what you write strongly shapes what you get back. The prompt is the steering wheel.

The same model will give a terse answer or a thorough one, a formal tone or a casual one, and sometimes a wrong answer or a right one, depending largely on how you frame the input. That leverage is why prompt engineering is a skill worth learning.
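The steering effect can be sketched with a toy text continuer. The model below is a small lookup table built from two made-up sentences, not an LLM, and the "formal :" and "casual :" prefixes are an invented convention. The point is only that one unchanged model produces different continuations because the prompt differs.

```python
from collections import Counter, defaultdict

# Made-up training text in two styles.
TRAINING_TEXT = [
    "formal : dear customer , thank you for your message .",
    "casual : hey , thanks for the note !",
]

def train(sentences):
    """Map each pair of consecutive words to counts of the word that follows."""
    table = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for a, b, c in zip(words, words[1:], words[2:]):
            table[(a, b)][c] += 1
    return table

def continue_text(table, prompt, max_new_words=20):
    """Greedily append the most likely next word until nothing follows."""
    words = prompt.split()
    generated = []
    for _ in range(max_new_words):
        options = table.get(tuple(words[-2:]))
        if not options:
            break  # the last two words were never seen together in training
        nxt = options.most_common(1)[0][0]
        words.append(nxt)
        generated.append(nxt)
    return " ".join(generated)

model = train(TRAINING_TEXT)
print(continue_text(model, "formal :"))  # prints: dear customer , thank you for your message .
print(continue_text(model, "casual :"))  # prints: hey , thanks for the note !
```

The loop also illustrates generation one word at a time: each predicted word is appended to the text and becomes part of the input for the next prediction.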

Recap

An LLM is a next-token predictor: given text, it predicts what comes next, one token at a time.
Large is literal — billions of parameters trained on trillions of words; scale is what makes it work.
It's general-purpose: one model, trained once, does thousands of tasks — unlike older narrow AI.
Inside, it's an architecture (a Transformer) plus billions of tuned parameters; it generates answers rather than looking them up.
It feels intelligent because good prediction requires modelling the world — but it optimises for plausible, not true.
The prompt steers everything, which is why prompt engineering matters.

Next, we zoom into the unit the model actually reads. It doesn't see words or letters — it sees tokens. Continue to Tokens: how an LLM reads and counts text.