Expedition 11 · Beginner · 13 min read
The Three Flavors of Machine Learning: Supervised, Unsupervised, Reinforcement
Almost all machine learning comes in three styles, defined by what kind of examples the model learns from. Here's supervised, unsupervised, and reinforcement learning explained in plain English — what each one is for, with everyday examples and no math.
June 17, 2026
In the last chapter we landed one big idea: machine learning means showing a computer examples and letting it discover the rules. But "examples" can mean very different things — and that difference splits machine learning into three broad families.
The split is set by what your data looks like: do your examples come with the right answers attached, with no answers at all, or as feedback you only get after you act? Each situation calls for a different style of learning. Telling the three apart is the single best map for navigating the whole field.
| Flavor | What your data has | What the model does | Everyday example |
|---|---|---|---|
| Supervised | Inputs with correct answers | Learns to map input → answer | Spam filter, price predictor |
| Unsupervised | Inputs with no answers | Finds hidden structure | Customer grouping, anomaly detection |
| Reinforcement | Rewards after actions | Learns which actions pay off | Game-playing AI, robotics |
Let's walk through each.
1. Supervised learning: learning from labelled examples
This is the workhorse — the kind behind most machine learning you encounter day to day.
Supervised means every example comes with the correct answer attached, called a label. Emails labelled "spam" or "not spam." Houses listed with their actual sale price. X-rays marked "tumour" or "clear." The model studies these input–answer pairs and learns to reproduce the mapping, so that when a new, unlabelled input arrives, it can predict the answer.
The name comes from the idea of a teacher supervising: every example is a flashcard with the answer on the back, and the model is graded on whether it got it right.
Supervised problems come in two shapes, depending on whether the answer is a category or a number:
- Classification — the answer is a category from a fixed set. Spam or not spam. Cat, dog, or bird. Approve or decline. The model is sorting inputs into buckets.
- Regression — the answer is a number on a continuous scale. A house price. Tomorrow's temperature. Expected sales next quarter. The model is predicting a quantity.
| | Classification | Regression |
|---|---|---|
| Answer type | A category | A number |
| Example question | "Is this spam?" | "What will this house sell for?" |
| Example answer | "Yes" / "No" | "£420,000" |
| Output | A label | A value on a scale |
The setup is identical — labelled examples in, mapping out. Only the shape of the answer differs. If you can frame your problem as "here are inputs, here are the answers I want, learn the connection," it's supervised learning.
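To make "labelled examples in, mapping out" concrete, here is a minimal sketch in Python. It assumes the simplest possible learner, a nearest-neighbour lookup that copies the answer from the most similar known example; that choice, the function name `nearest_neighbour`, and all the data are invented for illustration. Notice that the same function handles both shapes of problem, and only the type of answer changes.

```python
# Toy supervised learning: labelled examples in, predictions out.
# All data below is made up for illustration.

def nearest_neighbour(examples, new_input):
    """Return the answer attached to the known input closest to new_input."""
    closest_input, closest_answer = min(
        examples, key=lambda pair: abs(pair[0] - new_input)
    )
    return closest_answer

# Classification: input = exclamation marks in an email, answer = a category.
spam_examples = [(0, "not spam"), (1, "not spam"), (7, "spam"), (9, "spam")]

# Regression: input = floor area in square metres, answer = a sale price.
house_examples = [(50, 200_000), (80, 310_000), (120, 420_000)]

spam_prediction = nearest_neighbour(spam_examples, 8)
price_prediction = nearest_neighbour(house_examples, 110)

print(spam_prediction)   # spam
print(price_prediction)  # 420000
```

Real models learn smoother mappings than "copy the closest example," but the contract is the same: inputs paired with answers go in, and a way to answer new inputs comes out.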
2. Unsupervised learning: finding structure with no answers
Sometimes you have piles of data but no labels at all — no one has marked the right answers, and maybe there isn't a single "right answer" to begin with. You just suspect there's structure hiding in there.
Unsupervised learning hunts for that structure on its own. You don't tell it what to look for; you ask it to organise the data and surface patterns you didn't already know.
The two most common jobs:
- Clustering — grouping similar items together. Hand a model your customers' behaviour and it might surface three natural groups — say, bargain-hunters, loyal regulars, and one-time buyers — without anyone defining those groups in advance. The model found them.
- Anomaly detection — spotting the things that don't fit any pattern. A transaction unlike a card's normal behaviour gets flagged as possible fraud. A sensor reading unlike all the others hints at a failing machine.
The trade-off: because there's no answer key, you can't simply "grade" an unsupervised model the way you grade a supervised one. Its findings need human interpretation — the model can tell you these things cluster together, but it's up to you to decide whether that grouping is meaningful or useful.
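Both jobs can be sketched in a few lines of Python. This is a toy under stated assumptions: the clustering is a bare-bones k-means fixed at two groups, the anomaly check flags values unusually far from the median, and the function names (`two_means`, `flag_anomalies`), the tolerance of 3, and all the numbers are invented for illustration. No labels appear anywhere in the data.

```python
from statistics import median

# Toy unsupervised learning: no labels anywhere, only raw numbers.
# All figures are made up for illustration.

def two_means(values, rounds=10):
    """Split numbers into two groups with a bare-bones k-means (k = 2)."""
    centres = [min(values), max(values)]  # simple deterministic starting guess
    for _ in range(rounds):
        groups = [[], []]
        for v in values:
            # Assign each value to whichever centre is closer.
            nearest = 0 if abs(v - centres[0]) <= abs(v - centres[1]) else 1
            groups[nearest].append(v)
        # Move each centre to the average of its group (keep it if the group is empty).
        centres = [sum(g) / len(g) if g else c for g, c in zip(groups, centres)]
    return groups

def flag_anomalies(values, tolerance=3.0):
    """Flag values much further from the median than is typical for this data."""
    mid = median(values)
    typical_gap = median(abs(v - mid) for v in values)
    return [v for v in values if abs(v - mid) > tolerance * typical_gap]

monthly_spend = [12, 15, 11, 14, 95, 102, 98, 13]
low_group, high_group = two_means(monthly_spend)
print(low_group)   # [12, 15, 11, 14, 13]
print(high_group)  # [95, 102, 98]

card_payments = [22, 25, 19, 24, 21, 23, 480]
print(flag_anomalies(card_payments))  # [480]
```

The output shows the trade-off in miniature: the code hands back two unnamed groups and one flagged payment. Whether those groups mean "casual shoppers" and "big spenders," or whether the flagged payment is fraud or just a holiday, is still a human call.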
3. Reinforcement learning: learning from consequences
The third flavor is the most different, and the most like how animals and people learn. There are no labelled examples and no static dataset. Instead, an agent acts in an environment, and the only feedback it gets is a reward or a penalty after the fact.
Think of training a dog. You can't explain "sit" in words. You wait for behaviour you like and give a treat. Over many repetitions, the dog learns which actions earn rewards. Reinforcement learning is that, formalised: the agent tries actions, sees what reward follows, and gradually shifts toward the actions that pay off most over time.
Reinforcement learning was central to the systems that beat world champions at Go and taught themselves superhuman chess through self-play. It is also how game bots master arcade titles and how robots can learn to walk or grip objects. A defining challenge is delayed reward: a move in chess might only prove good twenty moves later, so the agent has to learn which early actions lead to eventual payoff. That makes reinforcement learning powerful but notoriously tricky and data-hungry; it often needs millions of trial runs.
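Delayed reward is easier to see in a tiny example. The sketch below uses tabular Q-learning, one common reinforcement learning method, on a made-up corridor of five cells where the only reward sits at the far end. Everything here is an assumption for illustration: the environment, the settings (`learning_rate`, `discount`, `episodes`), and the simplification that the agent explores purely at random while it learns.

```python
import random

# Toy reinforcement learning: a corridor of 5 cells, reward only at the far end.
# Environment, rewards, and settings are all invented for illustration.
GOAL = 4
ACTIONS = {"left": -1, "right": +1}

def step(state, action):
    """Move one cell, staying inside the corridor; reward 1 only on reaching GOAL."""
    next_state = min(max(state + ACTIONS[action], 0), GOAL)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward

def train(episodes=500, learning_rate=0.5, discount=0.9, seed=0):
    rng = random.Random(seed)
    # value[state][action] = current estimate of long-run payoff
    value = {s: {a: 0.0 for a in ACTIONS} for s in range(GOAL)}
    for _ in range(episodes):
        state = 0
        while state != GOAL:
            action = rng.choice(list(ACTIONS))  # explore at random
            next_state, reward = step(state, action)
            # Best payoff the agent believes it can still get from the next cell.
            future = 0.0 if next_state == GOAL else max(value[next_state].values())
            target = reward + discount * future
            # Nudge the estimate toward what was just observed.
            value[state][action] += learning_rate * (target - value[state][action])
            state = next_state
    return value

value = train()
# After training, pick the best-looking action in each cell.
policy = {s: max(value[s], key=value[s].get) for s in range(GOAL)}
print(policy)  # {0: 'right', 1: 'right', 2: 'right', 3: 'right'}
```

Only the final step ever earns a reward, yet the agent ends up preferring "right" even in the first cell. The payoff has trickled backwards through the value estimates, one cell at a time, which is the delayed-reward problem being solved at toy scale.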
They aren't rigid boxes
It's tempting to file every system under exactly one flavor, but real systems blend them. A self-driving car might use supervised learning to recognise pedestrians, unsupervised learning to spot unusual road scenes, and reinforcement-style ideas to refine driving decisions.
The large language models from our generative AI guide are a perfect example: they're trained in a massive self-supervised phase (predicting the next word in text, a supervised-style task whose labels come free from otherwise unlabelled text), then polished with reinforcement learning from human feedback. Three flavors, one system.
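The "labels that come free" idea can be shown in a few lines. This is a rough sketch only: real language models work on tokens rather than whole words and use far longer contexts, and the function name `next_word_pairs` and the `context_size` parameter are invented for illustration.

```python
# Toy self-supervision: the "labels" are just the next words of the text itself.

def next_word_pairs(text, context_size=2):
    """Turn raw text into (context, next word) training examples."""
    words = text.split()
    pairs = []
    for i in range(context_size, len(words)):
        context = tuple(words[i - context_size:i])
        pairs.append((context, words[i]))  # the label is simply the word that follows
    return pairs

pairs = next_word_pairs("the cat sat on the mat")
for context, label in pairs:
    print(context, "->", label)
# ('the', 'cat') -> sat
# ('cat', 'sat') -> on
# ('sat', 'on') -> the
# ('on', 'the') -> mat
```

Nobody labelled anything by hand, yet every pair looks exactly like a supervised example: an input with the correct answer attached.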
So treat these categories as a map, not a cage. Their real value is diagnostic: when you face a new problem, ask "what does my data look like?" and the flavor — and the right tools — fall out immediately.
Recap
- Machine learning splits into three flavors based on what your data looks like.
- Supervised — labelled examples; learn a mapping. Splits into classification (predict a category) and regression (predict a number). The most common kind.
- Unsupervised — no labels; find structure like clusters or anomalies. Needs human interpretation.
- Reinforcement — learn by trial, error, and reward; great for games, robotics, and decision-making.
- Real systems often blend more than one flavor.
Whatever the flavor, there's a question we've dodged so far: once a model has learned something, how do you know it learned the right thing — and not just memorised its examples? That's where every serious ML project lives or dies. Next: Training, testing, and why models overfit.