Code Safari

Expedition 11·Beginner·13 min read

The Three Flavors of Machine Learning: Supervised, Unsupervised, Reinforcement

Almost all machine learning falls into one of three styles, defined by what kind of examples the model learns from. Here are supervised, unsupervised, and reinforcement learning explained in plain English: what each one is for, with everyday examples and no math.

June 17, 2026

In the last chapter we landed one big idea: machine learning means showing a computer examples and letting it discover the rules. But "examples" can mean very different things — and that difference splits machine learning into three broad families.

The split is set by what your data looks like: do your examples come with the right answers attached, with no answers at all, or as feedback you only get after you act? Each situation calls for a different style of learning, and learning to tell them apart gives you the best map there is for navigating the whole field.

Flavor | What your data has | What the model does | Everyday example
Supervised | Inputs with correct answers | Learns to map input → answer | Spam filter, price predictor
Unsupervised | Inputs with no answers | Finds hidden structure | Customer grouping, anomaly detection
Reinforcement | Rewards after actions | Learns which actions pay off | Game-playing AI, robotics

Let's walk through each.

1. Supervised learning: learning from labelled examples

This is the workhorse — the kind behind most machine learning you encounter day to day.

Supervised means every example comes with the correct answer attached, called a label. Emails labelled "spam" or "not spam." Houses listed with their actual sale price. X-rays marked "tumour" or "clear." The model studies these input–answer pairs and learns to reproduce the mapping, so that when a new, unlabelled input arrives, it can predict the answer.

The name comes from the idea of a teacher supervising: every example is a flashcard with the answer on the back, and the model is graded on whether it got it right.

Figure: labelled examples (input + answer) → model learns the mapping → new input → predicted answer
Supervised learning: labelled examples teach the model a mapping it can apply to new inputs
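To make the flashcard idea concrete, here is a minimal Python sketch of one of the simplest supervised learners, a decision stump: it studies labelled examples and picks the single threshold that best separates the two answers. The data (link counts per email, labelled spam or not) is invented for illustration, and `train_stump` and `predict` are hypothetical helper names, not a library API. Real spam filters weigh many more signals than one number.

```python
# Hypothetical toy data, invented for illustration:
# (number of links in the email, label) where 1 = spam and 0 = not spam.
examples = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 1), (7, 1), (9, 1), (12, 1)]


def train_stump(examples):
    """Learn a threshold from labelled examples.

    The rule is "predict spam when links >= threshold"; each observed link
    count is tried as a threshold and the one with the fewest mistakes wins.
    """
    best_threshold, best_errors = None, len(examples) + 1
    for t in sorted({links for links, _ in examples}):
        # count examples where the rule's guess disagrees with the label
        errors = sum((links >= t) != bool(label) for links, label in examples)
        if errors < best_errors:
            best_threshold, best_errors = t, errors
    return best_threshold


def predict(threshold, links):
    """Apply the learned rule to a new, unlabelled email."""
    return "spam" if links >= threshold else "not spam"


threshold = train_stump(examples)
print(threshold)                  # 4
print(predict(threshold, 10))     # spam
print(predict(threshold, 1))      # not spam
```

Nobody wrote the number 4 into the program; it came out of the labelled examples, which is the whole point of supervised learning.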

Supervised problems come in two shapes, depending on whether the answer is a category or a number:

  • Classification — the answer is a category from a fixed set. Spam or not spam. Cat, dog, or bird. Approve or decline. The model is sorting inputs into buckets.
  • Regression — the answer is a number on a continuous scale. A house price. Tomorrow's temperature. Expected sales next quarter. The model is predicting a quantity.

Aspect | Classification | Regression
Answer type | A category | A number
Example question | "Is this spam?" | "What will this house sell for?"
Example answer | "Yes" / "No" | "£420,000"
Output | A label | A value on a scale

The setup is identical — labelled examples in, mapping out. Only the shape of the answer differs. If you can frame your problem as "here are inputs, here are the answers I want, learn the connection," it's supervised learning.
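To see that only the shape of the answer changes, here is a minimal Python sketch using k-nearest neighbours, a simple supervised method chosen here for illustration: find the k labelled examples most similar to the new input, then take a majority vote (classification) or an average (regression). The floor areas, categories and prices are invented, and the function names are hypothetical.

```python
from collections import Counter

# Hypothetical toy data, invented for illustration.
# Same inputs (floor area in square metres), two kinds of answer.
categories = [(45, "flat"), (60, "flat"), (70, "flat"),
              (120, "house"), (150, "house"), (200, "house")]
prices = [(45, 210_000), (60, 270_000), (70, 300_000),
          (120, 430_000), (150, 520_000), (200, 690_000)]


def nearest(examples, x, k):
    """Return the k labelled examples whose input is closest to x."""
    return sorted(examples, key=lambda ex: abs(ex[0] - x))[:k]


def classify(examples, x, k=3):
    """Classification: the answer is the most common category among neighbours."""
    votes = Counter(label for _, label in nearest(examples, x, k))
    return votes.most_common(1)[0][0]


def regress(examples, x, k=3):
    """Regression: the answer is the average number among neighbours."""
    neighbours = nearest(examples, x, k)
    return sum(value for _, value in neighbours) / len(neighbours)


print(classify(categories, 65))   # flat
print(classify(categories, 130))  # house
print(regress(prices, 65))        # 260000.0
```

The neighbour search is shared; swapping a vote for an average is all it takes to move from predicting a category to predicting a number.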

2. Unsupervised learning: finding structure with no answers

Sometimes you have piles of data but no labels at all — no one has marked the right answers, and maybe there isn't a single "right answer" to begin with. You just suspect there's structure hiding in there.

Unsupervised learning hunts for that structure on its own. You don't tell it what to look for; you ask it to organise the data and surface patterns you didn't already know.

The two most common jobs:

  • Clustering — grouping similar items together. Hand a model your customers' behaviour and it might surface three natural groups — say, bargain-hunters, loyal regulars, and one-time buyers — without anyone defining those groups in advance. The model found them.
  • Anomaly detection — spotting the things that don't fit any pattern. A transaction unlike a card's normal behaviour gets flagged as possible fraud. A sensor reading unlike all the others hints at a failing machine.
Figure: unlabelled data → model finds patterns / groups → structure you didn't define emerges
Unsupervised learning: no answers given; the model surfaces structure on its own
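Clustering can be sketched with k-means, one common clustering algorithm among many. This minimal Python version groups hypothetical customers described by two invented numbers: orders per year and the percentage of purchases made on discount. It picks its starting centres with a deterministic "farthest-first" rule, an assumption made here so the result is repeatable; library implementations typically use randomised starts such as k-means++.

```python
# Hypothetical customers, invented for illustration:
# (orders per year, % of purchases made on discount). No labels anywhere.
customers = [(6, 80), (7, 85), (5, 90),
             (40, 10), (45, 15), (38, 5),
             (1, 20), (1, 30), (1, 25)]


def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centre_of(points):
    """Average position of a group of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def farthest_first(points, k):
    """Pick k starting centres: the first point, then repeatedly the point
    farthest from every centre chosen so far."""
    centres = [points[0]]
    while len(centres) < k:
        centres.append(max(points, key=lambda p: min(dist2(p, c) for c in centres)))
    return centres


def k_means(points, k, max_iters=100):
    centres = farthest_first(points, k)
    for _ in range(max_iters):
        # 1. assign every point to its closest centre
        clusters = [[] for _ in range(k)]
        for p in points:
            closest = min(range(k), key=lambda i: dist2(p, centres[i]))
            clusters[closest].append(p)
        # 2. move each centre to the middle of its group (keep it if the group is empty)
        new_centres = [centre_of(c) if c else centres[i] for i, c in enumerate(clusters)]
        if new_centres == centres:   # nothing moved: converged
            break
        centres = new_centres
    return clusters, centres


clusters, centres = k_means(customers, k=3)
for group in clusters:
    print(group)
```

The output is three unnamed lists of customers. Calling them bargain-hunters, loyal regulars and one-time buyers is a human reading of the groups, and so is the decision to ask for three groups in the first place.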

The trade-off: because there's no answer key, you can't simply "grade" an unsupervised model the way you grade a supervised one. Its findings need human interpretation — the model can tell you these things cluster together, but it's up to you to decide whether that grouping is meaningful or useful.
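For anomaly detection, one of the simplest unsupervised approaches is statistical: learn what "normal" looks like from unlabelled history, then flag anything far outside it. This minimal Python sketch uses a z-score (distance from the average, measured in standard deviations) on invented card transaction amounts. The cut-off of 3 is a common rule of thumb rather than a universal setting, and the function names are hypothetical; real fraud systems look at far more than the amount.

```python
from statistics import mean, stdev

# Hypothetical past transactions on one card (amounts in £), invented for illustration.
# No labels: nobody has marked any of these as fraud or not.
history = [12.50, 8.99, 23.10, 15.00, 9.75, 31.20, 18.40, 11.00, 27.35, 14.60]


def fit_normal_range(amounts):
    """Summarise 'normal' behaviour as an average and a typical spread."""
    return mean(amounts), stdev(amounts)


def is_anomaly(amount, centre, spread, cutoff=3.0):
    """Flag an amount more than `cutoff` standard deviations from the average."""
    return abs(amount - centre) / spread > cutoff


centre, spread = fit_normal_range(history)
for amount in [19.99, 2.50, 950.00]:
    print(amount, is_anomaly(amount, centre, spread))
```

Only the £950 purchase is flagged. Whether it is actually fraud is, again, for a person to decide.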

3. Reinforcement learning: learning from consequences

The third flavor is the most different, and the most like how animals and people learn. There are no labelled examples and no static dataset. Instead, an agent acts in an environment, and the only feedback it gets is a reward or a penalty after the fact.

Think of training a dog. You can't explain "sit" in words. You wait for behaviour you like and give a treat. Over many repetitions, the dog learns which actions earn rewards. Reinforcement learning is that, formalised: the agent tries actions, sees what reward follows, and gradually shifts toward the actions that pay off most over time.

Figure: agent takes an action → environment returns reward/penalty → agent updates its strategy → repeat thousands of times
The reinforcement loop: act, observe the reward, adjust, repeat
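The act, observe, adjust loop can be sketched in a few lines of Python with an epsilon-greedy "multi-armed bandit", the simplest reinforcement-learning setting: each action earns an immediate reward and has no later consequences. The agent does not know the hidden payout rates, which are invented here; it mostly repeats whichever action has paid best so far and occasionally tries a random one to keep exploring. The parameter values and the name `run_bandit` are illustrative assumptions.

```python
import random


def run_bandit(true_payout_rates, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent learning which action pays off best.

    true_payout_rates are hidden from the agent: it only sees the reward
    (1 or 0) that follows each action it takes.
    """
    rng = random.Random(seed)          # fixed seed so runs are repeatable
    n = len(true_payout_rates)
    estimates = [0.0] * n              # agent's current guess of each action's average reward
    counts = [0] * n                   # how often each action has been tried
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(n)  # explore: try a random action
        else:
            action = max(range(n), key=lambda a: estimates[a])  # exploit the best guess so far
        # environment: reward 1 with the action's hidden probability, otherwise 0
        reward = 1.0 if rng.random() < true_payout_rates[action] else 0.0
        # adjust: update this action's running average with the reward just observed
        counts[action] += 1
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts


estimates, counts = run_bandit([0.2, 0.5, 0.8])
print("estimated payouts:", [round(e, 2) for e in estimates])
print("times chosen:     ", counts)
```

After a few thousand rounds the estimates should sit near the hidden rates, and the 0.8 action should have been chosen far more often than the others, even though nobody told the agent which one was best. Exact numbers depend on the random seed.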

Reinforcement learning is at the heart of the AI systems that beat the world's best Go players and taught themselves to play chess at a superhuman level, of game bots that master arcade titles, and of robots that learn to walk or grip objects. Its defining difficulty is delayed reward: a move in chess might only prove good twenty moves later, so the agent has to work out which early actions led to the eventual payoff. That makes reinforcement learning powerful but notoriously tricky and data-hungry; it often needs millions of trial runs.
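Delayed reward is easiest to see in a toy problem. This minimal Python sketch uses tabular Q-learning, one classic reinforcement-learning algorithm, on a hypothetical five-cell corridor: the agent starts in cell 0 and earns a reward of 1 only on reaching cell 4, with nothing for any earlier move. Each update passes part of the value of what comes next back to the move that led there, discounted by `gamma` per step. The corridor, the parameter values and the function names are assumptions for illustration.

```python
import random

N_CELLS = 5                # hypothetical corridor: cells 0..4, cell 4 is the goal
ACTIONS = ("left", "right")


def step(cell, action):
    """Toy environment: move one cell (the wall stops you at 0).
    Reward is 1 only when the goal is reached; every other move earns 0."""
    nxt = max(0, cell - 1) if action == "left" else cell + 1
    done = nxt == N_CELLS - 1
    return nxt, (1.0 if done else 0.0), done


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    # q[(cell, action)] = agent's estimate of the total discounted future reward
    q = {(c, a): 0.0 for c in range(N_CELLS) for a in ACTIONS}
    for _ in range(episodes):
        cell, done = 0, False
        while not done:
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)            # explore
            else:
                best = max(q[(cell, a)] for a in ACTIONS)
                # exploit, breaking ties at random
                action = rng.choice([a for a in ACTIONS if q[(cell, a)] == best])
            nxt, reward, done = step(cell, action)
            # value of what comes next (nothing once the goal is reached)
            future = 0.0 if done else max(q[(nxt, a)] for a in ACTIONS)
            # nudge the estimate toward reward + discounted future value
            q[(cell, action)] += alpha * (reward + gamma * future - q[(cell, action)])
            cell = nxt
    return q


q = q_learning()
for cell in range(N_CELLS - 1):
    print(f"cell {cell}: right={q[(cell, 'right')]:.3f}  left={q[(cell, 'left')]:.3f}")
```

After training, "right" should score higher than "left" in every cell, and the value of "right" from the start cell should end up close to 0.9³ ≈ 0.729, even though that first move earns nothing by itself: the agent has learned that it leads to the eventual payoff.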

They aren't rigid boxes

It's tempting to file every system under exactly one flavor, but real systems often blend them. A self-driving car, for instance, might use supervised learning to recognise pedestrians, unsupervised learning to spot unusual road scenes, and reinforcement-style ideas to refine driving decisions.

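The large language models from our generative AI guide are a good example. They're first trained in a massive self-supervised phase, predicting the next word in text, where the labels come free from the text itself: unlabelled data turned into supervised-style examples. They're then usually fine-tuned on human-written example answers (ordinary supervised learning) and polished with reinforcement learning from human feedback. Three flavors, one system.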
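The idea of labels that come free from the text can be made concrete. This minimal Python sketch turns one sentence into next-word training pairs: each prefix is the input and the word that follows is the answer. It splits on spaces for simplicity; real language models work on tokens (word pieces) and far longer contexts, so treat `next_word_pairs` as a hypothetical illustration of the idea, not any model's actual pipeline.

```python
def next_word_pairs(text):
    """Turn raw text into (context, next word) examples.

    Every prefix of the sentence becomes an input, and the word that
    follows it becomes the label, so no human labelling is needed.
    """
    words = text.split()
    return [(words[:i], words[i]) for i in range(1, len(words))]


pairs = next_word_pairs("the cat sat on the mat")
for context, target in pairs:
    print(" ".join(context), "->", target)
```

Six words of ordinary text yield five labelled examples; scale that up to a large slice of the internet and you have the training data for the self-supervised phase.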

So treat these categories as a map, not a cage. Their real value is diagnostic: when you face a new problem, ask "what does my data look like?" and the answer usually points you to the right flavor, and with it the right tools.

Recap

  • Machine learning splits into three flavors based on what your data looks like.
  • Supervised — labelled examples; learn a mapping. Splits into classification (predict a category) and regression (predict a number). The most common kind.
  • Unsupervised — no labels; find structure like clusters or anomalies. Needs human interpretation.
  • Reinforcement — learn by trial, error, and reward; great for games, robotics, and decision-making.
  • Real systems mix all three.

Whatever the flavor, there's a question we've dodged so far: once a model has learned something, how do you know it learned the right thing — and not just memorised its examples? That's where every serious ML project lives or dies. Next: Training, testing, and why models overfit.
