Supervised vs. Unsupervised Learning

The difference between supervised and unsupervised learning lies in what information the ML practitioner conveys to the model about the desired outputs, and how.

Supervised learning

Supervised learning is a group of learning strategies that require data to be labeled (annotated, tagged, etc.) by a sufficiently expert human. The ML practitioner provides a model with representative examples of the kinds of inputs that the model should expect to be given and the corresponding output that is desired from the model.

For this X, we expect the model's output to be y.

For example, to learn a classification task, the model may be provided with many examples of cat images and many examples of dog images. Each cat image is a representative input, and cat is the expected/desired output (the label).
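To make the input/label pairing concrete, here is a minimal sketch of supervised learning in Python. The "images" are hypothetical two-number feature vectors (stand-ins for real image data), the labels are human-provided, and the classifier is a simple 1-nearest-neighbor rule; all names here are illustrative, not from any particular library.

```python
import math

# Each training example pairs an input X (a feature vector) with a
# human-provided label y -- the essence of supervised learning.
training_data = [
    ((1.0, 0.9), "cat"),
    ((1.1, 1.0), "cat"),
    ((3.0, 0.2), "dog"),
    ((3.2, 0.3), "dog"),
]

def predict(x):
    """1-nearest-neighbor: return the label of the closest training example."""
    _, label = min(training_data, key=lambda pair: math.dist(pair[0], x))
    return label

print(predict((1.05, 0.95)))  # close to the cat examples -> "cat"
print(predict((3.10, 0.25)))  # close to the dog examples -> "dog"
```

The model never sees a rule for what makes a cat; it generalizes entirely from the labeled examples it was given.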

Unsupervised learning

Unsupervised learning is a group of learning strategies that learn or recognize patterns in data without requiring data to be labeled. The ML practitioner provides a model with representative examples of the kinds of inputs the model should expect to receive but does not convey the desired output (the right answer).

Examples

  • k-means clustering: The ML practitioner provides a set of data (examples) and asks the algorithm to group the data points into k clusters. The ML practitioner does not specify which examples belong in which clusters, though the practitioner will likely evaluate the results afterward, often qualitatively, to determine their suitability.

  • DBSCAN (clustering): The ML practitioner provides a set of data (examples) and some thresholds for computing density (usually a radius and the minimum number of examples that must fall within that radius), and the algorithm groups the data points into any number of clusters, possibly leaving some points unclustered (labeling them as noise).
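A toy version of the first example can be written in a few lines. This is a minimal sketch of k-means (Lloyd's algorithm) on made-up 2-D points: the practitioner supplies only the data and k, and the algorithm discovers the groupings itself. The initial centers are fixed by hand here for determinism; a real run would choose them randomly.

```python
import math

points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3),   # one apparent group
          (5.0, 5.0), (5.2, 4.9), (4.9, 5.1)]   # another apparent group

def kmeans(points, centers, iterations=10):
    """Alternate assignment and update steps for a fixed number of rounds."""
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            i = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            clusters[i].append(p)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its old center).
        centers = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c))
            if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters

centers, clusters = kmeans(points, centers=[(0.0, 0.0), (1.0, 1.0)])
print(clusters)  # the two apparent groups, with no labels ever provided
```

Note that nothing in the input said which points belong together; the structure was inferred from the data alone, which is what makes this unsupervised.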

Other forms of learning

While the terminology (supervised or unsupervised) conveys a sense of X or not X, there are a variety of learning strategies that don't fit neatly into either category. Complicating matters further, the scientific literature uses many names and variations for these other strategies; there is even some variation in what people consider unsupervised learning. Some common terms that you may see include semi-supervised learning (usually meaning that some, but not all, examples are labeled) and reinforcement learning (which is often used to learn to play games or for robotic control). In this section, we provide a couple of examples of learning strategies that don't fit neatly into the supervised or unsupervised categories.

Examples

  • Autoencoders: An autoencoder is sometimes considered unsupervised, since a human does not have to label any data. However, training an autoencoder does rely on a label: each input serves as its own label. An autoencoder is a function that learns to compress (or represent, or embed) input data and to reconstruct the original input from that compressed representation (embedding).

  • Next word prediction: A common Natural Language Processing (NLP) task is next word prediction (e.g., predictive text on a smartphone). Training the model uses a text corpus. A human does not explicitly provide labels of the form "if you see this sequence, then the next word is that." But by ingesting the first k words of a sentence or paragraph, the (k+1)th word can be used as the label automatically.
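The same idea behind both examples above, that labels can be derived from the data itself rather than supplied by a human, can be shown in a few lines. This is a minimal sketch of turning raw text into (input, label) pairs for next word prediction; the tiny corpus and the context size k are arbitrary choices for illustration.

```python
# Each window of k consecutive words is an input; the word that
# follows the window is its label. No human annotation is needed.
corpus = "the cat sat on the mat".split()
k = 2  # context size (hypothetical choice)

examples = [
    (tuple(corpus[i:i + k]), corpus[i + k])   # (input words, label word)
    for i in range(len(corpus) - k)
]

for context, label in examples:
    print(context, "->", label)
# first pair: ('the', 'cat') -> 'sat'
```

An autoencoder's training pairs arise the same way, except the "label" for each input is the input itself: the pairs are simply (x, x).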