Learning Tasks

Learning tasks can be broadly categorized into one of several groups. Below, we list common learning tasks, focusing on those supported by Chariot, and provide descriptions and examples to clarify the distinctions.

Beyond differences in the underlying models, the primary difference among the learning tasks is the form of each model's output.

Regression

Input: almost any type of data could be input to some regression model

Output: a (meaningful) number

Examples:

Input: historical sales prices for a stock
Output: a (predicted) future price for the stock
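The stock-price example can be sketched as a simple linear regression fit by ordinary least squares. This is a minimal pure-Python sketch, and the historical prices below are made-up illustration data:

```python
# Minimal linear-regression sketch: fit price ~ slope * day + intercept
# by ordinary least squares. The prices are made up for illustration.
prices = [10.0, 10.5, 11.2, 11.8, 12.5]  # price on days 0..4
days = list(range(len(prices)))

n = len(days)
mean_x = sum(days) / n
mean_y = sum(prices) / n

# Closed-form least-squares slope and intercept.
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, prices)) / \
        sum((x - mean_x) ** 2 for x in days)
intercept = mean_y - slope * mean_x

# The model's output is a (meaningful) number: a predicted future price.
predicted_day_5 = slope * 5 + intercept
print(predicted_day_5)
```

The key point is the output type: whatever the input, a regression model produces a number.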

Classification

Input: almost any type of data could be input to some classification model

Output: one choice from a list of several to many options

Examples:

Input: an image
Output: cat or dog

In this case, the model determines whether the input image contains a cat or a dog. The output cat (for example) is one choice from a list of two options, cat or dog.

Note: A classification model always produces an output from its pre-specified list of possible outputs. So, in the above example, if the model were provided an image of a tank, the output would still be either cat or dog, and no amount of training would ever change that. Instead, the model itself would have to be changed to expand its list of allowed/possible outputs.

Note: Under the hood, most classification models actually produce a score for each possible category and then select the category with the highest score. So, an image of a cat might receive scores like cat: 0.90, dog: 0.10, and the model returns cat (because its score is higher).
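The score-then-select behavior can be sketched in a few lines (the scores here are made up for illustration):

```python
# Hypothetical per-category scores produced by a classifier for one image.
scores = {"cat": 0.90, "dog": 0.10}

# The model's final output is simply the category with the highest score.
prediction = max(scores, key=scores.get)
print(prediction)  # cat
```

Note that even an image of a tank would still receive scores only for cat and dog, so the selected output would still be one of those two.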

Input: a sequence of words
Output: the next word in the sequence

In this case, the model is trying to determine the most likely next word in the sentence/sequence. This is a case where there are many options from which to choose—any word in the model's dictionary.
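Next-word prediction is still classification, just over a very large label set (the model's dictionary). A toy sketch using a bigram count model makes this concrete; the tiny corpus below is made up, and a real model would be trained on far more text:

```python
from collections import Counter, defaultdict

# Toy training corpus (made up); a real model would use far more text.
corpus = "the cat sat on the mat the cat ate the fish".split()

# Count how often each word follows each preceding word (bigram counts).
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

# Predict the next word after "the": the most frequent follower.
prediction = following["the"].most_common(1)[0][0]
print(prediction)  # cat ("cat" follows "the" twice; "mat" and "fish" once)
```

As with the cat/dog example, the output is one choice from a fixed list; the list is just much longer.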

Object Detection

Input: an image (or, possibly, a video)

Output: a (possibly empty) list of bounding boxes and labels

Bounding boxes are rectangular regions of the input image and may be specified in any number of ways. One common convention is to provide minimum and maximum x and y values (or minimum and maximum row and column values, thinking of the image as a grid of pixels). Each bounding box is also assigned a label. Labeling a bounding box is a classification task (see above), so object detection is a two-part process: determine where things are in the image, and determine what those things are. The "what" step is subject to all the caveats mentioned above for classification; in particular, the label always comes from a pre-specified list of options.

Examples:

Input: an image
Output: bbox: {xmin: 10, xmax: 100, ymin: 50, ymax: 150}, label: cat
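A detection output like the example above can be represented directly as a list of box-plus-label records. This sketch uses the min/max corner convention described earlier; the coordinate values are illustrative:

```python
# One detection: a bounding box (min/max corner convention) plus a label.
detections = [
    {"bbox": {"xmin": 10, "xmax": 100, "ymin": 50, "ymax": 150},
     "label": "cat"},
]

# An image with nothing to detect simply yields an empty list.
empty_result = []

def box_area(bbox):
    """Area of a bounding box in pixels, from its min/max corners."""
    return (bbox["xmax"] - bbox["xmin"]) * (bbox["ymax"] - bbox["ymin"])

print(box_area(detections[0]["bbox"]))  # 9000
```

The "possibly empty" part of the output type matters: a detector run on an image with no recognizable objects returns an empty list, not an error.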

Segmentation

Input: an image (or, possibly, a video)

Output: a label for every pixel of the input image

Note: A label may not be produced for every pixel of the input image, e.g., when there is missing data (gaps) in the image. Additionally, models frequently behave poorly near the borders of an image; some models account for this by omitting predictions near the edges, resulting in additional missing predictions.

Note: The per-pixel labeling comes with the same caveats as classification (above), specifically, that the label comes from a predefined set of possible/acceptable labels.
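A segmentation output can be sketched as a grid holding one label per pixel. The tiny 4x4 mask and the two classes below are made up for illustration:

```python
from collections import Counter

# A tiny 4x4 segmentation mask: one label per pixel, each drawn from a
# predefined set of classes ("cat" vs "background" here).
mask = [
    ["background", "background", "background", "background"],
    ["background", "cat",        "cat",        "background"],
    ["background", "cat",        "cat",        "background"],
    ["background", "background", "background", "background"],
]

# Counting pixels per class is a common way to summarize a segmentation.
counts = Counter(label for row in mask for label in row)
print(counts["cat"], counts["background"])  # 4 12
```

Each pixel's label is itself a classification output, which is why the closed-set caveat from the classification section applies per pixel.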