Chariot User Guide
The Chariot user documentation is designed to meet you where you are in your data science journey. The guides below cover the major components of building and deploying machine learning models in Chariot. They include practical links, which help you start using the platform quickly, and theory-based links, which broaden your understanding of the applicable data science fundamentals.
Introduction
Overview of Chariot
Chariot is a vertically integrated MLOps platform, designed to provide the tools that will help you turn production data into effective, labeled datasets that can be used to train and scale deployable models in production.
Table of Contents
The following is a list of the sections in this user guide.
- Administration
- Manage Datasets
- Annotation
- Training
- Training Blueprints
- Catalog Models
- Evaluate Models
- Deploy and Monitor Models
- Inference Store
- Workspaces
Resources
If you are new to data science and would like a quick overview of the entire process of creating a machine learning model, refer to our Data Science Fundamentals guide.
Chariot can be accessed through either the platform user interface (UI) or the Chariot Python SDK. The SDK provides high-level programmatic access to Chariot resources, which is especially valuable for high-touch technical users.
In addition to these methods, Chariot microservices are exposed via REST endpoints, which enable interaction with the Chariot platform via direct application programming interface (API) calls.
Quick Reference Guide
How Do I Navigate the Chariot UI?
Chariot resources are grouped into projects, and projects are owned by organizations. Each resource (a training run, a model, a dataset, or an annotation task) belongs to exactly one project. This structure organizes resources into logical groupings and supports access control: users can be given permission to access specific projects.
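The hierarchy described above can be pictured with a small Python sketch. This is only an illustration of the organization, project, and resource relationship. The class names, fields, and helper functions are hypothetical and are not part of the Chariot SDK.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Project:
    name: str
    organization: str  # the organization that owns this project


@dataclass(frozen=True)
class Resource:
    name: str
    kind: str     # e.g. "dataset", "model", "training_run", "annotation_task"
    project: str  # a single field, so each resource has exactly one project


def resources_by_project(resources):
    """Group resource names under the project that owns them."""
    grouped = defaultdict(list)
    for resource in resources:
        grouped[resource.project].append(resource.name)
    return dict(grouped)


def visible_resources(resources, permitted_projects):
    """Return the names of resources in projects the user may access."""
    return [r.name for r in resources if r.project in permitted_projects]


# Hypothetical example data, invented for illustration only.
projects = {
    "traffic": Project("traffic", organization="acme"),
    "finance": Project("finance", organization="acme"),
}
resources = [
    Resource("street-signs", "dataset", "traffic"),
    Resource("sign-detector", "model", "traffic"),
    Resource("invoices", "dataset", "finance"),
]
```

A user who has been granted access only to the hypothetical "traffic" project would see `street-signs` and `sign-detector` from `visible_resources(resources, {"traffic"})`, but not `invoices`.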
How Do I Get My Data Into Chariot?
Data can be uploaded into Chariot either by importing a .tar/.zip directory of your data using the Chariot UI or by using the Chariot SDK. The Chariot datasets documentation provides the steps for getting your data into Chariot. Chariot supports a variety of image formats and text formats.
If your data has already been annotated, include those annotations when you upload your data. You will first need to convert them into a Chariot-supported annotation format.
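For the archive import route described above, the sketch below shows one way to bundle a local directory into a .tar file using only the Python standard library. The function name and the example paths are assumptions made for illustration. The directory layout and annotation files that Chariot expects inside the archive are not covered here, so consult the Chariot datasets documentation for those details.

```python
import tarfile
from pathlib import Path


def pack_directory(source_dir, archive_path):
    """Write source_dir into a .tar archive and return the archived names."""
    source = Path(source_dir)
    if not source.is_dir():
        raise NotADirectoryError(f"{source} is not a directory")

    # "w" writes an uncompressed .tar file; arcname keeps paths relative
    # to the directory name instead of storing the absolute local path.
    with tarfile.open(archive_path, "w") as archive:
        archive.add(source, arcname=source.name)

    # Re-open the archive to confirm what was written.
    with tarfile.open(archive_path, "r") as archive:
        return archive.getnames()
```

For example, `pack_directory("my_dataset", "my_dataset.tar")` would produce an archive that can then be selected for import in the UI. Both paths here are placeholders.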
Machine learning models are only as good as the data they are trained on, so it is critical to be careful when curating your data. Our Data Science Fundamentals guide contains several resources to help you curate the best possible datasets. Specifically, we address the following common questions:
- How much data do I need?
- What are data splits? Why do we need them? How do we make them?
- How do we mitigate imbalances in a dataset?
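As a concrete picture of the data splits question above, here is a minimal sketch of a stratified train/validation/test split in plain Python. It is a generic illustration, not Chariot's own splitting mechanism, and the function name and default fractions are assumptions. Splitting each class separately keeps class proportions roughly equal across the splits, which is one simple safeguard when a dataset is imbalanced.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.8, 0.1, 0.1), seed=0):
    """Split sample indices into train/val/test, class by class.

    labels: one class label per sample.
    fractions: train and validation shares; the remainder goes to test.
    """
    rng = random.Random(seed)  # fixed seed makes the split reproducible

    # Collect the sample indices that belong to each class.
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = int(len(indices) * fractions[0])
        n_val = int(len(indices) * fractions[1])
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # whatever is left over
    return train, val, test
```

Each index lands in exactly one split, so no sample used for training also appears in the validation or test data.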
How Do I Annotate Data in Chariot?
Annotation is the process in which human experts label data with the details that will be used to train a model. This is analogous to teachers creating answer sheets for their students to use while studying for an exam. Our annotation documentation provides a full walk-through of the annotation tools that Chariot provides.
For more information about annotations in general, navigate to the following articles from our Data Science Fundamentals guide.
- What are annotations?
- Annotation best practices
How Do I Use a Model From the Chariot Catalog?
There are several ways to use a model. Models can be used in the Chariot platform through the UI (with a try-it-out feature), through the Python SDK, or through the REST APIs. When a model is used "in-platform," computing resources inside the Chariot cluster are used to generate model outputs, which are called inferences.
Alternatively, models can be exported and run on any external resources you have available.
How Do I Train a Model on Data in Chariot?
Using the Chariot UI, training a model is as easy as completing a wizard. This approach makes the most common training options available to users without writing any code. For those comfortable writing code, additional training options are possible using the SDK.
There are many things to consider when training a model, including:
- The specific learning task
- The process of training
- Evaluating models
- An entire vocabulary to be familiar with
- Response to failure in training