Scikit-learn Models

Scikit-learn models are traditional machine learning models well suited to structured data tasks such as classification and regression. They are typically smaller, faster to train, and require fewer computational resources than deep learning models.

File Format Requirements

Scikit-learn models can be provided as:

  • Python object: a trained model held in memory, passed directly through the model_object parameter
  • Joblib file: .joblib serialized model file
  • Archive: .tar.gz or directory containing a .joblib file

Import Example

Importing a scikit-learn classification model trained on the classic Iris dataset may look like the following:

from chariot.models import import_model, ArtifactType, TaskType

# Define mapping of class labels to integer output.
class_labels = {"Setosa": 0, "Versicolour": 1, "Virginica": 2}

# Provide user-friendly names and descriptions of the input features.
input_info = [
    {
        "name": "Sepal length (cm)",
        "description": "Length of the iris' sepals. Sepals are the leaf-like structures surrounding the petals.",
    },
    {
        "name": "Sepal width (cm)",
        "description": "Width of the iris' sepals. Sepals are the leaf-like structures surrounding the petals.",
    },
    {
        "name": "Petal length (cm)",
        "description": "Length of the iris' petals.",
    },
    {
        "name": "Petal width (cm)",
        "description": "Width of the iris' petals.",
    },
]

# This will create a new model entry in the catalog, at the project and name specified.
model = import_model(
    name="<NAME OF MODEL>",
    # One of `project_id` or `project_name` is required.
    project_id="<PROJECT ID>",
    project_name="<PROJECT NAME>",
    version="<MODEL VERSION>",
    artifact_type=ArtifactType.SKLEARN,
    task_type=TaskType.STRUCTURED_DATA_CLASSIFICATION,
    class_labels=class_labels,
    summary="testing scikit-learn model import",
    input_info=input_info,
    # `sklearn_model` must already exist in memory as a fitted estimator or pipeline.
    model_object=sklearn_model,
)

Here, sklearn_model (the value passed as model_object) is a fitted model or pipeline, for example an object of type RandomForestClassifier, LogisticRegression, or Pipeline.
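The import example uses sklearn_model without defining it. One way such an object might be produced is sketched below, using only scikit-learn. The choice of a StandardScaler plus LogisticRegression pipeline and the step names "scale" and "classify" are illustrative assumptions, not something the import requires. The feature order and the integer labels 0, 1, 2 returned by load_iris line up with input_info and class_labels above.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Load the Iris data: four feature columns (sepal length, sepal width,
# petal length, petal width) and integer labels 0, 1, 2.
X, y = load_iris(return_X_y=True)

# Illustrative pipeline; any fitted scikit-learn estimator or Pipeline
# could take its place.
sklearn_model = Pipeline(
    [
        ("scale", StandardScaler()),
        ("classify", LogisticRegression(max_iter=200)),
    ]
)
sklearn_model.fit(X, y)

# Sanity check: predict the class of the first five samples.
predictions = sklearn_model.predict(X[:5])
```

The fitted sklearn_model can then be passed as model_object in the import_model call.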

Alternatively, you can save your scikit-learn model as a model.joblib file and upload it with the model_path=path_to_file argument. A directory or .tar.gz archive containing model.joblib is also accepted.
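A minimal sketch of producing those files with joblib and the standard library follows. The temporary output directory, the RandomForestClassifier settings, and the choice to place model.joblib at the root of the archive are assumptions made for illustration; the text above only states that the archive must contain the file.

```python
import tarfile
import tempfile
from pathlib import Path

import joblib
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

# Fit a small illustrative model on the Iris data.
X, y = load_iris(return_X_y=True)
sklearn_model = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)

# Serialize the fitted model to model.joblib in a scratch directory.
out_dir = Path(tempfile.mkdtemp())
model_path = out_dir / "model.joblib"
joblib.dump(sklearn_model, model_path)

# Optionally wrap the file in a .tar.gz archive
# (assumed layout: model.joblib at the archive root).
archive_path = out_dir / "model.tar.gz"
with tarfile.open(archive_path, "w:gz") as archive:
    archive.add(model_path, arcname="model.joblib")

# Reload to confirm the serialized model round-trips.
restored = joblib.load(model_path)
```

Either str(model_path), str(archive_path), or the containing directory could then be supplied as model_path in the import_model call.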