chariot.models package
Submodules
chariot.models.enum module
- class chariot.models.enum.ArtifactType(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]
Bases:
ModelsStrEnum
- CHARIOT = 'chariot'
- HUGGINGFACE = 'huggingface'
- NEURALMAGIC = 'neuralmagic'
- ONNX = 'onnx'
- PYTORCH = 'pytorch'
- SKLEARN = 'sklearn'
- class chariot.models.enum.ArtifactTypesTaskType(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]
Bases:
ModelsEnum
- CHARIOT = ['Image Classification', 'Image Embedding', 'Image Segmentation', 'Object Detection']
- HUGGINGFACE = ['Automatic Speech Recognition', 'Conversational', 'Summarization', 'Text Classification', 'Text Embedding', 'Text Fill-Mask', 'Text Generation', 'Token Classification', 'Translation']
- NEURALMAGIC = ['Image Classification']
- ONNX = ['Automatic Speech Recognition', 'Conversational', 'Feature Extraction', 'Image Autoencoder', 'Image Classification', 'Image Embedding', 'Image Generation', 'Image Segmentation', 'Object Detection', 'Oriented Object Detection', 'Other - Computer Vision', 'Other - Natural Language', 'Other - Structured Data', 'Question Answer', 'Structured Data Classification', 'Structured Data Regression', 'Summarization', 'Text Classification', 'Text Embedding', 'Text Fill-Mask', 'Text Generation', 'Text2Text Generation', 'Token Classification', 'Translation']
- PYTORCH = ['Automatic Speech Recognition', 'Conversational', 'Feature Extraction', 'Image Autoencoder', 'Image Classification', 'Image Embedding', 'Image Generation', 'Image Segmentation', 'Object Detection', 'Oriented Object Detection', 'Other - Computer Vision', 'Other - Natural Language', 'Other - Structured Data', 'Question Answer', 'Structured Data Classification', 'Structured Data Regression', 'Summarization', 'Text Classification', 'Text Embedding', 'Text Fill-Mask', 'Text Generation', 'Text2Text Generation', 'Token Classification', 'Translation']
- SKLEARN = ['Other - Structured Data', 'Structured Data Classification', 'Structured Data Regression']
- class chariot.models.enum.InferenceEngine(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]
Bases:
ModelsStrEnum
- CHARIOTDEEPSPARSE = 'ChariotDeepSparse'
- CHARIOTPYTORCH = 'ChariotPytorch'
- HUGGINGFACE = 'Huggingface'
- VLLM = 'vLLM'
- class chariot.models.enum.Protocol(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]
Bases:
ModelsStrEnum
- V2 = 'v2'
- class chariot.models.enum.TaskType(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]
Bases:
ModelsStrEnum
- AUTOMATIC_SPEECH_RECOGNITION = 'Automatic Speech Recognition'
- CONVERSATIONAL = 'Conversational'
- FEATURE_EXTRACTION = 'Feature Extraction'
- IMAGE_AUTOENCODER = 'Image Autoencoder'
- IMAGE_CLASSIFICATION = 'Image Classification'
- IMAGE_EMBEDDING = 'Image Embedding'
- IMAGE_GENERATION = 'Image Generation'
- IMAGE_SEGMENTATION = 'Image Segmentation'
- OBJECT_DETECTION = 'Object Detection'
- ORIENTED_OBJECT_DETECTION = 'Oriented Object Detection'
- OTHER_COMPUTER_VISION = 'Other - Computer Vision'
- OTHER_NATURAL_LANGUAGE = 'Other - Natural Language'
- OTHER_STRUCTURED_DATA = 'Other - Structured Data'
- QUESTION_ANSWER = 'Question Answer'
- STRUCTURED_DATA_CLASSIFICATION = 'Structured Data Classification'
- STRUCTURED_DATA_REGRESSION = 'Structured Data Regression'
- SUMMARIZATION = 'Summarization'
- TEXT2TEXT_GENERATION = 'Text2Text Generation'
- TEXT_CLASSIFICATION = 'Text Classification'
- TEXT_EMBEDDING = 'Text Embedding'
- TEXT_FILL_MASK = 'Text Fill-Mask'
- TEXT_GENERATION = 'Text Generation'
- TOKEN_CLASSIFICATION = 'Token Classification'
- TRANSLATION = 'Translation'
- class chariot.models.enum.TaskTypesInferenceMethod(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]
Bases:
ModelsEnum
- AUTOMATIC_SPEECH_RECOGNITION = ['predict']
- CONVERSATIONAL = ['chat']
- FEATURE_EXTRACTION = ['predict']
- IMAGE_AUTOENCODER = ['embed', 'reconstruct']
- IMAGE_CLASSIFICATION = ['embed', 'predict', 'predict_proba']
- IMAGE_EMBEDDING = ['embed']
- IMAGE_GENERATION = ['predict']
- IMAGE_SEGMENTATION = ['predict', 'predict_proba']
- OBJECT_DETECTION = ['detect']
- ORIENTED_OBJECT_DETECTION = ['detect']
- OTHER_COMPUTER_VISION = ['predict']
- OTHER_NATURAL_LANGUAGE = ['predict']
- OTHER_STRUCTURED_DATA = ['predict']
- QUESTION_ANSWER = ['predict']
- STRUCTURED_DATA_CLASSIFICATION = ['predict']
- STRUCTURED_DATA_REGRESSION = ['predict']
- SUMMARIZATION = ['predict']
- TEXT2TEXT_GENERATION = ['predict']
- TEXT_CLASSIFICATION = ['predict']
- TEXT_EMBEDDING = ['embed']
- TEXT_FILL_MASK = ['predict']
- TEXT_GENERATION = ['complete']
- TOKEN_CLASSIFICATION = ['predict']
- TRANSLATION = ['predict']
- class chariot.models.enum.TaskTypesRequirement(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]
Bases:
ModelsEnum
- AUTOMATIC_SPEECH_RECOGNITION = {'class_labels': False, 'input_info': False, 'input_modality': 'text'}
- CONVERSATIONAL = {'class_labels': False, 'input_info': False, 'input_modality': 'text'}
- FEATURE_EXTRACTION = {'class_labels': False, 'input_info': False, 'input_modality': 'text'}
- IMAGE_AUTOENCODER = {'class_labels': False, 'input_info': False, 'input_modality': 'image'}
- IMAGE_CLASSIFICATION = {'class_labels': True, 'input_info': False, 'input_modality': 'image'}
- IMAGE_EMBEDDING = {'class_labels': False, 'input_info': False, 'input_modality': 'image'}
- IMAGE_GENERATION = {'class_labels': False, 'input_info': False, 'input_modality': 'image'}
- IMAGE_SEGMENTATION = {'class_labels': True, 'input_info': False, 'input_modality': 'image'}
- OBJECT_DETECTION = {'class_labels': True, 'input_info': False, 'input_modality': 'image'}
- ORIENTED_OBJECT_DETECTION = {'class_labels': True, 'input_info': False, 'input_modality': 'image'}
- OTHER_COMPUTER_VISION = {'class_labels': False, 'input_info': False, 'input_modality': 'image'}
- OTHER_NATURAL_LANGUAGE = {'class_labels': False, 'input_info': False, 'input_modality': 'text'}
- OTHER_STRUCTURED_DATA = {'class_labels': False, 'input_info': True, 'input_modality': 'tabular'}
- QUESTION_ANSWER = {'class_labels': False, 'input_info': False, 'input_modality': 'text'}
- STRUCTURED_DATA_CLASSIFICATION = {'class_labels': True, 'input_info': True, 'input_modality': 'tabular'}
- STRUCTURED_DATA_REGRESSION = {'class_labels': False, 'input_info': True, 'input_modality': 'tabular'}
- SUMMARIZATION = {'class_labels': False, 'input_info': False, 'input_modality': 'text'}
- TEXT2TEXT_GENERATION = {'class_labels': False, 'input_info': False, 'input_modality': 'text'}
- TEXT_CLASSIFICATION = {'class_labels': True, 'input_info': False, 'input_modality': 'text'}
- TEXT_EMBEDDING = {'class_labels': False, 'input_info': False, 'input_modality': 'text'}
- TEXT_FILL_MASK = {'class_labels': False, 'input_info': False, 'input_modality': 'text'}
- TEXT_GENERATION = {'class_labels': False, 'input_info': False, 'input_modality': 'text'}
- TOKEN_CLASSIFICATION = {'class_labels': True, 'input_info': False, 'input_modality': 'text'}
- TRANSLATION = {'class_labels': False, 'input_info': False, 'input_modality': 'text'}
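The mapping enums above share member names with TaskType, so a task's inference methods and upload requirements can be looked up by name. A minimal sketch, assuming the ModelsEnum/ModelsStrEnum base classes behave like standard Python enums:

    from chariot.models.enum import TaskType, TaskTypesInferenceMethod, TaskTypesRequirement

    task = TaskType.OBJECT_DETECTION

    # Member names match across the enums, so a TaskType's name indexes the mappings.
    methods = TaskTypesInferenceMethod[task.name].value    # ['detect']
    requirements = TaskTypesRequirement[task.name].value   # {'class_labels': True, ...}

    print(methods)
    print(requirements["class_labels"], requirements["input_modality"])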
chariot.models.evaluations module
chariot.models.inference module
- class chariot.models.inference.Action(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]
Bases:
StrEnum
- CHAT = 'chat'
- COMPLETE = 'complete'
- DETECT = 'detect'
- EMBED = 'embed'
- PREDICT = 'predict'
- PREDICT_PROBA = 'predict_proba'
- RECONSTRUCT = 'reconstruct'
chariot.models.isvc_settings module
- class chariot.models.isvc_settings.GPUDict[source]
Bases:
TypedDict
The number and type of GPUs allocated to a model’s inference server.
- count: int
- product: str
- class chariot.models.isvc_settings.InferenceServerSettingsDict[source]
Bases:
TypedDict
Settings for a model’s inference server.
- enable_cvm_scoring: bool
Whether to enable Cramér–von Mises scoring of each request. Defaults to False. Embeddings will automatically be computed if this is enabled.
- enable_data_storage: bool
Whether to enable data (e.g. image) storage when storing inferences. Defaults to False.
- enable_inference_storage: bool
Whether to store inferences. Defaults to False.
- enable_ks_scoring: bool
Whether to enable Kolmogorov–Smirnov scoring of each request. Defaults to False. Embeddings will automatically be computed if this is enabled.
- enable_metadata_extraction: bool
Whether to enable metadata extraction when storing inferences. Defaults to False.
- enable_semantic_scoring: bool
Whether to enable semantic scoring of each request. Defaults to False. Embeddings will automatically be computed if this is enabled.
- huggingface_model_kwargs: dict | None
Model keyword arguments to use for the Huggingface inference engine. Defaults to None. Only used when inference_engine="Huggingface".
- inference_engine: InferenceEngine | None
The inference engine to use. User-selectable runtimes enable models to run under a different inference engine than the artifact type. The model must have been converted to that runtime with models export first. Passing nothing for this will result in running as the artifact type. Defaults to None.
- max_batch_delay_seconds: int | float
Maximum batch delay in seconds for triggering a prediction. Must be >= 0. Defaults to 0.
- max_batch_size: int
Maximum batch size for triggering a prediction. Must be > 0. Defaults to 1.
- negative_sampling_rate: float
Rate at which inferences without detections will be stored. Must be >= 0 and <= 1. Defaults to 0. A value of 0.65 means that there is a 65% chance that each inference without a detection is stored.
- num_workers: int
Number of workers to use for the predictor. Must be >= 1 and <= 100. Defaults to 1. For artifact_type=Pytorch this value sets minWorkers, maxWorkers, and default_workers_per_model to the specified value. See https://pytorch.org/serve/configuration.html for more detail. For Chariot artifact types, sets the MLServer parallel_workers field to the specified value. See https://mlserver.readthedocs.io/en/latest/user-guide/parallel-inference.html for more details.
- only_store_detections: bool
Whether to store all inferences (False) or only inferences with detections (True). Defaults to False. Ignored if enable_inference_storage is False.
- positive_sampling_rate: float
Rate at which inferences with classifications or detections will be stored. Must be >= 0 and <= 1. Defaults to 0. A value of 0.65 means that there is a 65% chance that each inference with a classification or detection is stored.
- predictor_cpu: str
Number of CPU cores allocated to the predictor. Must be a positive k8s quantity. Defaults to "1". See https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#meaning-of-cpu for more detail.
- predictor_cpu_burstable: bool
Whether the predictor can burst to using more CPU than requested. Defaults to False.
- predictor_ephemeral_storage: str | None
Amount of ephemeral (disk) storage allocated to the predictor. Must be None or a positive k8s quantity. Defaults to None. If None, no requests or limits are set. See https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#setting-requests-and-limits-for-local-ephemeral-storage for more detail.
- predictor_gpu: GPUDict | None
Number and type of GPUs allocated to the predictor. Defaults to None.
- predictor_include_embedding_model: bool
Whether to include the embedding model in the predictor. Defaults to False.
- predictor_max_replicas: int
Maximum number of predictor replicas to scale up to. Must be >= 1 and <= ReplicaLimit. Defaults to 1.
- predictor_memory: str
Amount of memory allocated to the predictor. Must be a positive k8s quantity. Defaults to "4Gi". See https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#meaning-of-memory for more detail.
- predictor_min_replicas: int
Minimum number of predictor replicas to scale down to. Must be >= 0 and <= ReplicaLimit. Defaults to 0.
- predictor_scale_metric: Literal['concurrency', 'rps']
Which metric to use for autoscaling the predictor. Defaults to "concurrency". Valid values: "concurrency" (number of simultaneous requests to each replica) and "rps" (number of requests per second).
- predictor_scale_target: int
Target value for autoscaling the predictor. Must be a positive integer. Defaults to 5.
- scale_down_delay_seconds: int
The amount of time to wait after the scale metric falls below the scale target before scaling down if min_replicas has not been reached. Must be >= 0 and <= 3600. Defaults to 600.
- transformer_cpu: str
Number of CPU cores allocated to the transformer. Must be a positive k8s quantity. Defaults to "1". See https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#meaning-of-cpu for more detail.
- transformer_cpu_burstable: bool
Whether the transformer can burst to using more CPU than requested. Defaults to False.
- transformer_max_replicas: int
Maximum number of transformer replicas to scale up to. Must be >= 1 and <= ReplicaLimit. Defaults to 1.
- transformer_memory: str
Amount of memory allocated to the transformer. Must be a positive k8s quantity. Defaults to "2Gi". See https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#meaning-of-memory for more detail.
- transformer_min_replicas: int
Minimum number of transformer replicas to scale down to. Must be >= 0 and <= ReplicaLimit. Defaults to 0.
- transformer_scale_metric: Literal['concurrency', 'rps']
Which metric to use for autoscaling the transformer. Defaults to "concurrency". Valid values: "concurrency" (number of simultaneous requests to each replica) and "rps" (number of requests per second).
- transformer_scale_target: int
Target value for autoscaling the transformer. Must be a positive integer. Defaults to 20.
- vllm_configuration: VLLMConfigurationDict | None
The configuration for the vLLM inference engine. Defaults to None. Only used when inference_engine="vLLM".
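As an illustration of the fields above, a settings dictionary might look like the sketch below. The GPU product string is hypothetical, and whether a partial dictionary (with omitted keys keeping their defaults) is accepted is an assumption rather than something stated in this reference.

    from chariot.models.isvc_settings import GPUDict, InferenceServerSettingsDict

    gpu: GPUDict = {"count": 1, "product": "NVIDIA-A100-SXM4-40GB"}  # hypothetical product name

    settings: InferenceServerSettingsDict = {
        "predictor_cpu": "2",             # k8s CPU quantity
        "predictor_memory": "8Gi",        # k8s memory quantity
        "predictor_gpu": gpu,
        "predictor_min_replicas": 0,
        "predictor_max_replicas": 2,
        "max_batch_size": 4,
        "max_batch_delay_seconds": 0.1,
        "enable_inference_storage": True,
        "positive_sampling_rate": 0.65,   # store ~65% of inferences with detections
    }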
- class chariot.models.isvc_settings.IsvcSetting(key: str, value: Any, user_id: str, since: datetime.datetime, until: datetime.datetime | None = None)[source]
Bases:
object
- key: str
- since: datetime
- until: datetime | None = None
- user_id: str
- value: Any
- class chariot.models.isvc_settings.VLLMConfigurationDict[source]
Bases:
TypedDict
The configuration for the vLLM inference engine.
- bitsandbytes_4bit: bool
- enable_prefix_caching: bool
- max_model_length: int
- seed: int
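For illustration, a vLLM configuration using the four documented fields might look like the sketch below; the values are arbitrary and the per-field comments are interpretations based on common vLLM options, not taken from this reference.

    from chariot.models.isvc_settings import VLLMConfigurationDict

    vllm_config: VLLMConfigurationDict = {
        "bitsandbytes_4bit": False,     # whether to load weights with 4-bit bitsandbytes quantization
        "enable_prefix_caching": True,  # reuse cached computation for shared prompt prefixes
        "max_model_length": 4096,       # maximum context length to allocate for
        "seed": 0,                      # seed for reproducible sampling
    }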
- chariot.models.isvc_settings.create_isvc_settings(model_id: str, settings: dict[str, Any]) list[IsvcSetting] [source]
Create settings for the isvc of this model.
NOTE: This function is deprecated and will be removed in a future release. Please use the chariot.models.model.Model.set_inference_server_settings() method instead.
- chariot.models.isvc_settings.get_inference_server_settings(model_id: str) InferenceServerSettingsDict [source]
Get the current inference server settings for model model_id.
Parameters
- model_id: str
The model to get inference server settings from.
Returns
The model’s current inference server settings as an InferenceServerSettingsDict.
- chariot.models.isvc_settings.get_isvc_settings(model_id: str, key: str | None = None) list[IsvcSetting] [source]
Get settings for the isvc of this model.
NOTE: This function is deprecated and will be removed in a future release. Please use the chariot.models.model.Model.get_inference_server_settings() method instead.
- chariot.models.isvc_settings.set_inference_server_settings(model_id: str, settings: InferenceServerSettingsDict)[source]
Set inference server settings for model model_id.
Parameters
- model_id: str
The model to set inference server settings on.
- settings: chariot.models.model.InferenceServerSettingsDict
Settings to apply to the inference server.
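A minimal sketch of the read-modify-write flow with these two helpers, assuming the dictionary returned by get_inference_server_settings can be edited in place and passed back to set_inference_server_settings; the model id is hypothetical:

    from chariot.models.enum import InferenceEngine
    from chariot.models.isvc_settings import (
        get_inference_server_settings,
        set_inference_server_settings,
    )

    model_id = "0123456789abcdef"  # hypothetical model id

    # Fetch the current settings, switch the runtime to vLLM, and apply.
    settings = get_inference_server_settings(model_id)
    settings["inference_engine"] = InferenceEngine.VLLM
    settings["vllm_configuration"] = {
        "bitsandbytes_4bit": False,
        "enable_prefix_caching": True,
        "max_model_length": 4096,
        "seed": 0,
    }
    set_inference_server_settings(model_id, settings)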
chariot.models.model module
- class chariot.models.model.GPUDict[source]
Bases:
TypedDict
The number and type of GPUs allocated to a model’s inference server.
- count: int
- product: str
- class chariot.models.model.InferenceServerSettingsDict[source]
Bases:
TypedDict
Settings for a model’s inference server.
- enable_cvm_scoring: bool
Whether to enable Cramér–von Mises scoring of each request. Defaults to False. Embeddings will automatically be computed if this is enabled.
- enable_data_storage: bool
Whether to enable data (e.g. image) storage when storing inferences. Defaults to False.
- enable_inference_storage: bool
Whether to store inferences. Defaults to False.
- enable_ks_scoring: bool
Whether to enable Kolmogorov–Smirnov scoring of each request. Defaults to False. Embeddings will automatically be computed if this is enabled.
- enable_metadata_extraction: bool
Whether to enable metadata extraction when storing inferences. Defaults to False.
- enable_semantic_scoring: bool
Whether to enable semantic scoring of each request. Defaults to False. Embeddings will automatically be computed if this is enabled.
- huggingface_model_kwargs: dict | None
Model keyword arguments to use for the Huggingface inference engine. Defaults to None. Only used when inference_engine="Huggingface".
- inference_engine: InferenceEngine | None
The inference engine to use. User-selectable runtimes enable models to run under a different inference engine than the artifact type. The model must have been converted to that runtime with models export first. Passing nothing for this will result in running as the artifact type. Defaults to None.
- max_batch_delay_seconds: int | float
Maximum batch delay in seconds for triggering a prediction. Must be >= 0. Defaults to 0.
- max_batch_size: int
Maximum batch size for triggering a prediction. Must be > 0. Defaults to 1.
- negative_sampling_rate: float
Rate at which inferences without detections will be stored. Must be >= 0 and <= 1. Defaults to 0. A value of 0.65 means that there is a 65% chance that each inference without a detection is stored.
- num_workers: int
Number of workers to use for the predictor. Must be >= 1 and <= 100. Defaults to 1. For artifact_type=Pytorch this value sets minWorkers, maxWorkers, and default_workers_per_model to the specified value. See https://pytorch.org/serve/configuration.html for more detail. For Chariot artifact types, sets the MLServer parallel_workers field to the specified value. See https://mlserver.readthedocs.io/en/latest/user-guide/parallel-inference.html for more details.
- only_store_detections: bool
Whether to store all inferences (False) or only inferences with detections (True). Defaults to False. Ignored if enable_inference_storage is False.
- positive_sampling_rate: float
Rate at which inferences with classifications or detections will be stored. Must be >= 0 and <= 1. Defaults to 0. A value of 0.65 means that there is a 65% chance that each inference with a classification or detection is stored.
- predictor_cpu: str
Number of CPU cores allocated to the predictor. Must be a positive k8s quantity. Defaults to "1". See https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#meaning-of-cpu for more detail.
- predictor_cpu_burstable: bool
Whether the predictor can burst to using more CPU than requested. Defaults to False.
- predictor_ephemeral_storage: str | None
Amount of ephemeral (disk) storage allocated to the predictor. Must be None or a positive k8s quantity. Defaults to None. If None, no requests or limits are set. See https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#setting-requests-and-limits-for-local-ephemeral-storage for more detail.
- predictor_gpu: GPUDict | None
Number and type of GPUs allocated to the predictor. Defaults to None.
- predictor_include_embedding_model: bool
Whether to include the embedding model in the predictor. Defaults to False.
- predictor_max_replicas: int
Maximum number of predictor replicas to scale up to. Must be >= 1 and <= ReplicaLimit. Defaults to 1.
- predictor_memory: str
Amount of memory allocated to the predictor. Must be a positive k8s quantity. Defaults to "4Gi". See https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#meaning-of-memory for more detail.
- predictor_min_replicas: int
Minimum number of predictor replicas to scale down to. Must be >= 0 and <= ReplicaLimit. Defaults to 0.
- predictor_scale_metric: Literal['concurrency', 'rps']
Which metric to use for autoscaling the predictor. Defaults to "concurrency". Valid values: "concurrency" (number of simultaneous requests to each replica) and "rps" (number of requests per second).
- predictor_scale_target: int
Target value for autoscaling the predictor. Must be a positive integer. Defaults to 5.
- scale_down_delay_seconds: int
The amount of time to wait after the scale metric falls below the scale target before scaling down if min_replicas has not been reached. Must be >= 0 and <= 3600. Defaults to 600.
- transformer_cpu: str
Number of CPU cores allocated to the transformer. Must be a positive k8s quantity. Defaults to "1". See https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#meaning-of-cpu for more detail.
- transformer_cpu_burstable: bool
Whether the transformer can burst to using more CPU than requested. Defaults to False.
- transformer_max_replicas: int
Maximum number of transformer replicas to scale up to. Must be >= 1 and <= ReplicaLimit. Defaults to 1.
- transformer_memory: str
Amount of memory allocated to the transformer. Must be a positive k8s quantity. Defaults to "2Gi". See https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#meaning-of-memory for more detail.
- transformer_min_replicas: int
Minimum number of transformer replicas to scale down to. Must be >= 0 and <= ReplicaLimit. Defaults to 0.
- transformer_scale_metric: Literal['concurrency', 'rps']
Which metric to use for autoscaling the transformer. Defaults to "concurrency". Valid values: "concurrency" (number of simultaneous requests to each replica) and "rps" (number of requests per second).
- transformer_scale_target: int
Target value for autoscaling the transformer. Must be a positive integer. Defaults to 20.
- vllm_configuration: VLLMConfigurationDict | None
The configuration for the vLLM inference engine. Defaults to None. Only used when inference_engine="vLLM".
- class chariot.models.model.InferenceServerStatus(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]
Bases:
StrEnum
The status of an inference server.
- ERROR = 'Error'
- MISCONFIGURED = 'Misconfigured'
- NOTINITIALIZED = 'Not initialized'
- NULL = 'null'
- PENDING = 'Pending'
- READY = 'Ready'
- SCALED_DOWN = 'Scaled down'
- STARTING = 'Starting'
- UNKNOWN = 'Unknown'
- UPDATING = 'Updating'
- class chariot.models.model.Model(project_id: str | None = None, id: str | None = None, project_name: str | None = None, subproject_name: str | None = None, organization_id: str | None = None, name: str | None = None, version: str | None = None, metadata: Any = None, start_server: bool = True)[source]
Bases:
Resource
- property actions: list[str]
- property architecture
- property class_labels
- property created_at: datetime
- default_action_methods = ['predict']
- fork(project_id: str, *_, name: str | None = None, summary: str | None = None, version: str | None = None) Model [source]
Fork the model.
Parameters
- project_id: str
The project to fork the model into.
- name: str, optional
Optional name override.
- summary: str, optional
Optional summary override.
- version: str, optional
Optional model version override.
Returns
- Model
The new model fork.
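A short sketch of forking a model version into another project; the ids are hypothetical:

    from chariot.models.model import get_model_by_id

    model = get_model_by_id("0123456789abcdef")   # hypothetical source model id
    forked = model.fork(
        "target-project-id",                      # hypothetical destination project
        name="my-model-fork",                     # optional name override
        summary="Fork for experimentation",
        version="1.0.0",                          # optional version override
    )
    print(forked.name, forked.version)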
- get_inference_server_settings() InferenceServerSettingsDict [source]
Get the current settings for this model’s inference server.
Returns
The current inference server settings as an InferenceServerSettingsDict.
- infer(action: Action, sample: Any, timeout: int = 60, verbose: bool = False, url: str | None = None, custom_metadata: Mapping[str, str | float | int | Mapping[str, Any]] | Sequence[Mapping[str, str | float | int | Mapping[str, Any]]] | None = None, return_inference_id: bool = False, return_semantic_score: bool = False, score_threshold: float | None = None, **inference_kwargs) Any [source]
Run inference action on sample.
This method posts data to the model’s inference server and returns the results. The actions property lists the available actions for this model.
The inference response id is returned when return_inference_id is true. An inference request may or may not be batched, but it must contain at least one input, so if inference storage is enabled a small modification to the returned id is needed when looking up stored inferences: the lookup pattern within the inference store is id-#, where # is the index of the input within the request. For example, if an inference request with a batch of two inputs is provided, append -0 and -1 to the id to get each inference from the inference store.
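A sketch of running inference is shown below. The model id is hypothetical, the expected sample format depends on the model and task type and is not specified in this reference (a base64-encoded image is assumed for an object detector), and how the inference id is packaged in the response when return_inference_id=True is likewise an assumption to verify.

    import base64

    from chariot.models.inference import Action
    from chariot.models.model import get_model_by_id

    model = get_model_by_id("0123456789abcdef")   # hypothetical id of an object detection model
    print(model.actions)                          # actions this model supports, e.g. ['detect']

    # Assumed sample format: a base64-encoded image.
    with open("image.jpg", "rb") as f:
        sample = base64.b64encode(f.read()).decode()

    response = model.infer(Action.DETECT, sample, timeout=60, return_inference_id=True)
    # With inference storage enabled, the stored record for the first (index 0)
    # input of a request is looked up as "<inference id>-0".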
- property inference_url: str | None
URL of the inference server’s inference endpoint.
- property inverse_class_labels
- property isvc_settings
Get the current settings for this model’s inference server.
NOTE: This property is deprecated and will be removed in a future release. Please use the get_inference_server_settings() method instead.
- property name
- property name_slug
- set_inference_server_settings(settings: InferenceServerSettingsDict)[source]
Set settings for this model’s inference server.
Parameters
- settings: chariot.models.model.InferenceServerSettingsDict
Settings to apply to the inference server.
- start_inference_server(cpu: str | None = None, num_workers: int | None = None, memory: str | None = None, min_replicas: int | None = None, max_replicas: int | None = None, scale_metric: str | None = None, scale_target: int | None = None, scale_down_delay: int | str | None = None, gpu_count: int | None = None, gpu_type: str | None = None, edit_existing: bool | None = None, quantization_bits: int | None = None, huggingface_model_kwargs: dict | None = None, inference_engine: InferenceEngine | str | None = None, vllm_config: dict | None = None)[source]
Create an inference server for the model object.
NOTE: All parameters to this function are deprecated and will be removed in a future release. Please use the chariot.models.model.Model.set_inference_server_settings() method to configure the inference server.
Deprecated Parameters
- cpu: str
Number of CPUs allocated to the inference server. This sets the CPU requests and limits of the kubernetes pod. See https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#meaning-of-cpu for more detail.
- num_workers: int
For artifact_type=Pytorch this value sets minWorkers, maxWorkers, default_workers_per_model to the specified value. See https://pytorch.org/serve/configuration.html for more detail. For all other artifact types, sets the MLServer parallel_workers field to the specified value. See https://mlserver.readthedocs.io/en/latest/user-guide/parallel-inference.html for more details.
- memory: str
Amount of memory allocated to the inference server. This sets the memory requests and limits of the kubernetes pod. See https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#meaning-of-memory for more detail.
- min_replicas: int
Minimum number of server replicas to scale down to. Defaults to 0.
- max_replicas: int
Maximum number of server replicas to scale up to. Defaults to 1.
- scale_metric: str
Metric to scale off of. Currently, only ‘concurrency’ and ‘rps’ are supported. Defaults to ‘concurrency’. ‘concurrency’: number of simultaneous open http connections. ‘rps’: number of http requests per second averaged over 60 seconds.
- scale_target: int
Threshold value that, once exceeded, will trigger a scale up event if max_replicas has not been reached.
- scale_down_delay: int | str | None
The amount of time to wait after the scale_metric falls below the scale_target before scaling down if min_replicas has not been reached. Can be an integer in seconds, a string with a number followed by ‘s’, ‘m’, or ‘h’ for seconds, minutes, or hours respectively, or None to use the default value.
- gpu_count: int
Number of GPUs requested.
- gpu_type: str
Type of GPU to request. If set, gpu_count must be greater than 0.
- edit_existing: bool
If the inference server already exists, it will be updated to use the settings specified in this function call.
- quantization_bits: int
Passing this parameter will trigger quantization for Huggingface models; only 4 or 8 is currently supported. This is passed as a model kwarg to the inference server.
- huggingface_model_kwargs: dict
Any parameter passed here will be passed as a model kwarg when creating a Huggingface inference server.
- inference_engine: Optional[str]
The inference engine to use. User-selectable runtimes enable models to run under a different inference engine than the artifact type. The model must have been converted to that runtime with models export first. Passing nothing for this will result in running as the artifact type.
- vllm_config: Optional[dict]
The configuration for the vLLM inference engine. Only valid when inference_engine=”vLLM”. Please consult the Chariot docs for the available options for vLLM configs.
- property status
- property storage_status
- task_to_method = {'Automatic Speech Recognition': ['predict'], 'Conversational': ['chat'], 'Feature Extraction': ['predict'], 'Image Autoencoder': ['embed', 'reconstruct'], 'Image Classification': ['embed', 'predict', 'predict_proba'], 'Image Embedding': ['embed'], 'Image Generation': ['predict'], 'Image Segmentation': ['predict', 'predict_proba'], 'Object Detection': ['detect'], 'Oriented Object Detection': ['detect'], 'Other - Computer Vision': ['predict'], 'Other - Natural Language': ['predict'], 'Other - Structured Data': ['predict'], 'Question Answer': ['predict'], 'Structured Data Classification': ['predict'], 'Structured Data Regression': ['predict'], 'Summarization': ['predict'], 'Text Classification': ['predict'], 'Text Embedding': ['embed'], 'Text Fill-Mask': ['predict'], 'Text Generation': ['complete'], 'Text2Text Generation': ['predict'], 'Token Classification': ['predict'], 'Translation': ['predict']}
- update_isvc_settings(settings)[source]
Update settings for this model’s inference server.
NOTE: This method is deprecated and will be removed in a future release. Please use the set_inference_server_settings() method instead.
- property version
- wait_for_inference_server(timeout: int, verbose: bool = False, wait_interval: int = 1, internal_url: bool = False) OutputInferenceService [source]
Waits for the model’s dedicated inference server to be running and ready to accept requests. Will scale the model up if it has scaled to zero.
Parameters
- timeout:
Number in seconds to wait for inference server to spin up.
- verbose:
Whether to enable more verbose logging.
- wait_interval:
How many seconds to wait after each query of the get inference server status endpoint before trying again.
- internal_url:
Set to True to use inference within cluster.
Returns
The OutputInferenceService object for the inference service if it exists, None otherwise.
Raises
- ApiException:
If the call to get inference server status does not return a status code that is 2XX or 404.
- RuntimeError:
If the inference server failed to spin up or was not able to spin up within the timeout period.
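A sketch of waiting for the dedicated inference server before sending requests; the model id is hypothetical:

    from chariot.models.model import get_model_by_id

    model = get_model_by_id("0123456789abcdef")   # hypothetical model id

    try:
        # Blocks until the server is ready, scaling it up from zero if needed.
        model.wait_for_inference_server(timeout=300, verbose=True, wait_interval=2)
        print(model.status)
    except RuntimeError as err:
        print(f"Inference server did not become ready in time: {err}")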
- wait_for_upload(timeout=60, wait_interval=2) Self [source]
Wait for timeout seconds for this model to be uploaded.
Parameters
- timeout:
Number in seconds to wait for model upload.
- wait_interval:
How many seconds to wait after each query before trying again.
Returns
The Model object, for chaining
Raises
- ModelUploadTimeoutError:
If storage_status is not “uploaded” before timeout.
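A sketch of waiting for a freshly imported model’s artifact to finish uploading; the model id is hypothetical, and the method returns the Model itself so calls can be chained:

    from chariot.models.model import get_model_by_id

    model = get_model_by_id("0123456789abcdef")   # hypothetical id of a just-imported model

    model = model.wait_for_upload(timeout=120, wait_interval=2)
    print(model.storage_status)                   # expected to report the uploaded state here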
- exception chariot.models.model.ModelDoesNotExistError(model_name: str, version: str | None = None)[source]
Bases:
Exception
- class chariot.models.model.VLLMConfigurationDict[source]
Bases:
TypedDict
The configuration for the vLLM inference engine.
- bitsandbytes_4bit: bool
- enable_prefix_caching: bool
- max_model_length: int
- seed: int
- chariot.models.model.get_catalog(project_id: str, **kwargs) list[OutputModelSummary] [source]
get_catalog returns the model catalog matching the supplied keyword filters. See Chariot REST documentation for details.
Parameters
- project_id: str
project_id for the models query
- chariot.models.model.get_model_by_id(id: str) Model [source]
get_model_by_id returns the model matching the supplied id.
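A sketch of browsing the catalog and fetching a model by id; the ids are hypothetical, and the keyword filters accepted by get_catalog are described in the Chariot REST documentation:

    from chariot.models.model import get_catalog, get_model_by_id

    summaries = get_catalog("example-project-id")   # hypothetical project id
    for summary in summaries:
        print(summary)

    model = get_model_by_id("0123456789abcdef")     # hypothetical model id
    print(model.name, model.version)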
chariot.models.stage module
- class chariot.models.stage.Stage(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]
Bases:
Enum
Possible model stages. These values correspond to the ones defined in models-catalog: https://github.com/Striveworks/chariot/blob/main/go/apps/models-catalog/pkg/constants/model_stage.go#L5-L8 (TODO: consume these from a JSON/YAML config file instead of hard coding).
- PRODUCTION = 'production'
- STAGING = 'staging'
- chariot.models.stage.get_active_stages(model: Model)[source]
Get the currently active model stages and a list of archived model versions.
Parameters
- model: Model
The model whose stages to retrieve.
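A minimal sketch of calling get_active_stages; the return structure is not documented here, so the result is only printed, and the model id is hypothetical:

    from chariot.models.model import get_model_by_id
    from chariot.models.stage import Stage, get_active_stages

    model = get_model_by_id("0123456789abcdef")   # hypothetical model id

    print(get_active_stages(model))
    print(Stage.PRODUCTION.value, Stage.STAGING.value)   # 'production', 'staging'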
chariot.models.upload module
- chariot.models.upload.import_model(*, name: str, version: str, summary: str, artifact_type: str | ArtifactType, task_type: str | TaskType, project_id: str | None = None, project_name: str | None = None, subproject_name: str | None = None, organization_id: str | None = None, class_labels: dict | None = None, model_object: Any | None = None, model_path: str | None = None, use_internal_url: bool = False, **kwargs) Model [source]
Import a local model into Chariot. For a previously exported Chariot model, model_path is the local path to the gzipped tar