chariot.models package

Submodules

chariot.models.enum module

class chariot.models.enum.ArtifactType(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Bases: ModelsStrEnum

CHARIOT = 'chariot'
HUGGINGFACE = 'huggingface'
NEURALMAGIC = 'neuralmagic'
ONNX = 'onnx'
PYTORCH = 'pytorch'
SKLEARN = 'sklearn'
class chariot.models.enum.ArtifactTypesTaskType(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Bases: ModelsEnum

CHARIOT = ['Image Classification', 'Image Embedding', 'Image Segmentation', 'Object Detection']
HUGGINGFACE = ['Automatic Speech Recognition', 'Conversational', 'Summarization', 'Text Classification', 'Text Embedding', 'Text Fill-Mask', 'Text Generation', 'Token Classification', 'Translation']
NEURALMAGIC = ['Image Classification']
ONNX = ['Automatic Speech Recognition', 'Conversational', 'Feature Extraction', 'Image Autoencoder', 'Image Classification', 'Image Embedding', 'Image Generation', 'Image Segmentation', 'Object Detection', 'Oriented Object Detection', 'Other - Computer Vision', 'Other - Natural Language', 'Other - Structured Data', 'Question Answer', 'Structured Data Classification', 'Structured Data Regression', 'Summarization', 'Text Classification', 'Text Embedding', 'Text Fill-Mask', 'Text Generation', 'Text2Text Generation', 'Token Classification', 'Translation']
PYTORCH = ['Automatic Speech Recognition', 'Conversational', 'Feature Extraction', 'Image Autoencoder', 'Image Classification', 'Image Embedding', 'Image Generation', 'Image Segmentation', 'Object Detection', 'Oriented Object Detection', 'Other - Computer Vision', 'Other - Natural Language', 'Other - Structured Data', 'Question Answer', 'Structured Data Classification', 'Structured Data Regression', 'Summarization', 'Text Classification', 'Text Embedding', 'Text Fill-Mask', 'Text Generation', 'Text2Text Generation', 'Token Classification', 'Translation']
SKLEARN = ['Other - Structured Data', 'Structured Data Classification', 'Structured Data Regression']
class chariot.models.enum.InferenceEngine(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Bases: ModelsStrEnum

CHARIOTDEEPSPARSE = 'ChariotDeepSparse'
CHARIOTPYTORCH = 'ChariotPytorch'
HUGGINGFACE = 'Huggingface'
VLLM = 'vLLM'
class chariot.models.enum.Protocol(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Bases: ModelsStrEnum

V2 = 'v2'
class chariot.models.enum.TaskType(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Bases: ModelsStrEnum

AUTOMATIC_SPEECH_RECOGNITION = 'Automatic Speech Recognition'
CONVERSATIONAL = 'Conversational'
FEATURE_EXTRACTION = 'Feature Extraction'
IMAGE_AUTOENCODER = 'Image Autoencoder'
IMAGE_CLASSIFICATION = 'Image Classification'
IMAGE_EMBEDDING = 'Image Embedding'
IMAGE_GENERATION = 'Image Generation'
IMAGE_SEGMENTATION = 'Image Segmentation'
OBJECT_DETECTION = 'Object Detection'
ORIENTED_OBJECT_DETECTION = 'Oriented Object Detection'
OTHER_COMPUTER_VISION = 'Other - Computer Vision'
OTHER_NATURAL_LANGUAGE = 'Other - Natural Language'
OTHER_STRUCTURED_DATA = 'Other - Structured Data'
QUESTION_ANSWER = 'Question Answer'
STRUCTURED_DATA_CLASSIFICATION = 'Structured Data Classification'
STRUCTURED_DATA_REGRESSION = 'Structured Data Regression'
SUMMARIZATION = 'Summarization'
TEXT2TEXT_GENERATION = 'Text2Text Generation'
TEXT_CLASSIFICATION = 'Text Classification'
TEXT_EMBEDDING = 'Text Embedding'
TEXT_FILL_MASK = 'Text Fill-Mask'
TEXT_GENERATION = 'Text Generation'
TOKEN_CLASSIFICATION = 'Token Classification'
TRANSLATION = 'Translation'
class chariot.models.enum.TaskTypesInferenceMethod(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Bases: ModelsEnum

AUTOMATIC_SPEECH_RECOGNITION = ['predict']
CONVERSATIONAL = ['chat']
FEATURE_EXTRACTION = ['predict']
IMAGE_AUTOENCODER = ['embed', 'reconstruct']
IMAGE_CLASSIFICATION = ['embed', 'predict', 'predict_proba']
IMAGE_EMBEDDING = ['embed']
IMAGE_GENERATION = ['predict']
IMAGE_SEGMENTATION = ['predict', 'predict_proba']
OBJECT_DETECTION = ['detect']
ORIENTED_OBJECT_DETECTION = ['detect']
OTHER_COMPUTER_VISION = ['predict']
OTHER_NATURAL_LANGUAGE = ['predict']
OTHER_STRUCTURED_DATA = ['predict']
QUESTION_ANSWER = ['predict']
STRUCTURED_DATA_CLASSIFICATION = ['predict']
STRUCTURED_DATA_REGRESSION = ['predict']
SUMMARIZATION = ['predict']
TEXT2TEXT_GENERATION = ['predict']
TEXT_CLASSIFICATION = ['predict']
TEXT_EMBEDDING = ['embed']
TEXT_FILL_MASK = ['predict']
TEXT_GENERATION = ['complete']
TOKEN_CLASSIFICATION = ['predict']
TRANSLATION = ['predict']
class chariot.models.enum.TaskTypesRequirement(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Bases: ModelsEnum

AUTOMATIC_SPEECH_RECOGNITION = {'class_labels': False, 'input_info': False, 'input_modality': 'text'}
CONVERSATIONAL = {'class_labels': False, 'input_info': False, 'input_modality': 'text'}
FEATURE_EXTRACTION = {'class_labels': False, 'input_info': False, 'input_modality': 'text'}
IMAGE_AUTOENCODER = {'class_labels': False, 'input_info': False, 'input_modality': 'image'}
IMAGE_CLASSIFICATION = {'class_labels': True, 'input_info': False, 'input_modality': 'image'}
IMAGE_EMBEDDING = {'class_labels': False, 'input_info': False, 'input_modality': 'image'}
IMAGE_GENERATION = {'class_labels': False, 'input_info': False, 'input_modality': 'image'}
IMAGE_SEGMENTATION = {'class_labels': True, 'input_info': False, 'input_modality': 'image'}
OBJECT_DETECTION = {'class_labels': True, 'input_info': False, 'input_modality': 'image'}
ORIENTED_OBJECT_DETECTION = {'class_labels': True, 'input_info': False, 'input_modality': 'image'}
OTHER_COMPUTER_VISION = {'class_labels': False, 'input_info': False, 'input_modality': 'image'}
OTHER_NATURAL_LANGUAGE = {'class_labels': False, 'input_info': False, 'input_modality': 'text'}
OTHER_STRUCTURED_DATA = {'class_labels': False, 'input_info': True, 'input_modality': 'tabular'}
QUESTION_ANSWER = {'class_labels': False, 'input_info': False, 'input_modality': 'text'}
STRUCTURED_DATA_CLASSIFICATION = {'class_labels': True, 'input_info': True, 'input_modality': 'tabular'}
STRUCTURED_DATA_REGRESSION = {'class_labels': False, 'input_info': True, 'input_modality': 'tabular'}
SUMMARIZATION = {'class_labels': False, 'input_info': False, 'input_modality': 'text'}
TEXT2TEXT_GENERATION = {'class_labels': False, 'input_info': False, 'input_modality': 'text'}
TEXT_CLASSIFICATION = {'class_labels': True, 'input_info': False, 'input_modality': 'text'}
TEXT_EMBEDDING = {'class_labels': False, 'input_info': False, 'input_modality': 'text'}
TEXT_FILL_MASK = {'class_labels': False, 'input_info': False, 'input_modality': 'text'}
TEXT_GENERATION = {'class_labels': False, 'input_info': False, 'input_modality': 'text'}
TOKEN_CLASSIFICATION = {'class_labels': True, 'input_info': False, 'input_modality': 'text'}
TRANSLATION = {'class_labels': False, 'input_info': False, 'input_modality': 'text'}
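
Example (a minimal sketch): the enums above can be combined to look up which inference methods a task type exposes, what an import of that task type requires, and whether an artifact type supports it.

    from chariot.models.enum import (
        ArtifactTypesTaskType,
        TaskType,
        TaskTypesInferenceMethod,
        TaskTypesRequirement,
    )

    # Inference methods exposed by Object Detection models.
    methods = TaskTypesInferenceMethod.OBJECT_DETECTION.value  # ['detect']

    # What importing an Object Detection model requires.
    requirements = TaskTypesRequirement.OBJECT_DETECTION.value
    # {'class_labels': True, 'input_info': False, 'input_modality': 'image'}

    # Confirm the task type is supported by the ONNX artifact type.
    assert TaskType.OBJECT_DETECTION.value in ArtifactTypesTaskType.ONNX.value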

chariot.models.evaluations module

chariot.models.evaluations.create_evaluation(model_id: str, snapshot_id: str, split: str, evaluation_data: Any) CreateEvaluationResponse[source]
chariot.models.evaluations.get_evaluations(model_id: str) GetEvaluationsResponse[source]
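
Example (a sketch; the ids and the "val" split are placeholders, and the structure of evaluation_data is task-specific since it is typed as Any):

    from chariot.models.evaluations import create_evaluation, get_evaluations

    model_id = "<model-id>"        # placeholder
    snapshot_id = "<snapshot-id>"  # placeholder
    evaluation_data = {}           # task-specific payload (typed as Any)

    response = create_evaluation(
        model_id=model_id,
        snapshot_id=snapshot_id,
        split="val",
        evaluation_data=evaluation_data,
    )
    evaluations = get_evaluations(model_id=model_id)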

chariot.models.inference module

class chariot.models.inference.Action(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Bases: StrEnum

CHAT = 'chat'
COMPLETE = 'complete'
DETECT = 'detect'
EMBED = 'embed'
PREDICT = 'predict'
PREDICT_PROBA = 'predict_proba'
RECONSTRUCT = 'reconstruct'

chariot.models.isvc_settings module

class chariot.models.isvc_settings.GPUDict[source]

Bases: TypedDict

The number and type of GPUs allocated to a model’s inference server.

count: int
product: str
class chariot.models.isvc_settings.InferenceServerSettingsDict[source]

Bases: TypedDict

Settings for a model’s inference server.

enable_cvm_scoring: bool

Whether to enable Cramér–von Mises scoring of each request. Defaults to False. Embeddings will automatically be computed if this is enabled.

enable_data_storage: bool

Whether to enable data (e.g. image) storage when storing inferences. Defaults to False.

enable_inference_storage: bool

Whether to store inferences. Defaults to False.

enable_ks_scoring: bool

Whether to enable Kolmogorov–Smirnov scoring of each request. Defaults to False. Embeddings will automatically be computed if this is enabled.

enable_metadata_extraction: bool

Whether to enable metadata extraction when storing inferences. Defaults to False.

enable_semantic_scoring: bool

Whether to enable semantic scoring of each request. Defaults to False. Embeddings will automatically be computed if this is enabled.

huggingface_model_kwargs: dict | None

Model keyword arguments to use for the Huggingface inference engine. Defaults to None. Only used when inference_engine="Huggingface".

inference_engine: InferenceEngine | None

The inference engine to use. User-selectable runtimes enable a model to run under a different inference engine than its artifact type. The model must first have been converted to that runtime with a model export. Passing nothing for this will result in running as the artifact type. Defaults to None.

max_batch_delay_seconds: int | float

Maximum batch delay in seconds for triggering a prediction. Must be >= 0. Defaults to 0.

max_batch_size: int

Maximum batch size for triggering a prediction. Must be > 0. Defaults to 1.

negative_sampling_rate: float

Rate at which inferences without detections will be stored. Must be >= 0 and <= 1. Defaults to 0. A value of 0.65 means that there is a 65% chance that each inference without a detection is stored.

num_workers: int

Number of workers to use for the predictor. Must be >= 1 and <= 100. Defaults to 1. For artifact_type=Pytorch this value sets minWorkers, maxWorkers, default_workers_per_model to the specified value. See https://pytorch.org/serve/configuration.html for more detail. For Chariot artifact types, sets the MLServer parallel_workers field to the specified value. See https://mlserver.readthedocs.io/en/latest/user-guide/parallel-inference.html for more details.

only_store_detections: bool

Whether to store all inferences (False) or only inferences with detections (True). Defaults to False. Ignored if enable_inference_storage is False.

positive_sampling_rate: float

Rate at which inferences with classifications or detections will be stored. Must be >= 0 and <= 1. Defaults to 0. A value of 0.65 means that there is a 65% chance that each inference with a classification or detection is stored.

predictor_cpu: str

Number of CPU cores allocated to the predictor. Must be a positive k8s quantity. Defaults to "1". See https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#meaning-of-cpu for more detail.

predictor_cpu_burstable: bool

Whether the predictor can burst to using more CPU than requested. Defaults to False.

predictor_ephemeral_storage: str | None

Amount of ephemeral (disk) storage allocated to the predictor. Must be None or a positive k8s quantity. Defaults to None. If None, no requests or limits are set. See https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#setting-requests-and-limits-for-local-ephemeral-storage for more detail.

predictor_gpu: GPUDict | None

Number and type of GPUs allocated to the predictor. Defaults to None.

predictor_include_embedding_model: bool

Whether to include the embedding model in the predictor. Defaults to False.

predictor_max_replicas: int

Maximum number of predictor replicas to scale up to. Must be >= 1 and <= ReplicaLimit. Defaults to 1.

predictor_memory: str

Amount of memory allocated to the predictor. Must be a positive k8s quantity. Defaults to "4Gi". See https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#meaning-of-memory for more detail.

predictor_min_replicas: int

Minimum number of predictor replicas to scale down to. Must be >= 0 and <= ReplicaLimit. Defaults to 0.

predictor_scale_metric: Literal['concurrency', 'rps']

Which metric to use for autoscaling the predictor. Defaults to "concurrency". Valid values:

  • "concurrency": number of simultaneous requests to each replica.

  • "rps": number of requests per second.

predictor_scale_target: int

Target value for autoscaling the predictor. Must be a positive integer. Defaults to 5.

scale_down_delay_seconds: int

The amount of time to wait after the scale metric falls below the scale target before scaling down if min_replicas has not been reached. Must be >= 0 and <= 3600. Defaults to 600.

transformer_cpu: str

Number of CPU cores allocated to the transformer. Must be a positive k8s quantity. Defaults to "1". See https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#meaning-of-cpu for more detail.

transformer_cpu_burstable: bool

Whether the transformer can burst to using more CPU than requested. Defaults to False.

transformer_max_replicas: int

Maximum number of transformer replicas to scale up to. Must be >= 1 and <= ReplicaLimit. Defaults to 1.

transformer_memory: str

Amount of memory allocated to the transformer. Must be a positive k8s quantity. Defaults to "2Gi". See https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#meaning-of-memory for more detail.

transformer_min_replicas: int

Minimum number of transformer replicas to scale down to. Must be >= 0 and <= ReplicaLimit. Defaults to 0.

transformer_scale_metric: Literal['concurrency', 'rps']

Which metric to use for autoscaling the transformer. Defaults to "concurrency". Valid values:

  • "concurrency": number of simultaneous requests to each replica.

  • "rps": number of requests per second.

transformer_scale_target: int

Target value for autoscaling the transformer. Must be a positive integer. Defaults to 20.

vllm_configuration: VLLMConfigurationDict | None

The configuration for the vLLM inference engine. Defaults to None. Only used when inference_engine="vLLM".

class chariot.models.isvc_settings.IsvcSetting(key: str, value: Any, user_id: str, since: datetime.datetime, until: datetime.datetime | None = None)[source]

Bases: object

key: str
since: datetime
until: datetime | None = None
user_id: str
value: Any
class chariot.models.isvc_settings.VLLMConfigurationDict[source]

Bases: TypedDict

The configuration for the vLLM inference engine.

bitsandbytes_4bit: bool
enable_prefix_caching: bool
max_model_length: int
seed: int
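
Example (a sketch with illustrative values) of building a VLLMConfigurationDict, which can be supplied through the vllm_configuration setting when inference_engine="vLLM":

    from chariot.models.isvc_settings import VLLMConfigurationDict

    vllm_config: VLLMConfigurationDict = {
        "bitsandbytes_4bit": False,     # 4-bit bitsandbytes quantization
        "enable_prefix_caching": True,  # cache shared prompt prefixes
        "max_model_length": 4096,       # illustrative context length
        "seed": 0,                      # sampling seed
    }
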
chariot.models.isvc_settings.create_isvc_settings(model_id: str, settings: dict[str, Any]) list[IsvcSetting][source]

Create settings for the isvc of this model.

NOTE: This function is deprecated and will be removed in a future release. Please use the chariot.models.model.Model.set_inference_server_settings() method instead.

chariot.models.isvc_settings.get_inference_server_settings(model_id: str) InferenceServerSettingsDict[source]

Get the current inference server settings for model model_id.

Parameters

model_id: str

The model to get inference server settings from.

Returns

chariot.models.model.InferenceServerSettingsDict

chariot.models.isvc_settings.get_isvc_settings(model_id: str, key: str | None = None) list[IsvcSetting][source]

Get settings for the isvc of this model.

NOTE: This function is deprecated and will be removed in a future release. Please use the chariot.models.model.Model.get_inference_server_settings() method instead.

chariot.models.isvc_settings.set_inference_server_settings(model_id: str, settings: InferenceServerSettingsDict)[source]

Set inference server settings for model model_id.

Parameters

model_id: str

The model to set inference server settings on.

settings: chariot.models.model.InferenceServerSettingsDict

Settings to apply to the inference server.
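
Example (a minimal sketch; the model id is a placeholder): read the current settings, adjust a few fields, and write them back.

    from chariot.models.isvc_settings import (
        get_inference_server_settings,
        set_inference_server_settings,
    )

    model_id = "<model-id>"  # placeholder

    settings = get_inference_server_settings(model_id)
    settings["predictor_memory"] = "8Gi"
    settings["predictor_min_replicas"] = 1
    settings["predictor_max_replicas"] = 3
    set_inference_server_settings(model_id, settings)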

chariot.models.model module

exception chariot.models.model.ActionUnsupportedByCurrentModelError[source]

Bases: Exception

class chariot.models.model.GPUDict[source]

Bases: TypedDict

The number and type of GPUs allocated to a model’s inference server.

count: int
product: str
class chariot.models.model.InferenceServerSettingsDict[source]

Bases: TypedDict

Settings for a model’s inference server.

enable_cvm_scoring: bool

Whether to enable Cramér–von Mises scoring of each request. Defaults to False. Embeddings will automatically be computed if this is enabled.

enable_data_storage: bool

Whether to enable data (e.g. image) storage when storing inferences. Defaults to False.

enable_inference_storage: bool

Whether to store inferences. Defaults to False.

enable_ks_scoring: bool

Whether to enable Kolmogorov–Smirnov scoring of each request. Defaults to False. Embeddings will automatically be computed if this is enabled.

enable_metadata_extraction: bool

Whether to enable metadata extraction when storing inferences. Defaults to False.

enable_semantic_scoring: bool

Whether to enable semantic scoring of each request. Defaults to False. Embeddings will automatically be computed if this is enabled.

huggingface_model_kwargs: dict | None

Model keyword arguments to use for the Huggingface inference engine. Defaults to None. Only used when inference_engine="Huggingface".

inference_engine: InferenceEngine | None

The inference engine to use. User-selectable runtimes enable a model to run under a different inference engine than its artifact type. The model must first have been converted to that runtime with a model export. Passing nothing for this will result in running as the artifact type. Defaults to None.

max_batch_delay_seconds: int | float

Maximum batch delay in seconds for triggering a prediction. Must be >= 0. Defaults to 0.

max_batch_size: int

Maximum batch size for triggering a prediction. Must be > 0. Defaults to 1.

negative_sampling_rate: float

Rate at which inferences without detections will be stored. Must be >= 0 and <= 1. Defaults to 0. A value of 0.65 means that there is a 65% chance that each inference without a detection is stored.

num_workers: int

Number of workers to use for the predictor. Must be >= 1 and <= 100. Defaults to 1. For artifact_type=Pytorch this value sets minWorkers, maxWorkers, default_workers_per_model to the specified value. See https://pytorch.org/serve/configuration.html for more detail. For Chariot artifact types, sets the MLServer parallel_workers field to the specified value. See https://mlserver.readthedocs.io/en/latest/user-guide/parallel-inference.html for more details.

only_store_detections: bool

Whether to store all inferences (False) or only inferences with detections (True). Defaults to False. Ignored if enable_inference_storage is False.

positive_sampling_rate: float

Rate at which inferences with classifications or detections will be stored. Must be >= 0 and <= 1. Defaults to 0. A value of 0.65 means that there is a 65% chance that each inference with a classification or detection is stored.

predictor_cpu: str

Number of CPU cores allocated to the predictor. Must be a positive k8s quantity. Defaults to "1". See https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#meaning-of-cpu for more detail.

predictor_cpu_burstable: bool

Whether the predictor can burst to using more CPU than requested. Defaults to False.

predictor_ephemeral_storage: str | None

Amount of ephemeral (disk) storage allocated to the predictor. Must be None or a positive k8s quantity. Defaults to None. If None, no requests or limits are set. See https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#setting-requests-and-limits-for-local-ephemeral-storage for more detail.

predictor_gpu: GPUDict | None

Number and type of GPUs allocated to the predictor. Defaults to None.

predictor_include_embedding_model: bool

Whether to include the embedding model in the predictor. Defaults to False.

predictor_max_replicas: int

Maximum number of predictor replicas to scale up to. Must be >= 1 and <= ReplicaLimit. Defaults to 1.

predictor_memory: str

Amount of memory allocated to the predictor. Must be a positive k8s quantity. Defaults to "4Gi". See https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#meaning-of-memory for more detail.

predictor_min_replicas: int

Minimum number of predictor replicas to scale down to. Must be >= 0 and <= ReplicaLimit. Defaults to 0.

predictor_scale_metric: Literal['concurrency', 'rps']

Which metric to use for autoscaling the predictor. Defaults to "concurrency". Valid values:

  • "concurrency": number of simultaneous requests to each replica.

  • "rps": number of requests per second.

predictor_scale_target: int

Target value for autoscaling the predictor. Must be a positive integer. Defaults to 5.

scale_down_delay_seconds: int

The amount of time to wait after the scale metric falls below the scale target before scaling down if min_replicas has not been reached. Must be >= 0 and <= 3600. Defaults to 600.

transformer_cpu: str

Number of CPU cores allocated to the transformer. Must be a positive k8s quantity. Defaults to "1". See https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#meaning-of-cpu for more detail.

transformer_cpu_burstable: bool

Whether the transformer can burst to using more CPU than requested. Defaults to False.

transformer_max_replicas: int

Maximum number of transformer replicas to scale up to. Must be >= 1 and <= ReplicaLimit. Defaults to 1.

transformer_memory: str

Amount of memory allocated to the transformer. Must be a positive k8s quantity. Defaults to "2Gi". See https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#meaning-of-memory for more detail.

transformer_min_replicas: int

Minimum number of transformer replicas to scale down to. Must be >= 0 and <= ReplicaLimit. Defaults to 0.

transformer_scale_metric: Literal['concurrency', 'rps']

Which metric to use for autoscaling the transformer. Defaults to "concurrency". Valid values:

  • "concurrency": number of simultaneous requests to each replica.

  • "rps": number of requests per second.

transformer_scale_target: int

Target value for autoscaling the transformer. Must be a positive integer. Defaults to 20.

vllm_configuration: VLLMConfigurationDict | None

The configuration for the vLLM inference engine. Defaults to None. Only used when inference_engine="vLLM".

class chariot.models.model.InferenceServerStatus(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Bases: StrEnum

The status of an inference server.

ERROR = 'Error'
MISCONFIGURED = 'Misconfigured'
NOTINITIALIZED = 'Not initialized'
NULL = 'null'
PENDING = 'Pending'
READY = 'Ready'
SCALED_DOWN = 'Scaled down'
STARTING = 'Starting'
UNKNOWN = 'Unknown'
UPDATING = 'Updating'
class chariot.models.model.Model(project_id: str | None = None, id: str | None = None, project_name: str | None = None, subproject_name: str | None = None, organization_id: str | None = None, name: str | None = None, version: str | None = None, metadata: Any = None, start_server: bool = True)[source]

Bases: Resource

property actions: list[str]
property architecture
property class_labels
convert_inference_engine(inference_engine, force_overwrite=False) tuple[int, str][source]
property created_at: datetime
default_action_methods = ['predict']
delete(hard_delete: bool = False)[source]

Delete this model.

download_model(tarfile: str)[source]

Download the model as a tar.gz file.

export_onnx_model(tarfile: str)[source]

Download the ONNX representation of the model into a tarfile.

exports_supported()[source]

Return the supported export modes for this model.

files()[source]

Recursively list all files for the model; returns [{ last_modified, name, size }].

fork(project_id: str, *_, name: str | None = None, summary: str | None = None, version: str | None = None) Model[source]

Fork the model.

Parameters

project_id: str

The project to fork the model into.

name: str, optional

Optional name override.

summary: str, optional

Optional summary override.

version: str, optional

Optional model version override.

Returns

Model

The new model fork.

get_inference_server_settings() InferenceServerSettingsDict[source]

Get the current settings for this model’s inference server.

Returns

chariot.models.model.InferenceServerSettingsDict

infer(action: Action, sample: Any, timeout: int = 60, verbose: bool = False, url: str | None = None, custom_metadata: Mapping[str, str | float | int | Mapping[str, Any]] | Sequence[Mapping[str, str | float | int | Mapping[str, Any]]] | None = None, return_inference_id: bool = False, return_semantic_score: bool = False, score_threshold: float | None = None, **inference_kwargs) Any[source]

Run inference action on sample.

This method posts data to the model’s inference server and returns the results. The actions property lists the available actions for this model.

The inference response id is returned when return_inference_id is true. An inference request may or may not be batched, but it must contain at least one input. As such, if inference storage is enabled, a small modification to the returned id is needed to look up stored inferences: the lookup pattern within the inference store is id-#, where # is the index of the input within the request. For example, if an inference request contains a batch of two inputs, append -0 and -1 to the id to retrieve each inference from the inference store.
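
Example (a sketch; model is assumed to be an existing chariot.models.model.Model instance with a running inference server, and image_bytes is placeholder input data; the exact return layout depends on the task type and flags such as return_inference_id):

    from chariot.models.inference import Action

    # Run object detection on a single image payload (placeholder bytes).
    detections = model.infer(Action.DETECT, image_bytes, timeout=120)

    # With inference storage enabled, the individual inputs of a stored request
    # are keyed "<inference-id>-0", "<inference-id>-1", ... in the inference store.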

property inference_url: str | None

URL of the inference server’s inference endpoint.

property inverse_class_labels
property isvc_settings

Get the current settings for this model’s inference server.

NOTE: This property is deprecated and will be removed in a future release. Please use the get_inference_server_settings() method instead.

property name
property name_slug
set_inference_server_settings(settings: InferenceServerSettingsDict)[source]

Set settings for this model’s inference server.

Parameters

settings: chariot.models.model.InferenceServerSettingsDict

Settings to apply to the inference server.
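
Example (a sketch; the GPU product name is a placeholder, and this assumes a partial settings dict is accepted; otherwise, fetch the current settings with get_inference_server_settings() and modify them): allocate a GPU and configure autoscaling through the recommended settings dict.

    model.set_inference_server_settings(
        {
            "predictor_gpu": {"count": 1, "product": "<gpu-product-name>"},
            "predictor_min_replicas": 0,
            "predictor_max_replicas": 2,
            "predictor_scale_metric": "concurrency",
            "predictor_scale_target": 5,
            "scale_down_delay_seconds": 300,
        }
    )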

start_inference_server(cpu: str | None = None, num_workers: int | None = None, memory: str | None = None, min_replicas: int | None = None, max_replicas: int | None = None, scale_metric: str | None = None, scale_target: int | None = None, scale_down_delay: int | str | None = None, gpu_count: int | None = None, gpu_type: str | None = None, edit_existing: bool | None = None, quantization_bits: int | None = None, huggingface_model_kwargs: dict | None = None, inference_engine: InferenceEngine | str | None = None, vllm_config: dict | None = None)[source]

Create an inference server for the model object.

NOTE: All parameters to this function are deprecated and will be removed in a future release. Please use the chariot.models.model.Model.set_inference_server_settings() method to configure the inference server.

Deprecated Parameters

cpu: str

Number of cpus allocated to inference server. This sets cpu requests and limits of the kubernetes pod. See https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#meaning-of-cpu for more detail.

num_workers: int

For artifact_type=Pytorch this value sets minWorkers, maxWorkers, and default_workers_per_model to the specified value. See https://pytorch.org/serve/configuration.html for more detail. For all other artifact types, sets the MLServer parallel_workers field to the specified value. See https://mlserver.readthedocs.io/en/latest/user-guide/parallel-inference.html for more details.

memory: str

Amount of memory allocated to the inference server. This sets the memory requests and limits of the kubernetes pod. See https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#meaning-of-memory for more detail.

min_replicas: int

Minimum number of server replicas to scale down to. Defaults to 0.

max_replicas: int

Maximum number of server replicas to scale up to. Defaults to 1.

scale_metric: str

Metric to scale on. Currently, only ‘concurrency’ and ‘rps’ are supported. Defaults to ‘concurrency’.

  • ‘concurrency’: number of simultaneous open HTTP connections.

  • ‘rps’: number of HTTP requests per second averaged over 60 seconds.

scale_target: int

Threshold value that, once exceeded, will trigger a scale up event if max_replicas has not been reached.

scale_down_delay: int | str | None

The amount of time to wait after the scale_metric falls below the scale_target before scaling down if min_replicas has not been reached. Can be an integer in seconds; a string consisting of a number followed by ‘s’, ‘m’, or ‘h’ for seconds, minutes, or hours respectively; or None to use the default value.

gpu_count: int

Number of GPUs requested.

gpu_type: str

Type of GPU to request. If set, gpu_count must be greater than 0.

edit_existing: bool

If the inference server already exists, update it to use the settings specified in this function call.

quantization_bits: int

Passing this parameter will trigger quantization for Huggingface models; only 4 or 8 is currently supported. This is passed as a model kwarg to the inference server.

huggingface_model_kwargs: dict

Any parameters passed here will be passed as model kwargs when creating a Huggingface inference server.

inference_engine: Optional[str]

The inference engine to use. User-selectable runtimes enable models to run under a different inference engine than the artifact type. The model must have been converted to that runtime with a model export first. Passing nothing for this will result in running as the artifact type.

vllm_config: Optional[dict]

The configuration for the vLLM inference engine. Only valid when inference_engine="vLLM". Please consult the Chariot docs for the available options for vLLM configs.

property status
stop_inference_server()[source]
property storage_status
supported_and_existing_inference_engines()[source]
task_to_method = {'Automatic Speech Recognition': ['predict'], 'Conversational': ['chat'], 'Feature Extraction': ['predict'], 'Image Autoencoder': ['embed', 'reconstruct'], 'Image Classification': ['embed', 'predict', 'predict_proba'], 'Image Embedding': ['embed'], 'Image Generation': ['predict'], 'Image Segmentation': ['predict', 'predict_proba'], 'Object Detection': ['detect'], 'Oriented Object Detection': ['detect'], 'Other - Computer Vision': ['predict'], 'Other - Natural Language': ['predict'], 'Other - Structured Data': ['predict'], 'Question Answer': ['predict'], 'Structured Data Classification': ['predict'], 'Structured Data Regression': ['predict'], 'Summarization': ['predict'], 'Text Classification': ['predict'], 'Text Embedding': ['embed'], 'Text Fill-Mask': ['predict'], 'Text Generation': ['complete'], 'Text2Text Generation': ['predict'], 'Token Classification': ['predict'], 'Translation': ['predict']}
update_isvc_settings(settings)[source]

Update settings for this model’s inference server.

NOTE: This method is deprecated and will be removed in a future release. Please use the set_inference_server_settings() method instead.

property version
wait_for_inference_server(timeout: int, verbose: bool = False, wait_interval: int = 1, internal_url: bool = False) OutputInferenceService[source]

Waits for the model’s dedicated inference server to be running and ready to accept requests. Will scale the model up if it has scaled to zero.

Parameters

timeout:

Number of seconds to wait for the inference server to spin up.

verbose:

Whether to enable more verbose logging.

wait_interval:

How many seconds to wait after each query of the get inference server status endpoint before trying again.

internal_url:

Set to True to use inference within cluster.

Returns

The OutputInferenceService object for the inference service if it exists, None otherwise.

Raises

ApiException:

If the call to get inference server status does not return a status code that is 2XX or 404.

RuntimeError:

If the inference server failed to spin up or was not able to spin up within the timeout period.
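
Example (a minimal sketch; model and sample are placeholders for an existing Model instance and an input payload): wait for the server, then run an action supported by the model.

    from chariot.models.inference import Action

    model.wait_for_inference_server(timeout=600, verbose=True)
    prediction = model.infer(Action.PREDICT, sample)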

wait_for_upload(timeout=60, wait_interval=2) Self[source]

Wait for timeout seconds for this model to be uploaded.

Parameters

timeout:

Number of seconds to wait for the model upload.

wait_interval:

How many seconds to wait after each query before trying again.

Returns

The Model object, for chaining

Raises

ModelUploadTimeoutError:

If storage_status is not “uploaded” before timeout.

exception chariot.models.model.ModelDoesNotExistError(model_name: str, version: str | None = None)[source]

Bases: Exception

class chariot.models.model.VLLMConfigurationDict[source]

Bases: TypedDict

The configuration for the vLLM inference engine.

bitsandbytes_4bit: bool
enable_prefix_caching: bool
max_model_length: int
seed: int
chariot.models.model.get_catalog(project_id: str, **kwargs) list[OutputModelSummary][source]

get_catalog returns the model catalog matching the supplied keyword filters. See Chariot REST documentation for details.

Parameters

project_id: str

The project_id for the models query.

chariot.models.model.get_model_by_id(id: str) Model[source]

get_model_by_id returns the model matching the supplied id.

chariot.models.model.get_models(project_id: str | None = None, **kwargs) list[Model][source]

get_models returns all models matching the supplied keyword filters. See Chariot REST documentation for details.

chariot.models.model.iter_models(**kwargs)[source]

iter_models returns an iterator over all models matching the supplied filters. See Chariot REST documentation for details.
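
Example (a sketch; the ids and names are placeholders, and keyword filters are assumed to be passed through to the REST query): a few ways to look up models.

    from chariot.models.model import Model, get_model_by_id, get_models, iter_models

    # By id.
    m = get_model_by_id("<model-id>")

    # All models in a project.
    project_models = get_models(project_id="<project-id>")

    # Lazily iterate over models matching the supplied filters.
    for m in iter_models(project_id="<project-id>"):
        print(m.name, m.version)

    # Or construct a handle directly by project, name, and version.
    m = Model(project_name="<project-name>", name="<model-name>", version="<version>")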

chariot.models.stage module

class chariot.models.stage.Stage(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Bases: Enum

Possible model stages. These values correspond to the ones defined in models-catalog: https://github.com/Striveworks/chariot/blob/main/go/apps/models-catalog/pkg/constants/model_stage.go#L5-L8 (TODO: consume these from a json/yaml config file instead of hard coding).

PRODUCTION = 'production'
STAGING = 'staging'
chariot.models.stage.get_active_stages(model: Model)[source]

Function to get the currently active model stages and a list of archived model versions.

chariot.models.stage.get_stage_history(model: Model, offset=0, limit=10) list[source]

Function to get a paginated history of stage changes for a given model.

chariot.models.stage.set_stage(model: Model, stage: Stage)[source]
Parameters:
  • model

  • stage
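
Example (a minimal sketch, assuming model is an existing chariot.models.model.Model instance): promote a model and inspect stage information.

    from chariot.models.stage import Stage, get_active_stages, get_stage_history, set_stage

    set_stage(model, Stage.PRODUCTION)
    active = get_active_stages(model)
    history = get_stage_history(model, offset=0, limit=10)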

chariot.models.upload module

chariot.models.upload.import_model(*, name: str, version: str, summary: str, artifact_type: str | ArtifactType, task_type: str | TaskType, project_id: str | None = None, project_name: str | None = None, subproject_name: str | None = None, organization_id: str | None = None, class_labels: dict | None = None, model_object: Any | None = None, model_path: str | None = None, use_internal_url: bool = False, **kwargs) Model[source]

Import a local model into Chariot. For a previously exported Chariot model, model_path is the local path to the gzipped tar archive.
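
Example (a sketch; the project name, path, and class label format are illustrative): re-import a previously exported Chariot object detection model and wait for the upload to finish.

    from chariot.models.upload import import_model

    model = import_model(
        name="my-detector",
        version="0.0.1",
        summary="Re-imported Chariot object detection model",
        artifact_type="chariot",
        task_type="Object Detection",
        project_name="<project-name>",
        class_labels={"1": "car", "2": "person"},  # illustrative label mapping
        model_path="/path/to/exported-model.tar.gz",
    )
    model.wait_for_upload(timeout=300)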

Module contents