chariot.models package
Submodules
chariot.models.enum module
- class chariot.models.enum.ArtifactType(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]
Bases:
ModelsStrEnum
- CHARIOT = 'chariot'
- HUGGINGFACE = 'huggingface'
- NEURALMAGIC = 'neuralmagic'
- ONNX = 'onnx'
- PYTORCH = 'pytorch'
- SKLEARN = 'sklearn'
- class chariot.models.enum.ArtifactTypesTaskType(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]
Bases:
ModelsEnum
- CHARIOT = ['Image Classification', 'Image Embedding', 'Image Segmentation', 'Object Detection']
- HUGGINGFACE = ['Automatic Speech Recognition', 'Conversational', 'Summarization', 'Text Classification', 'Text Embedding', 'Text Fill-Mask', 'Text Generation', 'Token Classification', 'Translation']
- NEURALMAGIC = ['Image Classification']
- ONNX = ['Automatic Speech Recognition', 'Conversational', 'Feature Extraction', 'Image Autoencoder', 'Image Classification', 'Image Embedding', 'Image Generation', 'Image Segmentation', 'Object Detection', 'Oriented Object Detection', 'Other - Computer Vision', 'Other - Natural Language', 'Other - Structured Data', 'Question Answer', 'Structured Data Classification', 'Structured Data Regression', 'Summarization', 'Text Classification', 'Text Embedding', 'Text Fill-Mask', 'Text Generation', 'Text2Text Generation', 'Token Classification', 'Translation']
- PYTORCH = ['Automatic Speech Recognition', 'Conversational', 'Feature Extraction', 'Image Autoencoder', 'Image Classification', 'Image Embedding', 'Image Generation', 'Image Segmentation', 'Object Detection', 'Oriented Object Detection', 'Other - Computer Vision', 'Other - Natural Language', 'Other - Structured Data', 'Question Answer', 'Structured Data Classification', 'Structured Data Regression', 'Summarization', 'Text Classification', 'Text Embedding', 'Text Fill-Mask', 'Text Generation', 'Text2Text Generation', 'Token Classification', 'Translation']
- SKLEARN = ['Other - Structured Data', 'Structured Data Classification', 'Structured Data Regression']
- class chariot.models.enum.InferenceEngine(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]
Bases:
ModelsStrEnum
- CHARIOTDEEPSPARSE = 'ChariotDeepSparse'
- CHARIOTPYTORCH = 'ChariotPytorch'
- HUGGINGFACE = 'Huggingface'
- VLLM = 'vLLM'
- class chariot.models.enum.Protocol(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]
Bases:
ModelsStrEnum
- V2 = 'v2'
- class chariot.models.enum.TaskType(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]
Bases:
ModelsStrEnum
- AUTOMATIC_SPEECH_RECOGNITION = 'Automatic Speech Recognition'
- CONVERSATIONAL = 'Conversational'
- FEATURE_EXTRACTION = 'Feature Extraction'
- IMAGE_AUTOENCODER = 'Image Autoencoder'
- IMAGE_CLASSIFICATION = 'Image Classification'
- IMAGE_EMBEDDING = 'Image Embedding'
- IMAGE_GENERATION = 'Image Generation'
- IMAGE_SEGMENTATION = 'Image Segmentation'
- OBJECT_DETECTION = 'Object Detection'
- ORIENTED_OBJECT_DETECTION = 'Oriented Object Detection'
- OTHER_COMPUTER_VISION = 'Other - Computer Vision'
- OTHER_NATURAL_LANGUAGE = 'Other - Natural Language'
- OTHER_STRUCTURED_DATA = 'Other - Structured Data'
- QUESTION_ANSWER = 'Question Answer'
- STRUCTURED_DATA_CLASSIFICATION = 'Structured Data Classification'
- STRUCTURED_DATA_REGRESSION = 'Structured Data Regression'
- SUMMARIZATION = 'Summarization'
- TEXT2TEXT_GENERATION = 'Text2Text Generation'
- TEXT_CLASSIFICATION = 'Text Classification'
- TEXT_EMBEDDING = 'Text Embedding'
- TEXT_FILL_MASK = 'Text Fill-Mask'
- TEXT_GENERATION = 'Text Generation'
- TOKEN_CLASSIFICATION = 'Token Classification'
- TRANSLATION = 'Translation'
- class chariot.models.enum.TaskTypesInferenceMethod(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]
Bases:
ModelsEnum
- AUTOMATIC_SPEECH_RECOGNITION = ['predict']
- CONVERSATIONAL = ['chat']
- FEATURE_EXTRACTION = ['predict']
- IMAGE_AUTOENCODER = ['embed', 'reconstruct']
- IMAGE_CLASSIFICATION = ['embed', 'predict', 'predict_proba']
- IMAGE_EMBEDDING = ['embed']
- IMAGE_GENERATION = ['predict']
- IMAGE_SEGMENTATION = ['predict', 'predict_proba']
- OBJECT_DETECTION = ['detect']
- ORIENTED_OBJECT_DETECTION = ['detect']
- OTHER_COMPUTER_VISION = ['predict']
- OTHER_NATURAL_LANGUAGE = ['predict']
- OTHER_STRUCTURED_DATA = ['predict']
- QUESTION_ANSWER = ['predict']
- STRUCTURED_DATA_CLASSIFICATION = ['predict']
- STRUCTURED_DATA_REGRESSION = ['predict']
- SUMMARIZATION = ['predict']
- TEXT2TEXT_GENERATION = ['predict']
- TEXT_CLASSIFICATION = ['predict']
- TEXT_EMBEDDING = ['embed']
- TEXT_FILL_MASK = ['predict']
- TEXT_GENERATION = ['complete']
- TOKEN_CLASSIFICATION = ['predict']
- TRANSLATION = ['predict']
- class chariot.models.enum.TaskTypesRequirement(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]
Bases:
ModelsEnum
- AUTOMATIC_SPEECH_RECOGNITION = {'class_labels': False, 'input_info': False, 'input_modality': 'text'}
- CONVERSATIONAL = {'class_labels': False, 'input_info': False, 'input_modality': 'text'}
- FEATURE_EXTRACTION = {'class_labels': False, 'input_info': False, 'input_modality': 'text'}
- IMAGE_AUTOENCODER = {'class_labels': False, 'input_info': False, 'input_modality': 'image'}
- IMAGE_CLASSIFICATION = {'class_labels': True, 'input_info': False, 'input_modality': 'image'}
- IMAGE_EMBEDDING = {'class_labels': False, 'input_info': False, 'input_modality': 'image'}
- IMAGE_GENERATION = {'class_labels': False, 'input_info': False, 'input_modality': 'image'}
- IMAGE_SEGMENTATION = {'class_labels': True, 'input_info': False, 'input_modality': 'image'}
- OBJECT_DETECTION = {'class_labels': True, 'input_info': False, 'input_modality': 'image'}
- ORIENTED_OBJECT_DETECTION = {'class_labels': True, 'input_info': False, 'input_modality': 'image'}
- OTHER_COMPUTER_VISION = {'class_labels': False, 'input_info': False, 'input_modality': 'image'}
- OTHER_NATURAL_LANGUAGE = {'class_labels': False, 'input_info': False, 'input_modality': 'text'}
- OTHER_STRUCTURED_DATA = {'class_labels': False, 'input_info': True, 'input_modality': 'tabular'}
- QUESTION_ANSWER = {'class_labels': False, 'input_info': False, 'input_modality': 'text'}
- STRUCTURED_DATA_CLASSIFICATION = {'class_labels': True, 'input_info': True, 'input_modality': 'tabular'}
- STRUCTURED_DATA_REGRESSION = {'class_labels': False, 'input_info': True, 'input_modality': 'tabular'}
- SUMMARIZATION = {'class_labels': False, 'input_info': False, 'input_modality': 'text'}
- TEXT2TEXT_GENERATION = {'class_labels': False, 'input_info': False, 'input_modality': 'text'}
- TEXT_CLASSIFICATION = {'class_labels': True, 'input_info': False, 'input_modality': 'text'}
- TEXT_EMBEDDING = {'class_labels': False, 'input_info': False, 'input_modality': 'text'}
- TEXT_FILL_MASK = {'class_labels': False, 'input_info': False, 'input_modality': 'text'}
- TEXT_GENERATION = {'class_labels': False, 'input_info': False, 'input_modality': 'text'}
- TOKEN_CLASSIFICATION = {'class_labels': True, 'input_info': False, 'input_modality': 'text'}
- TRANSLATION = {'class_labels': False, 'input_info': False, 'input_modality': 'text'}
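The mapping enums above share member names with TaskType, so a task's inference methods and upload requirements can be looked up by name. A minimal sketch, assuming the ModelsEnum/ModelsStrEnum base classes behave like standard Python enums:

    from chariot.models.enum import TaskType, TaskTypesInferenceMethod, TaskTypesRequirement

    task = TaskType.OBJECT_DETECTION

    # Member names match across the enums, so a TaskType's name indexes the mappings.
    methods = TaskTypesInferenceMethod[task.name].value    # ['detect']
    requirements = TaskTypesRequirement[task.name].value   # {'class_labels': True, ...}

    print(methods)
    print(requirements["class_labels"], requirements["input_modality"])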
chariot.models.evaluations module
chariot.models.inference module
- class chariot.models.inference.Action(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]
Bases:
StrEnum
- CHAT = 'chat'
- COMPLETE = 'complete'
- DETECT = 'detect'
- EMBED = 'embed'
- PREDICT = 'predict'
- PREDICT_PROBA = 'predict_proba'
- RECONSTRUCT = 'reconstruct'
chariot.models.isvc_settings module
- class chariot.models.isvc_settings.GPUDict[source]
Bases:
TypedDict
The number and type of GPUs allocated to a model’s inference server.
- count: int
- product: str
- class chariot.models.isvc_settings.InferenceServerSettingsDict[source]
Bases:
TypedDict
Settings for a model’s inference server.
- enable_cvm_scoring: bool
Whether to enable Cramér–von Mises scoring of each request. Defaults to False. Embeddings will automatically be computed if this is enabled.
- enable_data_storage: bool
Whether to enable data (e.g. image) storage when storing inferences. Defaults to False.
- enable_inference_storage: bool
Whether to store inferences. Defaults to False.
- enable_ks_scoring: bool
Whether to enable Kolmogorov–Smirnov scoring of each request. Defaults to False. Embeddings will automatically be computed if this is enabled.
- enable_metadata_extraction: bool
Whether to enable metadata extraction when storing inferences. Defaults to False.
- enable_semantic_scoring: bool
Whether to enable semantic scoring of each request. Defaults to False. Embeddings will automatically be computed if this is enabled.
- huggingface_model_kwargs: dict | None
Model keyword arguments to use for the Huggingface inference engine. Defaults to None. Only used when inference_engine="Huggingface".
- inference_engine: InferenceEngine | None
The inference engine to use. User-selectable runtimes enable models to run under a different inference engine than the artifact type. The model must have been converted to that runtime with models export first. Passing nothing for this will result in running as the artifact type. Defaults to None.
- max_batch_delay_seconds: int | float
Maximum batch delay in seconds for triggering a prediction. Must be >= 0. Defaults to 0.
- max_batch_size: int
Maximum batch size for triggering a prediction. Must be > 0. Defaults to 1.
- negative_sampling_rate: float
Rate at which inferences without detections will be stored. Must be >= 0 and <= 1. Defaults to 0. A value of 0.65 means that there is a 65% chance that each inference without a detection is stored.
- num_workers: int
Number of workers to use for the predictor. Must be >= 1 and <= 100. Defaults to 1. For artifact_type=Pytorch this value sets minWorkers, maxWorkers, and default_workers_per_model to the specified value. See https://pytorch.org/serve/configuration.html for more detail. For Chariot artifact types, sets the MLServer parallel_workers field to the specified value. See https://mlserver.readthedocs.io/en/latest/user-guide/parallel-inference.html for more details.
- only_store_detections: bool
Whether to store all inferences (False) or only inferences with detections (True). Defaults to False. Ignored if enable_inference_storage is False.
- positive_sampling_rate: float
Rate at which inferences with classifications or detections will be stored. Must be >= 0 and <= 1. Defaults to 0. A value of 0.65 means that there is a 65% chance that each inference with a classification or detection is stored.
- predictor_cpu: str
Number of CPU cores allocated to the predictor. Must be a positive k8s quantity. Defaults to "1". See https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#meaning-of-cpu for more detail.
- predictor_cpu_burstable: bool
Whether the predictor can burst to using more CPU than requested. Defaults to False.
- predictor_ephemeral_storage: str | None
Amount of ephemeral (disk) storage allocated to the predictor. Must be None or a positive k8s quantity. Defaults to None. If None, no requests or limits are set. See https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#setting-requests-and-limits-for-local-ephemeral-storage for more detail.
- predictor_gpu: GPUDict | None
Number and type of GPUs allocated to the predictor. Defaults to None.
- predictor_include_embedding_model: bool
Whether to include the embedding model in the predictor. Defaults to False.
- predictor_max_replicas: int
Maximum number of predictor replicas to scale up to. Must be >= 1 and <= ReplicaLimit. Defaults to 1.
- predictor_memory: str
Amount of memory allocated to the predictor. Must be a positive k8s quantity. Defaults to "4Gi". See https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#meaning-of-memory for more detail.
- predictor_min_replicas: int
Minimum number of predictor replicas to scale down to. Must be >= 0 and <= ReplicaLimit. Defaults to 0.
- predictor_scale_metric: Literal['concurrency', 'rps']
Which metric to use for autoscaling the predictor. Defaults to "concurrency". Valid values: "concurrency" (number of simultaneous requests to each replica) and "rps" (number of requests per second).
- predictor_scale_target: int
Target value for autoscaling the predictor. Must be a positive integer. Defaults to 5.
- scale_down_delay_seconds: int
The amount of time to wait after the scale metric falls below the scale target before scaling down if min_replicas has not been reached. Must be >= 0 and <= 3600. Defaults to 600.
- transformer_cpu: str
Number of CPU cores allocated to the transformer. Must be a positive k8s quantity. Defaults to "1". See https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#meaning-of-cpu for more detail.
- transformer_cpu_burstable: bool
Whether the transformer can burst to using more CPU than requested. Defaults to False.
- transformer_max_replicas: int
Maximum number of transformer replicas to scale up to. Must be >= 1 and <= ReplicaLimit. Defaults to 1.
- transformer_memory: str
Amount of memory allocated to the transformer. Must be a positive k8s quantity. Defaults to "2Gi". See https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#meaning-of-memory for more detail.
- transformer_min_replicas: int
Minimum number of transformer replicas to scale down to. Must be >= 0 and <= ReplicaLimit. Defaults to 0.
- transformer_scale_metric: Literal['concurrency', 'rps']
Which metric to use for autoscaling the transformer. Defaults to "concurrency". Valid values: "concurrency" (number of simultaneous requests to each replica) and "rps" (number of requests per second).
- transformer_scale_target: int
Target value for autoscaling the transformer. Must be a positive integer. Defaults to 20.
- vllm_configuration: VLLMConfigurationDict | None
The configuration for the vLLM inference engine. Defaults to None. Only used when inference_engine="vLLM".
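As an illustration of the fields above, a settings dictionary might look like the sketch below. The GPU product string is hypothetical, and whether a partial dictionary (with omitted keys keeping their defaults) is accepted is an assumption rather than something stated in this reference.

    from chariot.models.isvc_settings import GPUDict, InferenceServerSettingsDict

    gpu: GPUDict = {"count": 1, "product": "NVIDIA-A100-SXM4-40GB"}  # hypothetical product name

    settings: InferenceServerSettingsDict = {
        "predictor_cpu": "2",             # k8s CPU quantity
        "predictor_memory": "8Gi",        # k8s memory quantity
        "predictor_gpu": gpu,
        "predictor_min_replicas": 0,
        "predictor_max_replicas": 2,
        "max_batch_size": 4,
        "max_batch_delay_seconds": 0.1,
        "enable_inference_storage": True,
        "positive_sampling_rate": 0.65,   # store ~65% of inferences with detections
    }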
- class chariot.models.isvc_settings.IsvcSetting(key: str, value: Any, user_id: str, since: datetime.datetime, until: datetime.datetime | None = None)[source]
Bases:
object
- key: str
- since: datetime
- until: datetime | None = None
- user_id: str
- value: Any
- class chariot.models.isvc_settings.VLLMConfigurationDict[source]
Bases:
TypedDict
The configuration for the vLLM inference engine.
- bitsandbytes_4bit: bool
- enable_prefix_caching: bool
- max_model_length: int
- seed: int
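For illustration, a vLLM configuration using the four documented fields might look like the sketch below; the values are arbitrary and the per-field comments are interpretations based on common vLLM options, not taken from this reference.

    from chariot.models.isvc_settings import VLLMConfigurationDict

    vllm_config: VLLMConfigurationDict = {
        "bitsandbytes_4bit": False,     # whether to load weights with 4-bit bitsandbytes quantization
        "enable_prefix_caching": True,  # reuse cached computation for shared prompt prefixes
        "max_model_length": 4096,       # maximum context length to allocate for
        "seed": 0,                      # seed for reproducible sampling
    }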
- chariot.models.isvc_settings.create_isvc_settings(model_id: str, settings: dict[str, Any]) list[IsvcSetting] [source]
Create settings for the isvc of this model.
NOTE: This function is deprecated and will be removed in a future release. Please use the chariot.models.model.Model.set_inference_server_settings() method instead.
- chariot.models.isvc_settings.get_inference_server_settings(model_id: str) InferenceServerSettingsDict [source]
Get the current inference server settings for model model_id.
Parameters
- model_id: str
The model to get inference server settings from.
Returns
The model’s current inference server settings as an InferenceServerSettingsDict.
- chariot.models.isvc_settings.get_isvc_settings(model_id: str, key: str | None = None) list[IsvcSetting] [source]
Get settings for the isvc of this model.
NOTE: This function is deprecated and will be removed in a future release. Please use the chariot.models.model.Model.get_inference_server_settings() method instead.
- chariot.models.isvc_settings.set_inference_server_settings(model_id: str, settings: InferenceServerSettingsDict)[source]
Set inference server settings for model model_id.
Parameters
- model_id: str
The model to set inference server settings on.
- settings: chariot.models.model.InferenceServerSettingsDict
Settings to apply to the inference server.
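A minimal sketch of the read-modify-write flow with these two helpers, assuming the dictionary returned by get_inference_server_settings can be edited in place and passed back to set_inference_server_settings; the model id is hypothetical:

    from chariot.models.enum import InferenceEngine
    from chariot.models.isvc_settings import (
        get_inference_server_settings,
        set_inference_server_settings,
    )

    model_id = "0123456789abcdef"  # hypothetical model id

    # Fetch the current settings, switch the runtime to vLLM, and apply.
    settings = get_inference_server_settings(model_id)
    settings["inference_engine"] = InferenceEngine.VLLM
    settings["vllm_configuration"] = {
        "bitsandbytes_4bit": False,
        "enable_prefix_caching": True,
        "max_model_length": 4096,
        "seed": 0,
    }
    set_inference_server_settings(model_id, settings)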
chariot.models.model module
- class chariot.models.model.GPUDict[source]
Bases:
TypedDict
The number and type of GPUs allocated to a model’s inference server.
- count: int
- product: str
- class chariot.models.model.InferenceServerSettingsDict[source]
Bases:
TypedDict
Settings for a model’s inference server.
- enable_cvm_scoring: bool
Whether to enable Cramér–von Mises scoring of each request. Defaults to False. Embeddings will automatically be computed if this is enabled.
- enable_data_storage: bool
Whether to enable data (e.g. image) storage when storing inferences. Defaults to False.
- enable_inference_storage: bool
Whether to store inferences. Defaults to False.
- enable_ks_scoring: bool
Whether to enable Kolmogorov–Smirnov scoring of each request. Defaults to False. Embeddings will automatically be computed if this is enabled.
- enable_metadata_extraction: bool
Whether to enable metadata extraction when storing inferences. Defaults to False.
- enable_semantic_scoring: bool
Whether to enable semantic scoring of each request. Defaults to False. Embeddings will automatically be computed if this is enabled.
- huggingface_model_kwargs: dict | None
Model keyword arguments to use for the Huggingface inference engine. Defaults to None. Only used when inference_engine="Huggingface".
- inference_engine: InferenceEngine | None
The inference engine to use. User-selectable runtimes enable models to run under a different inference engine than the artifact type. The model must have been converted to that runtime with models export first. Passing nothing for this will result in running as the artifact type. Defaults to None.
- max_batch_delay_seconds: int | float
Maximum batch delay in seconds for triggering a prediction. Must be >= 0. Defaults to 0.
- max_batch_size: int
Maximum batch size for triggering a prediction. Must be > 0. Defaults to 1.
- negative_sampling_rate: float
Rate at which inferences without detections will be stored. Must be >= 0 and <= 1. Defaults to 0. A value of 0.65 means that there is a 65% chance that each inference without a detection is stored.
- num_workers: int
Number of workers to use for the predictor. Must be >= 1 and <= 100. Defaults to 1. For artifact_type=Pytorch this value sets minWorkers, maxWorkers, and default_workers_per_model to the specified value. See https://pytorch.org/serve/configuration.html for more detail. For Chariot artifact types, sets the MLServer parallel_workers field to the specified value. See https://mlserver.readthedocs.io/en/latest/user-guide/parallel-inference.html for more details.
- only_store_detections: bool
Whether to store all inferences (False) or only inferences with detections (True). Defaults to False. Ignored if enable_inference_storage is False.
- positive_sampling_rate: float
Rate at which inferences with classifications or detections will be stored. Must be >= 0 and <= 1. Defaults to 0. A value of 0.65 means that there is a 65% chance that each inference with a classification or detection is stored.
- predictor_cpu: str
Number of CPU cores allocated to the predictor. Must be a positive k8s quantity. Defaults to "1". See https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#meaning-of-cpu for more detail.
- predictor_cpu_burstable: bool
Whether the predictor can burst to using more CPU than requested. Defaults to False.
- predictor_ephemeral_storage: str | None
Amount of ephemeral (disk) storage allocated to the predictor. Must be None or a positive k8s quantity. Defaults to None. If None, no requests or limits are set. See https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#setting-requests-and-limits-for-local-ephemeral-storage for more detail.
- predictor_gpu: GPUDict | None
Number and type of GPUs allocated to the predictor. Defaults to None.
- predictor_include_embedding_model: bool
Whether to include the embedding model in the predictor. Defaults to False.
- predictor_max_replicas: int
Maximum number of predictor replicas to scale up to. Must be >= 1 and <= ReplicaLimit. Defaults to 1.
- predictor_memory: str
Amount of memory allocated to the predictor. Must be a positive k8s quantity. Defaults to "4Gi". See https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#meaning-of-memory for more detail.
- predictor_min_replicas: int
Minimum number of predictor replicas to scale down to. Must be >= 0 and <= ReplicaLimit. Defaults to 0.
- predictor_scale_metric: Literal['concurrency', 'rps']
Which metric to use for autoscaling the predictor. Defaults to "concurrency". Valid values: "concurrency" (number of simultaneous requests to each replica) and "rps" (number of requests per second).
- predictor_scale_target: int
Target value for autoscaling the predictor. Must be a positive integer. Defaults to 5.
- scale_down_delay_seconds: int
The amount of time to wait after the scale metric falls below the scale target before scaling down if min_replicas has not been reached. Must be >= 0 and <= 3600. Defaults to 600.
- transformer_cpu: str
Number of CPU cores allocated to the transformer. Must be a positive k8s quantity. Defaults to "1". See https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#meaning-of-cpu for more detail.
- transformer_cpu_burstable: bool
Whether the transformer can burst to using more CPU than requested. Defaults to False.
- transformer_max_replicas: int
Maximum number of transformer replicas to scale up to. Must be >= 1 and <= ReplicaLimit. Defaults to 1.
- transformer_memory: str
Amount of memory allocated to the transformer. Must be a positive k8s quantity. Defaults to "2Gi". See https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#meaning-of-memory for more detail.
- transformer_min_replicas: int
Minimum number of transformer replicas to scale down to. Must be >= 0 and <= ReplicaLimit. Defaults to 0.
- transformer_scale_metric: Literal['concurrency', 'rps']
Which metric to use for autoscaling the transformer. Defaults to "concurrency". Valid values: "concurrency" (number of simultaneous requests to each replica) and "rps" (number of requests per second).
- transformer_scale_target: int
Target value for autoscaling the transformer. Must be a positive integer. Defaults to 20.
- vllm_configuration: VLLMConfigurationDict | None
The configuration for the vLLM inference engine. Defaults to None. Only used when inference_engine="vLLM".
- class chariot.models.model.InferenceServerStatus(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]
Bases:
StrEnum
The status of an inference server.
- ERROR = 'Error'
- MISCONFIGURED = 'Misconfigured'
- NOTINITIALIZED = 'Not initialized'
- NULL = 'null'
- PENDING = 'Pending'
- READY = 'Ready'
- SCALED_DOWN = 'Scaled down'
- STARTING = 'Starting'
- UNKNOWN = 'Unknown'
- UPDATING = 'Updating'
- class chariot.models.model.Model(project_id: str | None = None, id: str | None = None, project_name: str | None = None, subproject_name: str | None = None, organization_id: str | None = None, name: str | None = None, version: str | None = None, metadata: Any = None, start_server: bool = True)[source]
Bases:
Resource
- property actions: list[str]
- property architecture
- property class_labels
- property created_at: datetime
- default_action_methods = ['predict']
- fork(project_id: str, *_, name: str | None = None, summary: str | None = None, version: str | None = None) Model [source]
Fork the model.
Parameters
- project_id: str
The project to fork the model into.
- name: str, optional
Optional name override.
- summary: str, optional
Optional summary override.
- version: str, optional
Optional model version override.
Returns
- Model
The new model fork.
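A short sketch of forking a model version into another project; the ids are hypothetical:

    from chariot.models.model import get_model_by_id

    model = get_model_by_id("0123456789abcdef")   # hypothetical source model id
    forked = model.fork(
        "target-project-id",                      # hypothetical destination project
        name="my-model-fork",                     # optional name override
        summary="Fork for experimentation",
        version="1.0.0",                          # optional version override
    )
    print(forked.name, forked.version)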
- get_inference_server_settings() InferenceServerSettingsDict [source]
Get the current settings for this model’s inference server.
Returns
The current inference server settings as an InferenceServerSettingsDict.
- infer(action: Action, sample: Any, timeout: int = 60, verbose: bool = False, url: str | None = None, custom_metadata: Mapping[str, str | float | int | Mapping[str, Any]] | Sequence[Mapping[str, str | float | int | Mapping[str, Any]]] | None = None, return_inference_id: bool = False, return_semantic_score: bool = False, score_threshold: float | None = None, **inference_kwargs) Any [source]
Run inference action on sample.
This method posts data to the model’s inference server and returns the results. The actions property lists the available actions for this model.
The inference response id is returned when return_inference_id is true. An inference request may or may not be batched, but it must contain at least one input, so if inference storage is enabled a small modification to the returned id is needed when looking up stored inferences: the lookup pattern within the inference store is id-#, where # is the index of the input within the request. For example, if an inference request with a batch of two inputs is provided, append -0 and -1 to the id to get each inference from the inference store.
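A sketch of running inference is shown below. The model id is hypothetical, the expected sample format depends on the model and task type and is not specified in this reference (a base64-encoded image is assumed for an object detector), and how the inference id is packaged in the response when return_inference_id=True is likewise an assumption to verify.

    import base64

    from chariot.models.inference import Action
    from chariot.models.model import get_model_by_id

    model = get_model_by_id("0123456789abcdef")   # hypothetical id of an object detection model
    print(model.actions)                          # actions this model supports, e.g. ['detect']

    # Assumed sample format: a base64-encoded image.
    with open("image.jpg", "rb") as f:
        sample = base64.b64encode(f.read()).decode()

    response = model.infer(Action.DETECT, sample, timeout=60, return_inference_id=True)
    # With inference storage enabled, the stored record for the first (index 0)
    # input of a request is looked up as "<inference id>-0".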
- property inference_url: str | None
URL of the inference server’s inference endpoint.
- property inverse_class_labels
- property isvc_settings
Get the current settings for this model’s inference server.
NOTE: This property is deprecated and will be removed in a future release. Please use the get_inference_server_settings() method instead.
- property name
- property name_slug
- set_inference_server_settings(settings: InferenceServerSettingsDict)[source]
Set settings for this model’s inference server.
Parameters
- settings: chariot.models.model.InferenceServerSettingsDict
Settings to apply to the inference server.
- start_inference_server(cpu: str | None = None, num_workers: int | None = None, memory: str | None = None, min_replicas: int | None = None, max_replicas: int | None = None, scale_metric: str | None = None, scale_target: int | None = None, scale_down_delay: int | str | None = None, gpu_count: int | None = None, gpu_type: str | None = None, edit_existing: bool | None = None, quantization_bits: int | None = None, huggingface_model_kwargs: dict | None = None, inference_engine: InferenceEngine | str | None = None, vllm_config: dict | None = None)[source]
Create an inference server for the model object.
NOTE: All parameters to this function are deprecated and will be removed in a future release. Please use the chariot.models.model.Model.set_inference_server_settings() method to configure the inference server.
Deprecated Parameters
- cpu: str
Number of CPUs allocated to the inference server. This sets the CPU requests and limits of the kubernetes pod. See https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#meaning-of-cpu for more detail.
- num_workers: int
For artifact_type=Pytorch this value sets minWorkers, maxWorkers, default_workers_per_model to the specified value. See https://pytorch.org/serve/configuration.html for more detail. For all other artifact types, sets the MLServer parallel_workers field to the specified value. See https://mlserver.readthedocs.io/en/latest/user-guide/parallel-inference.html for more details.
- memory: str
Amount of memory allocated to the inference server. This sets the memory requests and limits of the kubernetes pod. See https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#meaning-of-memory for more detail.
- min_replicas: int
Minimum number of server replicas to scale down to. Defaults to 0.
- max_replicas: int
Maximum number of server replicas to scale up to. Defaults to 1.
- scale_metric: str
Metric to scale off of. Currently, only ‘concurrency’ and ‘rps’ are supported. Defaults to ‘concurrency’. ‘concurrency’: number of simultaneous open http connections. ‘rps’: number of http requests per second averaged over 60 seconds.
- scale_target: int
Threshold value that, once exceeded, will trigger a scale up event if max_replicas has not been reached.
- scale_down_delay: int | str | None
The amount of time to wait after the scale_metric falls below the scale_target before scaling down if min_replicas has not been reached. Can be an integer in seconds, a string with a number followed by ‘s’, ‘m’, or ‘h’ for seconds, minutes, or hours respectively, or None to use the default value.
- gpu_count: int
Number of GPUs requested.
- gpu_type: str
Type of GPU to request. If set, gpu_count must be greater than 0.
- edit_existing: bool
If the inference server already exists, it will be updated to use the settings specified in this function call.
- quantization_bits: int
Passing this parameter will trigger quantization for Huggingface models; only 4 or 8 is currently supported. This is passed as a model kwarg to the inference server.
- huggingface_model_kwargs: dict
Any parameter passed here will be passed as a model kwarg when creating a Huggingface inference server.
- inference_engine: Optional[str]
The inference engine to use. User-selectable runtimes enable models to run under a different inference engine than the artifact type. The model must have been converted to that runtime with models export first. Passing nothing for this will result in running as the artifact type.
- vllm_config: Optional[dict]
The configuration for the vLLM inference engine. Only valid when inference_engine=”vLLM”. Please consult the Chariot docs for the available options for vLLM configs.
- property status
- property storage_status
- task_to_method = {'Automatic Speech Recognition': ['predict'], 'Conversational': ['chat'], 'Feature Extraction': ['predict'], 'Image Autoencoder': ['embed', 'reconstruct'], 'Image Classification': ['embed', 'predict', 'predict_proba'], 'Image Embedding': ['embed'], 'Image Generation': ['predict'], 'Image Segmentation': ['predict', 'predict_proba'], 'Object Detection': ['detect'], 'Oriented Object Detection': ['detect'], 'Other - Computer Vision': ['predict'], 'Other - Natural Language': ['predict'], 'Other - Structured Data': ['predict'], 'Question Answer': ['predict'], 'Structured Data Classification': ['predict'], 'Structured Data Regression': ['predict'], 'Summarization': ['predict'], 'Text Classification': ['predict'], 'Text Embedding': ['embed'], 'Text Fill-Mask': ['predict'], 'Text Generation': ['complete'], 'Text2Text Generation': ['predict'], 'Token Classification': ['predict'], 'Translation': ['predict']}
- update_isvc_settings(settings)[source]
Update settings for this model’s inference server.
NOTE: This method is deprecated and will be removed in a future release. Please use the set_inference_server_settings() method instead.
- property version
- wait_for_inference_server(timeout: int, verbose: bool = False, wait_interval: int = 1, internal_url: bool = False) OutputInferenceService [source]
Waits for the model’s dedicated inference server to be running and ready to accept requests. Will scale the model up if it has scaled to zero.
Parameters
- timeout:
Number in seconds to wait for inference server to spin up.
- verbose:
Whether to enable more verbose logging.
- wait_interval:
How many seconds to wait after each query of the get inference server status endpoint before trying again.
- internal_url:
Set to True to use inference within cluster.
Returns
The OutputInferenceService object for the inference service if it exists, None otherwise.
Raises
- ApiException:
If the call to get inference server status does not return a status code that is 2XX or 404.
- RuntimeError:
If the inference server failed to spin up or was not able to spin up within the timeout period.
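A sketch of waiting for the dedicated inference server before sending requests; the model id is hypothetical:

    from chariot.models.model import get_model_by_id

    model = get_model_by_id("0123456789abcdef")   # hypothetical model id

    try:
        # Blocks until the server is ready, scaling it up from zero if needed.
        model.wait_for_inference_server(timeout=300, verbose=True, wait_interval=2)
        print(model.status)
    except RuntimeError as err:
        print(f"Inference server did not become ready in time: {err}")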
- wait_for_upload(timeout=60, wait_interval=2) Self [source]
Wait for timeout seconds for this model to be uploaded.
Parameters
- timeout:
Number in seconds to wait for model upload.
- wait_interval:
How many seconds to wait after each query before trying again.
Returns
The Model object, for chaining
Raises
- ModelUploadTimeoutError:
If storage_status is not “uploaded” before timeout.
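A sketch of waiting for a freshly imported model’s artifact to finish uploading; the model id is hypothetical, and the method returns the Model itself so calls can be chained:

    from chariot.models.model import get_model_by_id

    model = get_model_by_id("0123456789abcdef")   # hypothetical id of a just-imported model

    model = model.wait_for_upload(timeout=120, wait_interval=2)
    print(model.storage_status)                   # expected to report the uploaded state here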
- exception chariot.models.model.ModelDoesNotExistError(model_name: str, version: str | None = None)[source]
Bases:
Exception
- class chariot.models.model.VLLMConfigurationDict[source]
Bases:
TypedDict
The configuration for the vLLM inference engine.
- bitsandbytes_4bit: bool
- enable_prefix_caching: bool
- max_model_length: int
- seed: int
- chariot.models.model.get_catalog(project_id: str, **kwargs) list[OutputModelSummary] [source]
get_catalog returns the model catalog matching the supplied keyword filters. See Chariot REST documentation for details.
Parameters
- project_id: str
project_id for the models query
- chariot.models.model.get_model_by_id(id: str) Model [source]
get_model_by_id returns the model matching the supplied id.
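A sketch of browsing the catalog and fetching a model by id; the ids are hypothetical, and the keyword filters accepted by get_catalog are described in the Chariot REST documentation:

    from chariot.models.model import get_catalog, get_model_by_id

    summaries = get_catalog("example-project-id")   # hypothetical project id
    for summary in summaries:
        print(summary)

    model = get_model_by_id("0123456789abcdef")     # hypothetical model id
    print(model.name, model.version)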
chariot.models.stage module
- class chariot.models.stage.Stage(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]
Bases:
Enum
Possible model stages. These values correspond to the ones defined in models-catalog: https://github.com/Striveworks/chariot/blob/main/go/apps/models-catalog/pkg/constants/model_stage.go#L5-L8 (TODO: consume these from a JSON/YAML config file instead of hard coding).
- PRODUCTION = 'production'
- STAGING = 'staging'
- chariot.models.stage.get_active_stages(model: Model)[source]
Get the currently active model stages and a list of archived model versions.
Parameters
- model: Model
The model whose stages to retrieve.
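A minimal sketch of calling get_active_stages; the return structure is not documented here, so the result is only printed, and the model id is hypothetical:

    from chariot.models.model import get_model_by_id
    from chariot.models.stage import Stage, get_active_stages

    model = get_model_by_id("0123456789abcdef")   # hypothetical model id

    print(get_active_stages(model))
    print(Stage.PRODUCTION.value, Stage.STAGING.value)   # 'production', 'staging'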
chariot.models.upload module
- chariot.models.upload.import_model(*, name: str, version: str, summary: str, artifact_type: str | ArtifactType, task_type: str | TaskType, project_id: str | None = None, project_name: str | None = None, subproject_name: str | None = None, organization_id: str | None = None, class_labels: dict | None = None, model_object: Any | None = None, model_path: str | None = None, use_internal_url: bool = False, **kwargs) Model [source]
Import a local model into Chariot. For a previously exported Chariot model, model_path is the local path to the gzipped tar