chariot.training_v2 package

Submodules

chariot.training_v2.blueprint module

chariot.training_v2.blueprint.lookup_blueprint_id(name: str, version_ilike: str | None = None) → str[source]

Returns the id of the blueprint specified by the arguments

Parameters

name: str: Name of the blueprint. If version_ilike is not provided, the latest version of the blueprint with this name is used.
version_ilike: str | None: A SQL ILIKE pattern for matching a blueprint version. If multiple blueprints match the pattern, the most recent one is used.

Returns

str : id of the blueprint

Raises

BlueprintDoesNotExistError: If the blueprint does not exist or has been deleted, this will be raised.
ValueError: If name is not provided.
APIException: If API communication fails, or the request is unauthorized or unauthenticated.

chariot.training_v2.checkpoint module

class chariot.training_v2.checkpoint.Checkpoint

Bases: BaseModelWithDatetime

bucket_name: str | None

create_model(*, name: str, version: str, summary: str, project_id: str | None = None) → str[source]

Create a model from this checkpoint

Parameters

name: str: The name to give the model
version: str: The version to give the model. Must be in SemVer format
summary: str: A short summary of the model
project_id: Optional[str]: The ID of the project to create the model in. If omitted, the project ID of the associated run will be used.

Returns

model_id: str: The ID of the created model

Raises

APIException: If API communication fails, or the request is unauthorized or unauthenticated.

created_at: datetime | None

global_step: int | None

id: str | None

key_prefix: str | None

model_config: ClassVar[ConfigDict] = {'protected_namespaces': (), 'validate_assignment': True}

Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.

project_id: str | None

run_id: str | None

status: str | None

status_updated_at: datetime | None

chariot.training_v2.checkpoint.create_model_from_checkpoint(*, checkpoint_id: str, name: str, version: str, summary: str, project_id: str | None = None) → str[source]

Create a model from a checkpoint

Parameters

checkpoint_id: str: The ID of the checkpoint to create the model from
name: str: The name to give the model
version: str: The version to give the model. Must be in SemVer format
summary: str: A short summary of the model
project_id: Optional[str]: The ID of the project to create the model in. If omitted, the project ID of the associated run will be used.

Returns

model_id: str: The ID of the created model

Raises

APIException: If API communication fails, or the request is unauthorized or unauthenticated.
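Both create_model and create_model_from_checkpoint require version to be in SemVer format. The following is a minimal sketch of a client-side check that could run before either call. The helper name is_semver is hypothetical and not part of chariot, and the pattern is a simplified reading of SemVer 2.0.0 (for example, it does not reject leading zeros in numeric pre-release identifiers).

```python
import re

# Simplified SemVer 2.0.0 pattern: MAJOR.MINOR.PATCH with an optional
# pre-release part (-rc.1) and optional build metadata (+build.5).
_SEMVER = re.compile(
    r"(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)"
    r"(?:-[0-9A-Za-z-]+(?:\.[0-9A-Za-z-]+)*)?"
    r"(?:\+[0-9A-Za-z-]+(?:\.[0-9A-Za-z-]+)*)?"
)


def is_semver(version: str) -> bool:
    """Return True if the whole string looks like a SemVer version."""
    return _SEMVER.fullmatch(version) is not None
```

Checking locally gives a quick, readable failure for a typo such as "1.0" or "v1.0.0"; the service remains the authority on what it accepts.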

chariot.training_v2.checkpoint.delete_checkpoints(*, ids: list[str] | None = None, run_ids: list[str] | None = None) → None[source]

Delete checkpoints matching the provided filters

Parameters

ids: Optional[List[str]]: If specified, filter to checkpoints with any of the given Checkpoint IDs. Note: either ids or run_ids must be specified, to prevent accidental deletion of all checkpoints.
run_ids: Optional[List[str]]: If specified, filter to checkpoints with any of the given Run IDs. Note: either ids or run_ids must be specified, to prevent accidental deletion of all checkpoints.

Raises

APIException: If API communication fails, or the request is unauthorized or unauthenticated.
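The requirement that at least one filter be given can also be enforced before the call is made. This is a minimal sketch, assuming the delete function is passed in as delete_fn (a stand-in for delete_checkpoints so the example runs without the SDK); the wrapper name safe_delete_checkpoints is hypothetical. How the library itself reports a call with neither filter is not stated here.

```python
from typing import Callable, Optional


def safe_delete_checkpoints(
    delete_fn: Callable[..., None],
    *,
    ids: Optional[list] = None,
    run_ids: Optional[list] = None,
) -> None:
    """Call delete_fn only when at least one non-empty filter is given."""
    # An empty list is treated like a missing filter, so that a bug that
    # produces [] cannot turn into an unfiltered delete.
    if not ids and not run_ids:
        raise ValueError("specify at least one of ids or run_ids")
    delete_fn(ids=ids, run_ids=run_ids)
```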

chariot.training_v2.checkpoint.download_checkpoint(id: str, file_dir: str) → None[source]

Download checkpoint artifacts

Parameters

id: str: The ID of the checkpoint to download
file_dir: str: The directory to write the downloaded checkpoint artifacts to. The directory must already exist.

Returns

Raises

APIException: If API communication fails, or the request is unauthorized or unauthenticated.
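Because file_dir must exist before the download starts, a caller may want to create it first. A minimal sketch, assuming the download function is passed in as download_fn (a stand-in for download_checkpoint so the example runs without the SDK); the helper name download_into is hypothetical.

```python
from pathlib import Path
from typing import Callable


def download_into(
    download_fn: Callable[[str, str], None],
    checkpoint_id: str,
    file_dir: str,
) -> Path:
    """Create file_dir if needed, then download the checkpoint into it."""
    target = Path(file_dir)
    # parents=True creates missing intermediate directories;
    # exist_ok=True makes the call a no-op when the directory is already there.
    target.mkdir(parents=True, exist_ok=True)
    download_fn(checkpoint_id, str(target))
    return target
```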

chariot.training_v2.checkpoint.get_checkpoints(*, ids: list[str] | None = None, run_ids: list[str] | None = None, project_ids: list[str] | None = None, global_steps: list[int] | None = None, statuses: list[Literal['incomplete', 'complete']] | None = None, created_before: datetime | None = None, created_after: datetime | None = None, select: list[Literal['id', 'created_at', 'run_id', 'project_id', 'global_step', 'status', 'status_updated_at']] | None = None, sort: list[Literal['id:asc', 'id:desc', 'created_at:asc', 'created_at:desc']] | None = None, limit: int | None = None, offset: int | None = None) → list[Checkpoint][source]

Get checkpoints matching the provided filters

Parameters

ids: Optional[List[str]]: If specified, filter to checkpoints with any of the given IDs
run_ids: Optional[List[str]]: If specified, filter to checkpoints for any of the given Run IDs
project_ids: Optional[List[str]]: If specified, filter to checkpoints with any of the given Project IDs
global_steps: Optional[List[int]]: If specified, filter to checkpoints from any of the given Global Steps
statuses: Optional[List[Literal["incomplete", "complete"]]]: If specified, filter to checkpoints with a specific status.
created_before: Optional[datetime]: If specified, filter to checkpoints created before the given date and time. This can be used for keyset pagination
created_after: Optional[datetime]: If specified, filter to checkpoints created after the given date and time. This can be used for keyset pagination
select: Optional[List[Literal["id", "created_at", "run_id", "project_id", "global_step", "status", "status_updated_at"]]]: If specified, only the given fields are included in the response.
sort: Optional[List[Literal["id:asc", "id:desc", "created_at:asc", "created_at:desc"]]]: Sort by the given fields in the given directions. Default: "created_at:desc"
limit: Optional[int]: Limit the response to the given number of checkpoints. Default: 10
offset: Optional[int]: Offset based pagination. Default: 0

Returns

checkpoints: List[Checkpoint]: The checkpoints matching the filter criteria

Raises

APIException: If API communication fails, or the request is unauthorized or unauthenticated.
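The created_before filter can drive keyset pagination: fetch a page sorted newest first, then use the oldest created_at on that page as the cursor for the next request. A minimal sketch, assuming the listing function is passed in as fetch (a stand-in for get_checkpoints so the example runs without the SDK) and that created_before excludes the cursor timestamp itself; the helper name iter_checkpoints is hypothetical.

```python
from datetime import datetime
from typing import Callable, Iterator, Optional


def iter_checkpoints(fetch: Callable[..., list], page_size: int = 10) -> Iterator:
    """Yield every item, newest first, requesting one page at a time."""
    cursor: Optional[datetime] = None
    while True:
        page = fetch(
            created_before=cursor,
            sort=["created_at:desc"],
            limit=page_size,
        )
        if not page:
            return  # an empty page means everything has been seen
        yield from page
        # The oldest item on this page becomes the cursor for the next page.
        cursor = page[-1].created_at
```

Under that assumption, items that share a created_at value exactly and straddle a page boundary could be skipped. If that matters, offset pagination with limit and offset is the simpler choice.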

chariot.training_v2.exceptions module

exception chariot.training_v2.exceptions.ApiException(status=None, reason=None, http_resp=None, *, body: str | None = None, data: Any | None = None)[source]

Bases: OpenApiException

classmethod from_response(*, http_resp, body: str | None, data: Any | None) → Self[source]

exception chariot.training_v2.exceptions.BlueprintDoesNotExistError(id: str | None = None, name: str | None = None, version_ilike: str | None = None)[source]

Bases: Exception

exception chariot.training_v2.exceptions.RunDoesNotExistError(run_id: str | None = None)[source]

Bases: Exception

exception chariot.training_v2.exceptions.ValidationError(errors: list[FieldError])[source]

Bases: Exception

chariot.training_v2.run module

Training run management.

class chariot.training_v2.run.Event(*, id: str, sequence: int, run_id: str, created_at: datetime, status: str, details: dict)[source]

Bases: BaseModelWithDatetime

Training run event.

created_at: datetime

details: dict

id: str

model_config: ClassVar[ConfigDict] = {'protected_namespaces': (), 'validate_assignment': True}

Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.

run_id: str

sequence: int

status: str

class chariot.training_v2.run.Gpu(*, count: int, type: str)[source]

Bases: BaseModelWithDatetime

GPU resource metadata.

All available GPU types can be found by calling the function chariot.system_resources.get_available_system_gpus.

count: int

model_config: ClassVar[ConfigDict] = {'protected_namespaces': (), 'validate_assignment': True}

Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.

type: str

class chariot.training_v2.run.Metric(*, id: str, created_at: datetime, run_id: str, global_step: int, tag: str, value: float | int, job_id: str | None = None)[source]

Bases: BaseModelWithDatetime

Training run metric.

created_at: datetime

global_step: int

id: str

job_id: str | None

model_config: ClassVar[ConfigDict] = {'protected_namespaces': (), 'validate_assignment': True}

Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.

run_id: str

tag: str

value: float | int

class chariot.training_v2.run.Progress(*, operation: str, value: float | int, final_value: float | int, units: str)[source]

Bases: BaseModelWithDatetime

Training run progress.

final_value: float | int

model_config: ClassVar[ConfigDict] = {'protected_namespaces': (), 'validate_assignment': True}

Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.

operation: str

units: str

value: float | int

class chariot.training_v2.run.Resources(*, cpu: str, memory: str, ephemeral_storage: str | None = None, gpu: Gpu | None = None)[source]

Bases: BaseModelWithDatetime

Training run scheduling resources.

These values represent Kubernetes resources that will be allocated for a training run.

Example values:

    cpu: "1"
    cpu: "500m"
    memory: "5Gi"  # gibibytes
    memory: "5000000Ki"  # kibibytes
    ephemeral_storage: "20Gi"

Reference: https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#resource-units-in-kubernetes
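To make the unit suffixes concrete, here is a minimal sketch that converts the example strings above into plain numbers. The helper names cpu_to_cores and memory_to_bytes are hypothetical and not part of chariot, and only a subset of the Kubernetes quantity syntax is handled (no P/E suffixes and no exponent notation).

```python
from decimal import Decimal

# Binary (power-of-two) and decimal suffixes used by Kubernetes quantities.
_SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
}


def cpu_to_cores(cpu: str) -> Decimal:
    """Convert "500m" (millicores) or "1" (whole cores) to a number of cores."""
    if cpu.endswith("m"):
        return Decimal(cpu[:-1]) / 1000
    return Decimal(cpu)


def memory_to_bytes(quantity: str) -> int:
    """Convert "5Gi" or "5000000Ki" to bytes; a bare number is already bytes."""
    for suffix, factor in _SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(Decimal(quantity[: -len(suffix)]) * factor)
    return int(Decimal(quantity))
```

With these definitions, "500m" is half a core, and "5Gi" (5 × 2^30 bytes) is slightly larger than "5000000Ki" (5,000,000 × 1024 bytes).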

cpu: str

ephemeral_storage: str | None

gpu: Gpu | None

memory: str

model_config: ClassVar[ConfigDict] = {'protected_namespaces': (), 'validate_assignment': True}

Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.

class chariot.training_v2.run.Run

Bases: BaseModelWithDatetime

Training run.

Use chariot.training_v2.run.Run.from_id() to get a run by id, or chariot.training_v2.run.get_runs() to look up runs by name, version, etc.

Fields marked Optional should be included by default, but might be missing if a select filter is applied to chariot.training_v2.run.get_runs().

blueprint_id: str | None

config: dict | None

created_at: datetime | None

delete()[source]

Delete this run.

Raises

APIException: If API communication fails, or the request is unauthorized or unauthenticated.

classmethod from_id(run_id: str) → Run[source]: Get a training run by id.

get_all_metrics() → list[Metric][source]

Get all metrics for this run.

Sort order is unspecified and may change in the future.

Returns

metrics: list[Metric]

Raises

APIException: If API communication fails, or the request is unauthorized or unauthenticated.
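Since the sort order of the returned metrics is unspecified, callers that plot or compare values will usually want to group by tag and sort by global_step themselves. A minimal sketch: MetricRow is a hypothetical stand-in that exposes only the Metric fields used here (tag, global_step, value), and series_by_tag is a hypothetical helper, not part of chariot.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class MetricRow:
    """Hypothetical stand-in with the Metric fields this sketch reads."""
    tag: str
    global_step: int
    value: float


def series_by_tag(metrics) -> dict:
    """Group metrics by tag into lists of (global_step, value) pairs."""
    series = defaultdict(list)
    for m in metrics:
        series[m.tag].append((m.global_step, m.value))
    # Sort explicitly, because the order returned by the API is unspecified.
    return {tag: sorted(points) for tag, points in series.items()}
```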

get_checkpoints(*, ids: list[str] | None = None, project_ids: list[str] | None = None, global_steps: list[int] | None = None, statuses: list[Literal['incomplete', 'complete']] | None = None, created_before: datetime | None = None, created_after: datetime | None = None, select: list[Literal['id', 'created_at', 'run_id', 'project_id', 'global_step', 'status', 'status_updated_at']] | None = None, sort: list[Literal['id:asc', 'id:desc', 'created_at:asc', 'created_at:desc']] | None = None, limit: int | None = None, offset: int | None = None) → list[Checkpoint][source]

Get checkpoints for this run.

Parameters

ids: list[str] | None: If specified, filter to checkpoints with any of the given IDs
project_ids: list[str] | None: If specified, filter to checkpoints with any of the given Project Ids
global_steps: list[int] | None: If specified, filter to checkpoints from any of the given Global Steps
statuses: list[Literal["incomplete", "complete"]] | None: If specified, filter to checkpoints with a specific status.
created_before: datetime | None: If specified, filter to checkpoints created before the given date and time. This can be used for keyset pagination
created_after: datetime | None: If specified, filter to checkpoints created after the given date and time. This can be used for keyset pagination
select: list[Literal["id", "created_at", "run_id", "project_id", "global_step", "status", "status_updated_at"]] | None: If specified, only the given fields are included in the response.
sort: list[Literal["id:asc", "id:desc", "created_at:asc", "created_at:desc"]] | None: Sort by the given fields in the given directions. Default: "created_at:desc"
limit: int | None: Limit the response to the given number of checkpoints. Default: 10
offset: int | None: Offset based pagination. Default: 0

Returns

checkpoints: list[Checkpoint]: The checkpoints matching the filter criteria

Raises

APIException: If API communication fails, or the request is unauthorized or unauthenticated.

get_events(limit: int | None = None, offset: int | None = None, sort: list[Literal['sequence:desc', 'sequence:asc']] | None = None) → list[Event][source]

Get events for this run.

Parameters

limit: int | None: Limit the response to the given number of run events. Defaults to 100.
offset: int | None: Offset based pagination. Defaults to 0.
sort: list[Literal["sequence:desc", "sequence:asc"]] | None: Sort by the given fields in the given directions. The field and direction are separated by a colon, for example sort=sequence:desc. The only valid field is sequence. Valid directions are ascending (asc) and descending (desc); if the direction is omitted, it defaults to ascending (asc). Defaults to sequence:desc.

Raises

APIException: If API communication fails, or the request is unauthorized or unauthenticated.
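The field:direction format described for sort can be illustrated with a small parser. This is a sketch of the documented rules only (single valid field, asc/desc, direction defaulting to ascending); parse_sort is a hypothetical helper and says nothing about how chariot validates the value internally.

```python
def parse_sort(entry: str) -> tuple:
    """Split "field:direction" and apply the documented default direction."""
    field, _, direction = entry.partition(":")
    direction = direction or "asc"  # direction omitted -> ascending
    if field != "sequence":
        raise ValueError(f"unsupported sort field: {field!r}")
    if direction not in ("asc", "desc"):
        raise ValueError(f"unsupported sort direction: {direction!r}")
    return field, direction
```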

get_global_steps_with_checkpoints() → list[int][source]

Get the global steps for which a checkpoint exists for this run.

Returns

list[int]: global steps

Raises

APIException: If API communication fails, or the request is unauthorized or unauthenticated.

get_metrics(global_steps: list[int] | None = None, tags: list[str] | None = None, limit: int = 1000, created_before: datetime | None = None) → list[Metric][source]

Get metrics for this run.

Parameters

global_steps: list[int] | None: if specified, only return metrics for these global steps
tags: list[str] | None: if specified, only return metrics with the given tags
limit: int (default: 1000): limit the response to the given number of metrics
created_before: datetime | None: if specified, filter to metrics created before the given date and time. This can be used for keyset pagination

Returns

metrics: list[Metric]

Raises

APIException: If API communication fails, or the request is unauthorized or unauthenticated.

id: str | None

model_config: ClassVar[ConfigDict] = {'protected_namespaces': (), 'validate_assignment': True}

Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.

name: str | None

notes: str | None

progress: list[Progress] | None

progress_updated_at: datetime | None

project_id: str | None

reload(fields: list[str] | None = None)[source]

Reload the training run.

Parameters

fields: list[str] | None: List of fields to reload. Options are "status", "status_updated_at", "progress", "progress_updated_at", and "notes". If omitted, all fields will be refreshed.

Raises

RunDoesNotExistError: If the run does not exist or has been deleted, this will be raised.
ValueError: If fields are invalid, or the id is not set on this run.
APIException: If API communication fails, or the request is unauthorized or unauthenticated.
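A common use of reload is polling a run until it finishes. A minimal sketch, assuming any object with a reload(fields=...) method and a status attribute, as the Run class documents. Which statuses count as terminal is an assumption made for this example (the source lists the status values but does not say which are final), and wait_for_run is a hypothetical helper.

```python
import time

# Assumption: these statuses are treated as final for the purposes of this
# sketch; the documentation does not say which statuses are terminal.
TERMINAL_STATUSES = {
    "job_completed",
    "job_failed",
    "job_terminated",
    "job_create_failed",
}


def wait_for_run(run, poll_seconds: float = 30.0, timeout_seconds: float = 3600.0) -> str:
    """Reload only the status field until it is terminal or the timeout passes."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        run.reload(fields=["status"])
        if run.status in TERMINAL_STATUSES:
            return run.status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"run still in status {run.status!r}")
        time.sleep(poll_seconds)
```

Reloading only "status" keeps each poll small; pass a longer poll_seconds for long-running jobs.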

resources: Resources | None

restart(resources: Resources = None) → None[source]

status: str | None

status_updated_at: datetime | None

stop(grace_period: timedelta | None = None) → None[source]

Stop the training run.

Parameters

grace_period: timedelta | None: Time to wait before the run is force stopped. Must be greater than or equal to 1 second. If not provided, defaults to 10 minutes.

Raises

APIException: If API communication fails, or the request is unauthorized or unauthenticated.
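The grace_period rules (minimum of 1 second, default of 10 minutes) can be written out as a small function. This is a sketch of the documented behavior for illustration only; effective_grace_period is a hypothetical helper, and where chariot actually applies the default or the minimum is not stated in the source.

```python
from datetime import timedelta
from typing import Optional

DEFAULT_GRACE_PERIOD = timedelta(minutes=10)
MIN_GRACE_PERIOD = timedelta(seconds=1)


def effective_grace_period(grace_period: Optional[timedelta] = None) -> timedelta:
    """Return the grace period that the documented rules would produce."""
    if grace_period is None:
        return DEFAULT_GRACE_PERIOD
    if grace_period < MIN_GRACE_PERIOD:
        raise ValueError("grace_period must be at least 1 second")
    return grace_period
```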

task_type: str | None

user_id: str | None

version: str | None

chariot.training_v2.run.create_run(*, name: str, version: str, resources: Resources, config: dict, task_type: str, blueprint_id: str, project_id: str, notes: str | None = None) → str[source]

Create a training run.

Parameters

name: str: name of the run
version: str: version of the run
resources: Resources: resources to allocate for scheduling the run
config: dict: the run config
task_type: str: task type of the run
project_id: str: the id of the project to create the run in. To look up a project id by name, use chariot.projects.get_project_id.
blueprint_id: str: the id of the blueprint to use. To look up a blueprint id by name, use chariot.training_v2.lookup_blueprint_id.
notes: str, optional: notes associated with the training run

Returns

run_id: str: the created run’s id

Raises

ValidationError: if the provided run config is invalid according to the blueprint, or any required parameters are ill-formed.
APIException: if API communication fails, or the request is unauthorized or unauthenticated.

chariot.training_v2.run.get_runs(*, blueprint_ids: list[str] | None = None, created_after: datetime | None = None, created_before: datetime | None = None, ids: list[str] | None = None, id_after: str | None = None, limit: int | None = None, offset: int | None = None, name_ilikes: list[str] | None = None, project_ids: list[str] | None = None, select: list[Literal['*', 'id', 'project_id', 'user_id', 'created_at', 'name', 'version', 'blueprint_id', 'task_type', 'config', 'resources', 'progress', 'progress_updated_at', 'status', 'status_updated_at']] | None = None, sort: list[Literal['id:asc', 'id:desc', 'created_at:asc', 'created_at:desc']] | None = None, statuses: list[Literal['run_created', 'run_stop_requested', 'run_restart_requested', 'job_create_failed', 'job_created', 'job_submitted', 'job_pending', 'job_running', 'job_terminate_requested', 'job_terminating', 'job_terminated', 'job_failed', 'job_completed', 'job_unknown']] | None = None, task_types: list[str] | None = None, versions: list[str] | None = None, user_ids: list[str] | None = None) → list[Run][source]

Get runs matching the provided criteria

Parameters

blueprint_ids: list[str] | None: If specified, filter to runs with any of the given Blueprint IDs
created_after: datetime | None: If specified, filter to runs created after the given date and time. This can be used for keyset pagination
created_before: datetime | None: If specified, filter to runs created before the given date and time. This can be used for keyset pagination
ids: list[str] | None: If specified, filter to runs with any of the given IDs.
id_after: str | None: If specified, filter to runs with an ID after the given ID. This can be used for keyset pagination.
limit: int | None: Limit the response to the given number of runs. Default: 10
offset: int | None: Offset based pagination. Default: 0
name_ilikes: list[str] | None: If specified, filter to runs with a name that matches any of the given SQL ILIKE patterns. Matching is case insensitive. Options for pattern matching are:
    % matches any sequence of zero or more characters.
    _ matches any single character.
    To match the literal characters % or _, escape the character with a \, e.g. \%testrun
    To use equality matching, simply provide a plain string with no special characters.
  For example:
    The pattern %test-run% matches test-run, FOOtest-runBAR, and test-runBAR.
    The pattern \%test_run matches %test9run and %test_run, but not FOOtest_run, %test__run, or %test_runBAR.
    The pattern test\_run matches test_run and nothing else.
project_ids: list[str] | None: If specified, filter to runs with any of the given Project IDs. To look up a project id by name, use chariot.projects.get_project_id
select: list[Literal["*", "id", "project_id", "user_id", "created_at", "name", "version", "blueprint_id", "task_type", "config", "resources", "progress", "progress_updated_at", "status", "status_updated_at"]] | None: If specified, only the selected fields are included in the response. If all fields are desired, use "*". Excluded attributes will be None in the chariot.training_v2.Run responses.
sort: list[Literal["id:asc", "id:desc", "created_at:asc", "created_at:desc"]] | None: Sort by the given fields in the given directions. The field and direction should be separated by a colon. Default: "created_at:desc"
statuses: list[Literal["run_created", "run_stop_requested", "run_restart_requested", "job_create_failed", "job_created", "job_submitted", "job_pending", "job_running", "job_terminate_requested", "job_terminating", "job_terminated", "job_failed", "job_completed", "job_unknown"]] | None: If specified, filter to runs with any of the given statuses.
task_types: list[str] | None: If specified, filter to runs with any of the given Task Types. Examples: "Object Detection", "Image Segmentation"
versions: list[str] | None: If specified, filter to runs with any of the given Versions.
user_ids: list[str] | None: If specified, filter to runs with any of the given User IDs.

Returns

list[Run]: Runs matching the filter criteria

Raises

APIException: If API communication fails, or the request is unauthorized or unauthenticated.
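The ILIKE rules described for name_ilikes can be tried out locally by translating a pattern into a regular expression. This is a minimal sketch for experimenting with patterns before sending them; ilike_to_regex and ilike are hypothetical helpers, and the matching itself is done by the service, not by this code.

```python
import re


def ilike_to_regex(pattern: str) -> "re.Pattern":
    """Translate a SQL ILIKE pattern into a case-insensitive regex."""
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            # A backslash makes the next character a literal.
            parts.append(re.escape(pattern[i + 1]))
            i += 2
            continue
        if ch == "%":
            parts.append(".*")  # any sequence of zero or more characters
        elif ch == "_":
            parts.append(".")   # exactly one character
        else:
            parts.append(re.escape(ch))
        i += 1
    return re.compile("".join(parts), re.IGNORECASE | re.DOTALL)


def ilike(value: str, pattern: str) -> bool:
    """Return True if the whole value matches the ILIKE pattern."""
    return ilike_to_regex(pattern).fullmatch(value) is not None
```

For example, ilike("FOOtest-runBAR", "%test-run%") is True, while ilike("FOOtest_run", r"\%test_run") is False because the escaped % must match a literal percent sign.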

chariot.training_v2.run.validate_run_config(*, blueprint_id: str, config: dict)[source]

Validate a training run configuration against the provided blueprint id.

Parameters

blueprint_id: str: The blueprint to validate against
config: dict: The run configuration to validate

Raises

ValidationError: if the provided run config is invalid
APIException: if API communication fails, or the request is unauthorized or unauthenticated.
