chariot.training_v2 package

Submodules

chariot.training_v2.blueprint module

chariot.training_v2.blueprint.lookup_blueprint_id(name: str, version_ilike: str | None = None) str[source]

Returns the id of the blueprint specified by the arguments

Parameters

name: str

Name of the blueprint. If version_ilike is not provided, the latest version of the blueprint with this name is used.

version_ilike: str | None

A SQL ILIKE pattern for matching a blueprint version. If multiple blueprints match the pattern, the most recent one is used.

Returns

str : id of the blueprint

Raises

BlueprintDoesNotExistError

If the blueprint does not exist or has been deleted, this will be raised.

ValueError

If name is not provided.

APIException

If api communication fails, request is unauthorized or is unauthenticated.

chariot.training_v2.checkpoint module

class chariot.training_v2.checkpoint.Checkpoint(*, id: str | None = None, run_id: str | None = None, global_step: int | None = None, project_id: str | None = None, created_at: datetime | None = None, status: str | None = None, status_updated_at: datetime | None = None, bucket_name: str | None = None, key_prefix: str | None = None)[source]

Bases: BaseModelWithDatetime

bucket_name: str | None
create_model(*, name: str, version: str, summary: str, project_id: str | None = None) str[source]

Create a model from this checkpoint

Parameters

name: str

The name to give the model

version: str

The version to give the model. Must be in SemVer format

summary: str

A short summary of the model

project_id: Optional[str]

The ID of the project to create the model in. If omitted, the project ID of the associated run will be used.

Returns

model_id: str

The ID of the created model

Raises

APIException

If api communication fails, request is unauthorized or is unauthenticated.

created_at: datetime | None
global_step: int | None
id: str | None
key_prefix: str | None
model_config: ClassVar[ConfigDict] = {'protected_namespaces': (), 'validate_assignment': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

project_id: str | None
run_id: str | None
status: str | None
status_updated_at: datetime | None
chariot.training_v2.checkpoint.create_model_from_checkpoint(*, checkpoint_id: str, name: str, version: str, summary: str, project_id: str | None = None) str[source]

Create a model from a checkpoint

Parameters

checkpoint_id: str

The ID of the checkpoint to create the model from

name: str

The name to give the model

version: str

The version to give the model. Must be in SemVer format

summary: str

A short summary of the model

project_id: Optional[str]

The ID of the project to create the model in. If omitted, the project ID of the associated run will be used.

Returns

model_id: str

The ID of the created model

Raises

APIException

If api communication fails, request is unauthorized or is unauthenticated.
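The version argument must be in SemVer format, so a client-side check before calling create_model_from_checkpoint can catch typos early. The sketch below is only an illustration: the helper name is_semver is hypothetical, the pattern is adapted from the regular expression suggested on semver.org, and the rules the service itself applies are not stated here and may differ.

```python
import re

# Pattern adapted from the regular expression suggested on semver.org.
# Assumption: the service may apply stricter or looser rules than this.
_SEMVER = re.compile(
    r"(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)"
    r"(?:-(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*)"
    r"(?:\.(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*))*)?"
    r"(?:\+[0-9a-zA-Z-]+(?:\.[0-9a-zA-Z-]+)*)?"
)


def is_semver(version: str) -> bool:
    """Return True if the whole string is MAJOR.MINOR.PATCH with optional suffixes."""
    return _SEMVER.fullmatch(version) is not None
```

A caller could check is_semver(version) and raise a ValueError before making the request.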

chariot.training_v2.checkpoint.delete_checkpoints(*, ids: list[str] | None = None, run_ids: list[str] | None = None) None[source]

Delete checkpoints matching the provided filters

Parameters

ids: Optional[List[str]]

If specified, filter to checkpoints with any of the given Checkpoint IDs. Note: either ids or run_ids must be specified, in order to prevent accidental deletion of all checkpoints.

run_ids: Optional[List[str]]

If specified, filter to checkpoints with any of the given Run IDs. Note: either ids or run_ids must be specified, in order to prevent accidental deletion of all checkpoints.

Raises

APIException

If api communication fails, request is unauthorized or is unauthenticated.
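Because at least one filter is required, a thin wrapper can make that rule explicit in calling code. This is a minimal sketch, not part of the SDK: delete_checkpoints_guarded is a hypothetical helper, and delete_fn stands for chariot.training_v2.checkpoint.delete_checkpoints, passed in so the snippet runs without the SDK installed. What the SDK itself does when both filters are omitted is not stated here.

```python
from typing import Callable, Optional, Sequence


def delete_checkpoints_guarded(
    delete_fn: Callable[..., None],
    *,
    ids: Optional[Sequence[str]] = None,
    run_ids: Optional[Sequence[str]] = None,
) -> None:
    """Call delete_fn only when at least one non-empty filter is given."""
    if not ids and not run_ids:
        # Refuse empty filters so a bug upstream cannot request a blanket delete.
        raise ValueError("Provide at least one checkpoint ID or run ID.")
    delete_fn(
        ids=list(ids) if ids else None,
        run_ids=list(run_ids) if run_ids else None,
    )
```

Usage would look like delete_checkpoints_guarded(delete_checkpoints, run_ids=[run_id]).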

chariot.training_v2.checkpoint.download_checkpoint(id: str, file_dir: str) None[source]

Download checkpoint artifacts

Parameters

id: str

The ID of the checkpoint to download

file_dir: str

The directory to download the checkpoint artifacts into. The directory must already exist.

Returns

None

Raises

APIException

If api communication fails, request is unauthorized or is unauthenticated.
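Since the target directory must exist before download_checkpoint is called, it can be created up front. The helper below is a hypothetical convenience that uses only the standard library; it is a sketch and not part of the SDK.

```python
from pathlib import Path


def ensure_download_dir(path: str) -> str:
    """Create the directory (and any missing parents) and return it as a string."""
    target = Path(path).expanduser()
    # exist_ok=True makes the call safe to repeat for an existing directory.
    target.mkdir(parents=True, exist_ok=True)
    return str(target)
```

The returned string can then be passed as file_dir, for example download_checkpoint(id=checkpoint_id, file_dir=ensure_download_dir("artifacts/ckpt")), where the path is a placeholder.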

chariot.training_v2.checkpoint.get_checkpoints(*, ids: list[str] | None = None, run_ids: list[str] | None = None, project_ids: list[str] | None = None, global_steps: list[int] | None = None, statuses: list[Literal['incomplete', 'complete']] | None = None, created_before: datetime | None = None, created_after: datetime | None = None, select: list[Literal['id', 'created_at', 'run_id', 'project_id', 'global_step', 'status', 'status_updated_at']] | None = None, sort: list[Literal['id:asc', 'id:desc', 'created_at:asc', 'created_at:desc']] | None = None, limit: int | None = None, offset: int | None = None) list[Checkpoint][source]

Get checkpoints matching the provided filters

Parameters

ids: Optional[List[str]]

If specified, filter to checkpoints with any of the given IDs

run_ids: Optional[List[str]]

If specified, filter to checkpoints for any of the given Run IDs

project_ids: Optional[List[str]]

If specified, filter to checkpoints with any of the given Project IDs

global_steps: Optional[List[int]]

If specified, filter to checkpoints from any of the given Global Steps

statuses: Optional[List[Literal["incomplete", "complete"]]]

If specified, filter to checkpoints with a specific status.

created_before: Optional[datetime]

If specified, filter to checkpoints created before the given date and time. This can be used for keyset pagination

created_after: Optional[datetime]

If specified, filter to checkpoints created after the given date and time. This can be used for keyset pagination

select: Optional[List[Literal["id", "created_at", "run_id", "project_id", "global_step", "status", "status_updated_at"]]]

If specified, only the given fields are included in the response.

sort: Optional[List[Literal["id:asc", "id:desc", "created_at:asc", "created_at:desc"]]]

Sort by the given fields in the given directions. Default: "created_at:desc"

limit: Optional[int]

Limit the response to the given number of checkpoints. Default: 10

offset: Optional[int]

Offset based pagination. Default: 0

Returns

checkpoints: List[Checkpoint]

The checkpoints matching the filter criteria

Raises

APIException

If api communication fails, request is unauthorized or is unauthenticated.
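With a default limit of 10, collecting every matching checkpoint means paging through the results. The generator below sketches offset-based pagination under stated assumptions: iter_pages is a hypothetical helper, and fetch stands for any function that accepts limit and offset keyword arguments the way get_checkpoints does. It is passed in as an argument so the snippet runs without the SDK.

```python
from typing import Any, Callable, Iterator, List


def iter_pages(
    fetch: Callable[..., List[Any]], *, page_size: int = 10, **filters: Any
) -> Iterator[Any]:
    """Yield every item by requesting successive pages of page_size items."""
    offset = 0
    while True:
        page = fetch(limit=page_size, offset=offset, **filters)
        if not page:
            return
        yield from page
        if len(page) < page_size:
            # A short page means there is nothing left to request.
            return
        offset += len(page)
```

A call such as list(iter_pages(get_checkpoints, run_ids=[run_id])) would gather all pages. Offset paging can skip or repeat items if checkpoints are created while iterating, which is one reason the created_before and created_after filters are offered for keyset pagination.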

chariot.training_v2.exceptions module

exception chariot.training_v2.exceptions.ApiException(status=None, reason=None, http_resp=None, *, body: str | None = None, data: Any | None = None)[source]

Bases: OpenApiException

classmethod from_response(*, http_resp, body: str | None, data: Any | None) Self[source]
exception chariot.training_v2.exceptions.BlueprintDoesNotExistError(id: str | None = None, name: str | None = None, version_ilike: str | None = None)[source]

Bases: Exception

exception chariot.training_v2.exceptions.RunDoesNotExistError(run_id: str | None = None)[source]

Bases: Exception

exception chariot.training_v2.exceptions.ValidationError(errors: list[FieldError])[source]

Bases: Exception

chariot.training_v2.run module

Training run management.

class chariot.training_v2.run.Event(*, id: str, sequence: int, run_id: str, created_at: datetime, status: str, details: dict)[source]

Bases: BaseModelWithDatetime

Training run event.

created_at: datetime
details: dict
id: str
model_config: ClassVar[ConfigDict] = {'protected_namespaces': (), 'validate_assignment': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

run_id: str
sequence: int
status: str
class chariot.training_v2.run.Gpu(*, count: int, type: str)[source]

Bases: BaseModelWithDatetime

GPU resource metadata.

All available GPU types can be found by calling the function chariot.system_resources.get_available_system_gpus.

count: int
model_config: ClassVar[ConfigDict] = {'protected_namespaces': (), 'validate_assignment': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

type: str
class chariot.training_v2.run.Metric(*, id: str, created_at: datetime, run_id: str, global_step: int, tag: str, value: float | int, job_id: str | None = None)[source]

Bases: BaseModelWithDatetime

Training run metric.

created_at: datetime
global_step: int
id: str
job_id: str | None
model_config: ClassVar[ConfigDict] = {'protected_namespaces': (), 'validate_assignment': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

run_id: str
tag: str
value: float | int
class chariot.training_v2.run.Progress(*, operation: str, value: float | int, final_value: float | int, units: str)[source]

Bases: BaseModelWithDatetime

Training run progress.

final_value: float | int
model_config: ClassVar[ConfigDict] = {'protected_namespaces': (), 'validate_assignment': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

operation: str
units: str
value: float | int
class chariot.training_v2.run.Resources(*, cpu: str, memory: str, ephemeral_storage: str | None = None, gpu: Gpu | None = None)[source]

Bases: BaseModelWithDatetime

Training run scheduling resources.

These values represent kubernetes resources that will be allocated for a training run.

Example values:

cpu: "1"
cpu: "500m"
memory: "5Gi" # gibibytes
memory: "5000000Ki" # kibibytes
ephemeral_storage: "20Gi"

Reference: https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#resource-units-in-kubernetes
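To make the units concrete, the sketch below converts a few quantity strings into plain numbers. It is an illustration only: parse_cpu and parse_memory are hypothetical helpers, not SDK functions, and they cover just a subset of the Kubernetes notation (the m suffix for CPU; Ki, Mi, Gi, Ti and k, M, G, T for memory).

```python
# Binary (power-of-two) and decimal suffixes from the Kubernetes quantity notation.
_BINARY = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4}
_DECIMAL = {"k": 1000, "M": 1000**2, "G": 1000**3, "T": 1000**4}


def parse_cpu(quantity: str) -> float:
    """Convert a CPU quantity such as "500m" or "1" into a number of cores."""
    if quantity.endswith("m"):
        # "m" means millicores: 1000m equals one core.
        return int(quantity[:-1]) / 1000
    return float(quantity)


def parse_memory(quantity: str) -> int:
    """Convert a memory quantity such as "5Gi" into a number of bytes."""
    # Binary suffixes are checked first; plain numbers are already bytes.
    for suffix, factor in {**_BINARY, **_DECIMAL}.items():
        if quantity.endswith(suffix):
            return int(float(quantity[: -len(suffix)]) * factor)
    return int(quantity)
```

Under these rules "500m" is half a core and "5Gi" is 5 × 1024³ bytes, slightly more than "5G".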

cpu: str
ephemeral_storage: str | None
gpu: Gpu | None
memory: str
model_config: ClassVar[ConfigDict] = {'protected_namespaces': (), 'validate_assignment': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class chariot.training_v2.run.Run(*, id: str | None = None, name: str | None = None, version: str | None = None, created_at: datetime | None = None, blueprint_id: str | None = None, project_id: str | None = None, user_id: str | None = None, progress: list[Progress] | None = None, progress_updated_at: datetime | None = None, status: str | None = None, status_updated_at: datetime | None = None, task_type: str | None = None, resources: Resources | None = None, config: dict | None = None, notes: str | None = None)[source]

Bases: BaseModelWithDatetime

Training run.

Please use chariot.training_v2.run.Run.from_id() to get a run by id, or chariot.training_v2.run.get_runs() to look up runs by name, version, etc.

Fields marked Optional should be included by default, but might be missing if a select filter is applied to chariot.training_v2.run.get_runs().

blueprint_id: str | None
config: dict | None
created_at: datetime | None
delete()[source]

Delete this run.

Raises

APIException

If api communication fails, request is unauthorized or is unauthenticated.

classmethod from_id(run_id: str) Run[source]

Get a training run by id.

get_all_metrics() list[Metric][source]

Get all metrics for this run.

Sort order is unspecified and may change in the future.

Returns

metrics: list[Metric]

Raises

APIException

If api communication fails, request is unauthorized or is unauthenticated.

get_checkpoints(*, ids: list[str] | None = None, project_ids: list[str] | None = None, global_steps: list[int] | None = None, statuses: list[Literal['incomplete', 'complete']] | None = None, created_before: datetime | None = None, created_after: datetime | None = None, select: list[Literal['id', 'created_at', 'run_id', 'project_id', 'global_step', 'status', 'status_updated_at']] | None = None, sort: list[Literal['id:asc', 'id:desc', 'created_at:asc', 'created_at:desc']] | None = None, limit: int | None = None, offset: int | None = None) list[Checkpoint][source]

Get checkpoints for this run.

Parameters

ids: list[str] | None

If specified, filter to checkpoints with any of the given IDs

project_ids: list[str] | None

If specified, filter to checkpoints with any of the given Project Ids

global_steps: list[int] | None

If specified, filter to checkpoints from any of the given Global Steps

statuses: list[Literal["incomplete", "complete"]] | None

If specified, filter to checkpoints with a specific status.

created_before: datetime | None

If specified, filter to checkpoints created before the given date and time. This can be used for keyset pagination

created_after: datetime | None

If specified, filter to checkpoints created after the given date and time. This can be used for keyset pagination

select: list[Literal["id", "created_at", "run_id", "project_id", "global_step", "status", "status_updated_at"]] | None

If specified, only the given fields are included in the response.

sort: list[Literal["id:asc", "id:desc", "created_at:asc", "created_at:desc"]] | None

Sort by the given fields in the given directions. Default: "created_at:desc"

limit: int | None

Limit the response to the given number of checkpoints. Default: 10

offset: int | None

Offset based pagination. Default: 0

Returns

checkpoints: list[Checkpoint]

The checkpoints matching the filter criteria

Raises

APIException

If api communication fails, request is unauthorized or is unauthenticated.
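A common follow-up to listing a run's checkpoints is choosing the most advanced one that finished writing. The sketch below assumes only the documented status and global_step fields; CheckpointRow is a hypothetical stand-in for Checkpoint so the snippet runs without the SDK, and latest_complete is likewise a hypothetical helper.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class CheckpointRow:
    """Hypothetical stand-in exposing the Checkpoint fields used below."""

    id: str
    global_step: Optional[int]
    status: Optional[str]


def latest_complete(checkpoints: Iterable[CheckpointRow]) -> Optional[CheckpointRow]:
    """Return the complete checkpoint with the highest global step, if any."""
    complete = [
        c for c in checkpoints
        if c.status == "complete" and c.global_step is not None
    ]
    return max(complete, key=lambda c: c.global_step, default=None)
```

Passing statuses=["complete"] to get_checkpoints would move the status filter to the server; the local filter is kept here so the helper also works on unfiltered results.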

get_events(limit: int | None = None, offset: int | None = None, sort: list[Literal['sequence:desc', 'sequence:asc']] | None = None) list[Event][source]

Get events for this run.

Parameters

limit: int | None

Limit the response to the given number of run events. Defaults to 100.

offset: int | None

Offset based pagination. Defaults to 0.

sort: list[Literal["sequence:desc", "sequence:asc"]] | None

Sort by the given fields in the given directions. The field and direction should be separated by a colon, for example sort=sequence:desc. Defaults to sequence:desc. The only valid field is sequence. Valid directions are ascending (asc) or descending (desc). If the direction is not specified, it defaults to ascending (asc).

Raises

APIException

If api communication fails, request is unauthorized or is unauthenticated.
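The sort rules described for get_events (a field and direction separated by a colon, ascending when no direction is given) can be expressed as a small parser. This is an illustrative sketch of those rules only; parse_sort is a hypothetical helper and says nothing about how the service parses the value internally.

```python
from typing import Sequence, Tuple


def parse_sort(spec: str, valid_fields: Sequence[str] = ("sequence",)) -> Tuple[str, str]:
    """Split "field:direction" into its parts, defaulting the direction to "asc"."""
    field, separator, direction = spec.partition(":")
    if not separator:
        # No colon at all: the direction falls back to ascending.
        direction = "asc"
    if field not in valid_fields:
        raise ValueError(f"unsupported sort field: {field!r}")
    if direction not in ("asc", "desc"):
        raise ValueError(f"unsupported sort direction: {direction!r}")
    return field, direction
```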

get_global_steps_with_checkpoints() list[int][source]

Get the global steps for which a checkpoint exists for this run.

Returns

list[int]: global steps

Raises

APIException

If api communication fails, request is unauthorized or is unauthenticated.

get_metrics(global_steps: list[int] | None = None, tags: list[str] | None = None, limit: int = 1000, created_before: datetime | None = None) list[Metric][source]

Get metrics for this run.

Parameters

global_steps: list[int] | None

If specified, only return metrics for these global steps

tags: list[str] | None

If specified, only return metrics with the given tags

limit: int (default: 1000)

Limit the response to the given number of metrics

created_before: datetime | None

If specified, filter to metrics created before the given date and time. This can be used for keyset pagination

Returns

metrics: list[Metric]

Raises

APIException

If api communication fails, request is unauthorized or is unauthenticated.
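Metrics come back as a flat list, so plotting or comparing them usually starts by grouping values per tag and ordering them by global step. The sketch below relies only on the documented tag, global_step, and value fields; MetricRow is a hypothetical stand-in for Metric so the snippet runs without the SDK, and series_by_tag is a hypothetical helper.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple, Union

Number = Union[int, float]


@dataclass
class MetricRow:
    """Hypothetical stand-in exposing the Metric fields used below."""

    tag: str
    global_step: int
    value: Number


def series_by_tag(metrics: Iterable[MetricRow]) -> Dict[str, List[Tuple[int, Number]]]:
    """Group metrics by tag as (global_step, value) pairs sorted by step."""
    series: Dict[str, List[Tuple[int, Number]]] = defaultdict(list)
    for metric in metrics:
        series[metric.tag].append((metric.global_step, metric.value))
    # Sorting locally avoids depending on the order the API returns.
    return {tag: sorted(points) for tag, points in series.items()}
```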

id: str | None
model_config: ClassVar[ConfigDict] = {'protected_namespaces': (), 'validate_assignment': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

name: str | None
notes: str | None
progress: list[Progress] | None
progress_updated_at: datetime | None
project_id: str | None
reload(fields: list[str] | None = None)[source]

Reload the training run.

Parameters

fields: list[str] | None

List of fields to reload. Options are "status", "status_updated_at", "progress", "progress_updated_at", and "notes". If omitted, all fields will be refreshed.

Raises

RunDoesNotExistError

If the run does not exist or has been deleted, this will be raised.

ValueError

If fields are invalid, or the id is not set on this run.

APIException

If api communication fails, request is unauthorized or is unauthenticated.
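reload lends itself to polling a run until it finishes. The loop below is a sketch under stated assumptions: wait_until_terminal is a hypothetical helper, and the set of terminal statuses is a guess based on the status names accepted by get_runs, since this reference does not say which statuses are final. The run argument only needs a reload(fields=...) method and a status attribute.

```python
import time
from typing import Any, Callable, FrozenSet

# Assumption: these statuses are final. The reference does not confirm this.
ASSUMED_TERMINAL: FrozenSet[str] = frozenset(
    {"job_completed", "job_failed", "job_terminated", "job_create_failed"}
)


def wait_until_terminal(
    run: Any,
    *,
    terminal: FrozenSet[str] = ASSUMED_TERMINAL,
    poll_seconds: float = 30.0,
    timeout_seconds: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Reload only the status field until it is terminal or the timeout passes."""
    deadline = clock() + timeout_seconds
    while True:
        run.reload(fields=["status"])
        if run.status in terminal:
            return run.status
        if clock() >= deadline:
            raise TimeoutError(f"run still in status {run.status!r}")
        sleep(poll_seconds)
```

Reloading only "status" keeps each poll small; sleep and clock are injectable so the loop can be exercised without waiting.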

resources: Resources | None
restart(resources: Resources = None) None[source]
status: str | None
status_updated_at: datetime | None
stop(grace_period: timedelta | None = None) None[source]

Stop the training run.

Parameters

grace_period: timedelta | None

Time to wait before the run is force stopped. Must be greater than or equal to 1 second. Defaults to 10 minutes if not provided.

Raises

APIException

If api communication fails, request is unauthorized or is unauthenticated.

task_type: str | None
user_id: str | None
version: str | None
chariot.training_v2.run.create_run(*, name: str, version: str, resources: Resources, config: dict, task_type: str, blueprint_id: str, project_id: str, notes: str | None = None) str[source]

Create a training run.

Parameters

name: str

name of the run

version: str

version of the run

resources: Resources

resources to allocate for scheduling the run

config: dict

the run config

task_type: str

task type of the run

project_id: str

the id of the project to create the run in. To look up a project id by name, use chariot.projects.get_project_id.

blueprint_id: str

the id of the blueprint to use. To look up a blueprint id by name, use chariot.training_v2.lookup_blueprint_id.

notes: str, optional

notes associated with the training run

Returns

run_id: str

the created run’s id

Raises

ValidationError

if the provided run config is invalid according to the blueprint, or any required parameters are ill-formed.

APIException

if api communication fails, request is unauthorized or is unauthenticated.
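The pieces documented here suggest a natural sequence: resolve the blueprint id by name, validate the config against it, then create the run. The sketch below wires those three calls together. The launch_run helper and the sdk object are hypothetical: sdk is any namespace exposing lookup_blueprint_id, validate_run_config, and create_run with the signatures documented in this reference, which keeps the snippet runnable without the SDK installed.

```python
from typing import Any, Optional


def launch_run(
    sdk: Any,
    *,
    blueprint_name: str,
    name: str,
    version: str,
    resources: Any,
    config: dict,
    task_type: str,
    project_id: str,
    notes: Optional[str] = None,
) -> str:
    """Look up the blueprint, validate the config, then create the run."""
    blueprint_id = sdk.lookup_blueprint_id(name=blueprint_name)
    # Validating first surfaces config errors before a run is created.
    sdk.validate_run_config(blueprint_id=blueprint_id, config=config)
    return sdk.create_run(
        name=name,
        version=version,
        resources=resources,
        config=config,
        task_type=task_type,
        blueprint_id=blueprint_id,
        project_id=project_id,
        notes=notes,
    )
```

create_run is documented to raise ValidationError for an invalid config as well, so the explicit validation step is optional; it is shown here as a way to check a config without creating anything.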

chariot.training_v2.run.get_runs(*, blueprint_ids: list[str] | None = None, created_after: datetime | None = None, created_before: datetime | None = None, ids: list[str] | None = None, id_after: str | None = None, limit: int | None = None, offset: int | None = None, name_ilikes: list[str] | None = None, project_ids: list[str] | None = None, select: list[Literal['*', 'id', 'project_id', 'user_id', 'created_at', 'name', 'version', 'blueprint_id', 'task_type', 'config', 'resources', 'progress', 'progress_updated_at', 'status', 'status_updated_at']] | None = None, sort: list[Literal['id:asc', 'id:desc', 'created_at:asc', 'created_at:desc']] | None = None, statuses: list[Literal['run_created', 'run_stop_requested', 'run_restart_requested', 'job_create_failed', 'job_created', 'job_submitted', 'job_pending', 'job_running', 'job_terminate_requested', 'job_terminating', 'job_terminated', 'job_failed', 'job_completed', 'job_unknown']] | None = None, task_types: list[str] | None = None, versions: list[str] | None = None, user_ids: list[str] | None = None) list[Run][source]

Get runs matching the provided criteria

Parameters

blueprint_ids: list[str] | None

If specified, filter to runs with any of the given Blueprint IDs

created_after: datetime | None

If specified, filter to runs created after the given date and time. This can be used for keyset pagination

created_before: datetime | None

If specified, filter to runs created before the given date and time. This can be used for keyset pagination

ids: list[str] | None

If specified, filter to runs with any of the given IDs.

id_after: str | None

If specified, filter to runs with an ID after the given ID. This can be used for keyset pagination.

limit: int | None

Limit the response to the given number of runs. Default: 10

offset: int | None

Offset based pagination. Default: 0

name_ilikes: list[str] | None

If specified, filter to runs with a name that matches any of the given SQL ILIKE patterns. Matching is case insensitive. Options for pattern matching are:

% matches any sequence of zero or more characters.
_ matches any single character.
To match the literal characters % or _, escape the character with a \, e.g. \%testrun.
To use equality matching, provide a plain string with no special characters.

For example:

The pattern %test-run% matches test-run, FOOtest-runBAR, and test-runBAR.
The pattern \%test_run matches %test9run and %test_run, but not FOOtest_run, %test__run, or %test_runBAR.
The pattern test\_run matches test_run (in any letter case) and nothing else.

project_ids: list[str] | None

If specified, filter to runs with any of the given Project IDs. To look up a project id by name, use chariot.projects.get_project_id.

select: list[Literal["*", "id", "project_id", "user_id", "created_at", "name", "version", "blueprint_id", "task_type", "config", "resources", "progress", "progress_updated_at", "status", "status_updated_at"]] | None

If specified, only the selected fields are included in the response. If all fields are desired, use "*". Excluded attributes will be None in the chariot.training_v2.Run responses.

sort: list[Literal["id:asc", "id:desc", "created_at:asc", "created_at:desc"]] | None

Sort by the given fields in the given directions. The field and direction should be separated by a colon. Default: "created_at:desc"

statuses: list[Literal["run_created", "run_stop_requested", "run_restart_requested", "job_create_failed", "job_created", "job_submitted", "job_pending", "job_running", "job_terminate_requested", "job_terminating", "job_terminated", "job_failed", "job_completed", "job_unknown"]] | None

If specified, filter to runs with any of the given statuses.

task_types: list[str] | None

If specified, filter to runs with any of the given Task Types. Examples: "Object Detection", "Image Segmentation"

versions: list[str] | None

If specified, filter to runs with any of the given Versions.

user_ids: list[str] | None

If specified, filter to runs with any of the given User IDs.

Returns

list[Run]

Runs matching the filter criteria

Raises

APIException

If api communication fails, request is unauthorized or is unauthenticated.
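The ILIKE rules described for name_ilikes can be tried out locally by translating a pattern into a regular expression. The sketch below is only an approximation for experimenting with patterns: ilike_to_regex and ilike_match are hypothetical helpers, and the actual matching happens in the service's database, which may differ in edge cases.

```python
import re


def ilike_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a SQL ILIKE pattern into a case-insensitive regular expression."""
    parts = []
    chars = iter(pattern)
    for ch in chars:
        if ch == "\\":
            # A backslash makes the next character literal; a trailing
            # backslash is treated as a literal backslash.
            parts.append(re.escape(next(chars, "\\")))
        elif ch == "%":
            parts.append(".*")  # any sequence of zero or more characters
        elif ch == "_":
            parts.append(".")  # exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.IGNORECASE | re.DOTALL)


def ilike_match(pattern: str, text: str) -> bool:
    """Return True if the whole text matches the ILIKE pattern."""
    return ilike_to_regex(pattern).fullmatch(text) is not None
```

Raw strings keep the escapes readable in Python, for example ilike_match(r"test\_run", "test_run").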

chariot.training_v2.run.validate_run_config(*, blueprint_id: str, config: dict)[source]

Validate a training run configuration against the provided blueprint id.

Parameters

blueprint_id: str

The blueprint to validate against

config: dict

The run configuration to validate

Raises

ValidationError

if the provided run config is invalid

APIException

if api communication fails, request is unauthorized or is unauthenticated.

Module contents