chariot.datasets package

Submodules

chariot.datasets.annotations module

chariot.datasets.annotations.archive_annotation(id: str, *, task_id: str | None = None) Annotation[source]

Archive (soft-delete) an annotation by id

Parameters:
  • id (str) – Id of annotation to archive

  • task_id (Optional[str]) – Id of the task to which the datum is locked

Returns:

Annotation details

Return type:

models.Annotation

chariot.datasets.annotations.create_annotation(datum_id: str, *, class_label: str | None = None, contour: list[list[Point]] | None = None, bbox: BoundingBox | None = None, oriented_bbox: OrientedBoundingBox | None = None, text_classification: TextClassification | None = None, text_generation: TextGeneration | None = None, token_classification: TokenClassification | None = None, metadata: dict[str, Any] | None = None, approval_status: ApprovalStatus | None = None, task_id: str | None = None) Annotation[source]

Create a new annotation

Parameters:
  • datum_id (str) – Id of datum to add annotation to

  • class_label (Optional[str]) – Class label of the annotation

  • contour (Optional[List[List[models.Point]]]) – Contour for an Image Segmentation annotation

  • bbox (Optional[models.BoundingBox]) – Bounding box for an Object Detection annotation

  • oriented_bbox (Optional[models.OrientedBoundingBox]) – Oriented bounding box for an Oriented Object Detection annotation

  • text_classification (Optional[models.TextClassification]) – Text Classification annotation

  • text_generation (Optional[models.TextGeneration]) – Text Generation annotation

  • token_classification (Optional[models.TokenClassification]) – Token Classification annotation

  • metadata (Optional[Dict[str, Any]]) – Metadata associated with the annotation

  • approval_status (Optional[models.ApprovalStatus]) – Reviewer approval status for the annotation

  • task_id (Optional[str]) – Id of the task to which the datum is locked

Returns:

New annotation details

Return type:

models.Annotation
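A minimal usage sketch for create_annotation, adding an Object Detection annotation to an existing datum. The datum id, label, and metadata key below are placeholders; the BoundingBox fields (xmin, xmax, ymin, ymax) follow the chariot.datasets.models.BoundingBox definition later in this page. This requires a configured Chariot client and will not run offline.

```python
# Hypothetical sketch: attach a bounding-box annotation to a datum.
from chariot.datasets.annotations import create_annotation
from chariot.datasets.models import BoundingBox

annotation = create_annotation(
    "datum-123",  # placeholder datum id
    class_label="car",  # placeholder class label
    bbox=BoundingBox(xmin=10.0, xmax=120.0, ymin=25.0, ymax=90.0),
    metadata={"source": "manual-review"},  # hypothetical metadata
)
print(annotation.id, annotation.task_type)
```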

chariot.datasets.annotations.get_annotation(id: str) Annotation[source]

Get an annotation by id

Parameters:

id (str) – Id of annotation to get

Returns:

Annotation details

Return type:

models.Annotation

chariot.datasets.annotations.update_annotation(annotation_id: str, *, class_label: str | None = None, contour: list[list[Point]] | None = None, bbox: BoundingBox | None = None, oriented_bbox: OrientedBoundingBox | None = None, text_classification: TextClassification | None = None, text_generation: TextGeneration | None = None, token_classification: TokenClassification | None = None, metadata: dict[str, Any] | None = None, approval_status: ApprovalStatus | None = None, updated_at: str | None = None, task_id: str | None = None) Annotation[source]

Update or replace an annotation

Parameters:
  • annotation_id (str) – Id of annotation to update or replace

  • class_label (Optional[str]) – Class label of the annotation

  • contour (Optional[List[List[models.Point]]]) – Contour for an Image Segmentation annotation

  • bbox (Optional[models.BoundingBox]) – Bounding box for an Object Detection annotation

  • oriented_bbox (Optional[models.OrientedBoundingBox]) – Oriented bounding box for an Oriented Object Detection annotation

  • text_classification (Optional[models.TextClassification]) – Text Classification annotation

  • text_generation (Optional[models.TextGeneration]) – Text Generation annotation

  • token_classification (Optional[models.TokenClassification]) – Token Classification annotation

  • metadata (Optional[Dict[str, Any]]) – Metadata associated with the annotation

  • approval_status (Optional[models.ApprovalStatus]) – Reviewer approval status for the annotation

  • updated_at (Optional[str]) – Must match the updated_at timestamp of the annotation being updated

  • task_id (Optional[str]) – Id of the task to which the datum is locked

Returns:

New annotation details

Return type:

models.Annotation
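A sketch of an optimistic-concurrency update: fetch the annotation, then pass its current updated_at back so the server can detect a conflicting edit. The annotation id is a placeholder, and converting the model's datetime to the string the parameter expects via str() is an assumption about the expected format.

```python
# Hypothetical sketch: correct an annotation's label, guarding against
# concurrent edits with the updated_at check.
from chariot.datasets.annotations import get_annotation, update_annotation

current = get_annotation("annotation-456")  # placeholder annotation id
updated = update_annotation(
    current.id,
    class_label="truck",  # placeholder corrected label
    updated_at=str(current.updated_at),  # format conversion is an assumption
)
```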

chariot.datasets.datasets module

chariot.datasets.datasets.create_dataset(*, name: str, type: DatasetType, project_id: str, description: str | None = None, is_public: bool | None = None, _is_test: bool | None = None) Dataset[source]

Create a new, empty dataset

Parameters:
  • name (str) – Dataset name

  • type (models.DatasetType) – Dataset type

  • project_id (str) – Project id to create the dataset in

  • description (Optional[str]) – Dataset description

  • is_public (Optional[bool]) – When set to true, the dataset will be publicly accessible.

Returns:

New dataset details

Return type:

models.Dataset
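A sketch of creating an empty dataset. The project id is a placeholder, and the DatasetType member name ("IMAGE") is an assumption; check models.DatasetType for the actual enum values in your installation.

```python
# Hypothetical sketch: create an empty image dataset inside a project.
from chariot.datasets.datasets import create_dataset
from chariot.datasets.models import DatasetType

dataset = create_dataset(
    name="street-scenes",
    type=DatasetType.IMAGE,  # enum member name is an assumption
    project_id="project-789",  # placeholder project id
    description="Dashcam captures for object detection",
)
```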

chariot.datasets.datasets.create_dataset_timeline_description(id: str, description: str, timestamp: datetime)[source]

Adds a user-defined description event for a particular timeline event group.

Parameters:
  • id (str) – Id of the dataset

  • description (str) – Description of the timeline event group. Must be less than 200 characters.

  • timestamp (datetime) – Timestamp representing the event time of the group leader to which this description will be added

Raises:

APIException – If API communication fails or the request is unauthorized or unauthenticated.

chariot.datasets.datasets.delete_dataset(id: str) Dataset[source]

Delete a dataset by id. The artifacts for the dataset will be deleted as well.

Parameters:

id (str) – Id of dataset to delete

Returns:

Deleted dataset details

Return type:

models.Dataset

chariot.datasets.datasets.get_authorized_dataset_ids(ids: list[str]) list[str][source]

Given a list of dataset ids, return the ids from the list that the user has read access to

Parameters:

ids (List[str]) – List of dataset ids to check

chariot.datasets.datasets.get_dataset(id: str) Dataset[source]

Get a dataset by id

Parameters:

id (str) – Dataset id

Returns:

Dataset details

Return type:

models.Dataset

chariot.datasets.datasets.get_dataset_statistics(*, exact_name_match: bool | None = None, exclude_unlabeled: bool | None = None, limit_to_write_access: bool | None = None, name: str | None = None, project_ids: list[str] | None = None, dataset_ids: list[str] | None = None, task_type_label_filters: list[TaskTypeLabelFilter] | None = None, type: DatasetType | None = None) DatasetStatistics[source]

Get dataset statistics with various criteria.

Parameters:
  • exact_name_match (Optional[bool]) – Require name filter to match exactly (defaults to false)

  • exclude_unlabeled (Optional[bool]) – Exclude unlabeled datasets from the results (defaults to false)

  • limit_to_write_access (Optional[bool]) – Should the results only include datasets that the user has write access to (defaults to false)

  • name (Optional[str]) – Filter by dataset name

  • project_ids (Optional[List[str]]) – Filter by project ids

  • dataset_ids (Optional[List[str]]) – Filter by dataset ids

  • task_type_label_filters (Optional[List[models.TaskTypeLabelFilter]]) – Filter by task types and associated labels

  • type (Optional[models.DatasetType]) – Filter by dataset type

Returns:

Statistics for the datasets

Return type:

models.DatasetStatistics

chariot.datasets.datasets.get_dataset_timeline(id: str, *, max_items: int | None = None, direction: SortDirection | None = None, min_groups: int | None = None, max_ungrouped_events: int | None = None) Iterator[DatasetTimelineEvent][source]

Get a series of dataset change events ordered by time and grouped by event type.

Parameters:
  • id (str) – Dataset id to get events for

  • max_items (Optional[int]) – Limit the returned generator to only produce this many items

  • direction (Optional[models.SortDirection]) – Whether to sort in ascending or descending order

  • min_groups (Optional[int]) – How many groups are required before grouping behavior is turned on

  • max_ungrouped_events (Optional[int]) – The maximum number of events allowed before grouping behavior is turned on

Returns:

Events for the dataset

Return type:

Iterator[models.DatasetTimelineEvent]

chariot.datasets.datasets.get_datasets(*, exact_name_match: bool | None = None, exclude_unlabeled: bool | None = None, limit_to_write_access: bool | None = None, name: str | None = None, project_ids: list[str] | None = None, dataset_ids: list[str] | None = None, task_type_label_filters: list[TaskTypeLabelFilter] | None = None, type: DatasetType | None = None, sort: DatasetSortColumn | None = None, direction: SortDirection | None = None, max_items: int | None = None) Generator[Dataset, None, None][source]

Get datasets with various criteria. Returns a generator over all matching datasets.

Parameters:
  • exact_name_match (Optional[bool]) – Require name filter to match exactly (defaults to false)

  • exclude_unlabeled (Optional[bool]) – Exclude unlabeled datasets from the results (defaults to false)

  • limit_to_write_access (Optional[bool]) – Should the results only include datasets that the user has write access to (defaults to false)

  • name (Optional[str]) – Filter by dataset name

  • project_ids (Optional[List[str]]) – Filter by project ids

  • dataset_ids (Optional[List[str]]) – Filter by dataset ids

  • task_type_label_filters (Optional[List[models.TaskTypeLabelFilter]]) – Filter by task types and associated labels

  • type (Optional[models.DatasetType]) – Filter by dataset type

  • sort (Optional[models.DatasetSortColumn]) – How to sort the returned datasets

  • direction (Optional[models.SortDirection]) – Whether to sort in ascending or descending order

  • max_items (Optional[int]) – Limit the returned generator to only produce this many items

Returns:

Dataset details for datasets matching the criteria

Return type:

Generator[models.Dataset, None, None]
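A sketch of filtered listing with get_datasets. Because the function returns a generator, datasets are fetched lazily as you iterate; max_items bounds how many are produced. The project id and the DatasetType member name are placeholders.

```python
# Hypothetical sketch: iterate over at most 20 image datasets whose name
# contains "street" within a given project.
from chariot.datasets.datasets import get_datasets
from chariot.datasets.models import DatasetType

for ds in get_datasets(
    name="street",  # substring match unless exact_name_match=True
    project_ids=["project-789"],  # placeholder project id
    type=DatasetType.IMAGE,  # enum member name is an assumption
    max_items=20,
):
    print(ds.id, ds.name)
```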

chariot.datasets.datasets.update_dataset(id: str, *, name: str | None = None, description: str | None = None) Dataset[source]

Update a dataset’s name or description

Parameters:
  • id (str) – Dataset id to update

  • name (Optional[str]) – New name for the dataset. Name remains unmodified if set to None

  • description (Optional[str]) – New description for the dataset. Description remains unmodified if set to None

Returns:

Updated dataset details

Return type:

models.Dataset

chariot.datasets.datums module

chariot.datasets.datums.archive_datum(id: str) Datum[source]

Archive (soft-delete) a datum by id

Parameters:

id (str) – Id of datum to archive

Returns:

Datum details

Return type:

models.Datum

chariot.datasets.datums.get_dataset_datum_count(dataset_id: str, *, task_type_label_filters: list[TaskTypeLabelFilter] | None = None, gps_coordinates_circle: Circle | None = None, gps_coordinates_rectangle: Rectangle | None = None, gps_coordinates_polygon: list[GeoPoint] | None = None, capture_timestamp_range: TimestampRange | None = None, metadata: dict[str, str] | None = None, asof_timestamp: datetime | None = None, unannotated: bool | None = None, datum_ids: list[str] | None = None, approval_status: list[str] | None = None, annotation_metadata: dict[str, str] | None = None) int[source]

Get dataset datum count with various criteria.

Parameters:
  • dataset_id (str) – Id of dataset to get datums for

  • task_type_label_filters (Optional[List[models.TaskTypeLabelFilter]]) – Filter by task types and associated labels

  • gps_coordinates_circle (Optional[models.Circle]) – Filter datums within the given circle

  • gps_coordinates_rectangle (Optional[models.Rectangle]) – Filter datums within the given rectangle

  • gps_coordinates_polygon (Optional[List[models.GeoPoint]]) – Filter datums within the given polygon

  • capture_timestamp_range (Optional[models.TimestampRange]) – Filter by datum capture timestamp

  • metadata (Optional[Dict[str, str]]) – Filter by datum metadata values

  • asof_timestamp (Optional[datetime]) – Filter datums and/or annotations at the timestamp

  • unannotated (Optional[bool]) – Filter datums without annotation

  • datum_ids (Optional[List[str]]) – Filter datums with a list of datum ids

  • approval_status (Optional[List[str]]) – Filter by annotation approval status

  • annotation_metadata (Optional[Dict[str, str]]) – Filter by annotation metadata values

Returns:

Datum count for matching datums

Return type:

int

chariot.datasets.datums.get_dataset_datum_labels(dataset_id: str, *, task_type_label_filter: TaskTypeLabelFilter | None = None, gps_coordinates_circle: Circle | None = None, gps_coordinates_rectangle: Rectangle | None = None, gps_coordinates_polygon: list[GeoPoint] | None = None, capture_timestamp_range: TimestampRange | None = None, metadata: dict[str, str] | None = None, asof_timestamp: datetime | None = None, datum_ids: list[str] | None = None, approval_status: list[str] | None = None, annotation_metadata: dict[str, str] | None = None, max_items: int | None = None) Generator[str, None, None][source]

Get dataset datum labels with various criteria

Parameters:
  • dataset_id (str) – Id of dataset to get datums for

  • task_type_label_filter (Optional[models.TaskTypeLabelFilter]) – Filter by a single task type and associated labels

  • gps_coordinates_circle (Optional[models.Circle]) – Filter datums within the given circle

  • gps_coordinates_rectangle (Optional[models.Rectangle]) – Filter datums within the given rectangle

  • gps_coordinates_polygon (Optional[List[models.GeoPoint]]) – Filter datums within the given polygon

  • capture_timestamp_range (Optional[models.TimestampRange]) – Filter by datum capture timestamp

  • metadata (Optional[Dict[str, str]]) – Filter by datum metadata values

  • asof_timestamp (Optional[datetime]) – Filter datums and/or annotations at the timestamp

  • datum_ids (Optional[List[str]]) – Filter datums with a list of datum ids

  • approval_status (Optional[List[str]]) – Filter by annotation approval status

  • annotation_metadata (Optional[Dict[str, str]]) – Filter by annotation metadata values

  • max_items (Optional[int]) – Limit the returned generator to only produce this many items

Returns:

Generator over the matching datum labels

Return type:

Generator[str, None, None]

chariot.datasets.datums.get_dataset_datum_statistics(dataset_id: str, *, task_type_label_filters: list[TaskTypeLabelFilter] | None = None, gps_coordinates_circle: Circle | None = None, gps_coordinates_rectangle: Rectangle | None = None, gps_coordinates_polygon: list[GeoPoint] | None = None, capture_timestamp_range: TimestampRange | None = None, metadata: dict[str, str] | None = None, asof_timestamp: datetime | None = None, unannotated: bool | None = None, datum_ids: list[str] | None = None, approval_status: list[str] | None = None, annotation_metadata: dict[str, str] | None = None) DatumStatistics[source]

Get dataset datum statistics with various criteria

Parameters:
  • dataset_id (str) – Id of dataset to get datums for

  • task_type_label_filters (Optional[List[models.TaskTypeLabelFilter]]) – Filter by task types and associated labels

  • gps_coordinates_circle (Optional[models.Circle]) – Filter datums within the given circle

  • gps_coordinates_rectangle (Optional[models.Rectangle]) – Filter datums within the given rectangle

  • gps_coordinates_polygon (Optional[List[models.GeoPoint]]) – Filter datums within the given polygon

  • capture_timestamp_range (Optional[models.TimestampRange]) – Filter by datum capture timestamp

  • metadata (Optional[Dict[str, str]]) – Filter by datum metadata values

  • asof_timestamp (Optional[datetime]) – Filter datums and/or annotations at the timestamp

  • unannotated (Optional[bool]) – Filter datums without annotation

  • datum_ids (Optional[List[str]]) – Filter datums with a list of datum ids

  • approval_status (Optional[List[str]]) – Filter by annotation approval status

  • annotation_metadata (Optional[Dict[str, str]]) – Filter by annotation metadata values

Returns:

Datum statistics for matching datums

Return type:

models.DatumStatistics

chariot.datasets.datums.get_dataset_datums(dataset_id: str, *, task_type_label_filters: list[TaskTypeLabelFilter] | None = None, gps_coordinates_circle: Circle | None = None, gps_coordinates_rectangle: Rectangle | None = None, gps_coordinates_polygon: list[GeoPoint] | None = None, capture_timestamp_range: TimestampRange | None = None, metadata: dict[str, str] | None = None, asof_timestamp: datetime | None = None, unannotated: bool | None = None, datum_ids: list[str] | None = None, approval_status: list[str] | None = None, annotation_metadata: dict[str, str] | None = None, max_items: int | None = None) Generator[Datum, None, None][source]

Get dataset datums with various criteria

Parameters:
  • dataset_id (str) – Id of dataset to get datums for

  • task_type_label_filters (Optional[List[models.TaskTypeLabelFilter]]) – Filter by task types and associated labels

  • gps_coordinates_circle (Optional[models.Circle]) – Filter datums within the given circle

  • gps_coordinates_rectangle (Optional[models.Rectangle]) – Filter datums within the given rectangle

  • gps_coordinates_polygon (Optional[List[models.GeoPoint]]) – Filter datums within the given polygon

  • capture_timestamp_range (Optional[models.TimestampRange]) – Filter by datum capture timestamp

  • metadata (Optional[Dict[str, str]]) – Filter by datum metadata values

  • asof_timestamp (Optional[datetime]) – Filter datums and/or annotations at the timestamp

  • unannotated (Optional[bool]) – Filter datums without annotation

  • datum_ids (Optional[List[str]]) – Filter datums with a list of datum ids

  • approval_status (Optional[List[str]]) – Filter by annotation approval status

  • annotation_metadata (Optional[Dict[str, str]]) – Filter by annotation metadata values

  • max_items (Optional[int]) – Limit the returned generator to only produce this many items

Returns:

Generator over the matching datums

Return type:

Generator[models.Datum, None, None]
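A sketch of combining datum filters: restrict to unannotated datums matching a metadata value, capped at 100 results. The dataset id and the metadata key/value pair are hypothetical.

```python
# Hypothetical sketch: page through unannotated datums from one sensor.
from chariot.datasets.datums import get_dataset_datums

for datum in get_dataset_datums(
    "dataset-123",  # placeholder dataset id
    unannotated=True,
    metadata={"sensor": "sensor-a"},  # hypothetical metadata filter
    max_items=100,
):
    print(datum.id)
```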

chariot.datasets.datums.get_datum(id: str, *, task_type: TaskType | None = None) Datum[source]

Get a datum by id

Parameters:
  • id (str) – Id of datum to get

  • task_type (Optional[models.TaskType]) – Task type annotation filter

Returns:

Datum details

Return type:

models.Datum

chariot.datasets.datums.get_snapshot_datum_labels(snapshot_id: str, *, task_type_label_filter: TaskTypeLabelFilter | None = None, gps_coordinates_circle: Circle | None = None, gps_coordinates_rectangle: Rectangle | None = None, gps_coordinates_polygon: list[GeoPoint] | None = None, capture_timestamp_range: TimestampRange | None = None, metadata: dict[str, str] | None = None, split: str | None = None, datum_ids: list[str] | None = None, approval_status: list[str] | None = None, annotation_metadata: dict[str, str] | None = None, max_items: int | None = None) Generator[str, None, None][source]

Get snapshot datum labels with various criteria

Parameters:
  • snapshot_id (str) – Id of snapshot to get datums for

  • task_type_label_filter (Optional[models.TaskTypeLabelFilter]) – Filter by a single task type and associated labels

  • gps_coordinates_circle (Optional[models.Circle]) – Filter datums within the given circle

  • gps_coordinates_rectangle (Optional[models.Rectangle]) – Filter datums within the given rectangle

  • gps_coordinates_polygon (Optional[List[models.GeoPoint]]) – Filter datums within the given polygon

  • capture_timestamp_range (Optional[models.TimestampRange]) – Filter by datum capture timestamp

  • metadata (Optional[Dict[str, str]]) – Filter by datum metadata values

  • split (Optional[str]) – Filter by datum split assignment

  • datum_ids (Optional[List[str]]) – Filter datums with a list of datum ids

  • approval_status (Optional[List[str]]) – Filter by annotation approval status

  • annotation_metadata (Optional[Dict[str, str]]) – Filter by annotation metadata values

  • max_items (Optional[int]) – Limit the returned generator to only produce this many items

Returns:

Generator over the matching datum labels

Return type:

Generator[str, None, None]

chariot.datasets.datums.get_snapshot_datum_statistics(snapshot_id: str, *, task_type_label_filters: list[TaskTypeLabelFilter] | None = None, gps_coordinates_circle: Circle | None = None, gps_coordinates_rectangle: Rectangle | None = None, gps_coordinates_polygon: list[GeoPoint] | None = None, capture_timestamp_range: TimestampRange | None = None, metadata: dict[str, str] | None = None, split: str | None = None, datum_ids: list[str] | None = None, approval_status: list[str] | None = None, annotation_metadata: dict[str, str] | None = None) DatumStatistics[source]

Get snapshot datum statistics with various criteria

Parameters:
  • snapshot_id (str) – Id of snapshot to get datums for

  • task_type_label_filters (Optional[List[models.TaskTypeLabelFilter]]) – Filter by task types and associated labels

  • gps_coordinates_circle (Optional[models.Circle]) – Filter datums within the given circle

  • gps_coordinates_rectangle (Optional[models.Rectangle]) – Filter datums within the given rectangle

  • gps_coordinates_polygon (Optional[List[models.GeoPoint]]) – Filter datums within the given polygon

  • capture_timestamp_range (Optional[models.TimestampRange]) – Filter by datum capture timestamp

  • metadata (Optional[Dict[str, str]]) – Filter by datum metadata values

  • split (Optional[str]) – Filter by datum split assignment

  • datum_ids (Optional[List[str]]) – Filter datums with a list of datum ids

  • approval_status (Optional[List[str]]) – Filter by annotation approval status

  • annotation_metadata (Optional[Dict[str, str]]) – Filter by annotation metadata values

Returns:

Datum statistics for matching datums

Return type:

models.DatumStatistics

chariot.datasets.datums.get_snapshot_datums(snapshot_id: str, *, task_type_label_filters: list[TaskTypeLabelFilter] | None = None, gps_coordinates_circle: Circle | None = None, gps_coordinates_rectangle: Rectangle | None = None, gps_coordinates_polygon: list[GeoPoint] | None = None, capture_timestamp_range: TimestampRange | None = None, metadata: dict[str, str] | None = None, split: str | None = None, datum_ids: list[str] | None = None, approval_status: list[str] | None = None, annotation_metadata: dict[str, str] | None = None, max_items: int | None = None) Generator[Datum, None, None][source]

Get snapshot datums with various criteria

Parameters:
  • snapshot_id (str) – Id of snapshot to get datums for

  • task_type_label_filters (Optional[List[models.TaskTypeLabelFilter]]) – Filter by task types and associated labels

  • gps_coordinates_circle (Optional[models.Circle]) – Filter datums within the given circle

  • gps_coordinates_rectangle (Optional[models.Rectangle]) – Filter datums within the given rectangle

  • gps_coordinates_polygon (Optional[List[models.GeoPoint]]) – Filter datums within the given polygon

  • capture_timestamp_range (Optional[models.TimestampRange]) – Filter by datum capture timestamp

  • metadata (Optional[Dict[str, str]]) – Filter by datum metadata values

  • split (Optional[str]) – Filter by datum split assignment

  • datum_ids (Optional[List[str]]) – Filter datums with a list of datum ids

  • approval_status (Optional[List[str]]) – Filter by annotation approval status

  • annotation_metadata (Optional[Dict[str, str]]) – Filter by annotation metadata values

  • max_items (Optional[int]) – Limit the returned generator to only produce this many items

Returns:

Generator over the matching datums

Return type:

Generator[models.Datum, None, None]

chariot.datasets.datums.get_upload_datums(upload_id: str, *, task_type_label_filters: list[TaskTypeLabelFilter] | None = None, gps_coordinates_circle: Circle | None = None, gps_coordinates_rectangle: Rectangle | None = None, gps_coordinates_polygon: list[GeoPoint] | None = None, capture_timestamp_range: TimestampRange | None = None, metadata: dict[str, str] | None = None, unannotated: bool | None = None, datum_ids: list[str] | None = None, approval_status: list[str] | None = None, annotation_metadata: dict[str, str] | None = None, max_items: int | None = None) Generator[Datum, None, None][source]

Get upload datums with various criteria

Parameters:
  • upload_id (str) – Id of upload to get datums for

  • task_type_label_filters (Optional[List[models.TaskTypeLabelFilter]]) – Filter by task types and associated labels

  • gps_coordinates_circle (Optional[models.Circle]) – Filter datums within the given circle

  • gps_coordinates_rectangle (Optional[models.Rectangle]) – Filter datums within the given rectangle

  • gps_coordinates_polygon (Optional[List[models.GeoPoint]]) – Filter datums within the given polygon

  • capture_timestamp_range (Optional[models.TimestampRange]) – Filter by datum capture timestamp

  • metadata (Optional[Dict[str, str]]) – Filter by datum metadata values

  • unannotated (Optional[bool]) – Filter datums without annotation

  • datum_ids (Optional[List[str]]) – Filter datums with a list of datum ids

  • approval_status (Optional[List[str]]) – Filter by annotation approval status

  • annotation_metadata (Optional[Dict[str, str]]) – Filter by annotation metadata values

  • max_items (Optional[int]) – Limit the returned generator to only produce this many items

Returns:

Generator over the matching datums

Return type:

Generator[models.Datum, None, None]

chariot.datasets.exceptions module

exception chariot.datasets.exceptions.UploadIncompleteError(upload_id: str, status: UploadStatus)[source]

Bases: Exception

status: UploadStatus
upload_id: str
exception chariot.datasets.exceptions.UploadUnknownError(upload_id: str)[source]

Bases: Exception

upload_id: str
exception chariot.datasets.exceptions.UploadValidationError(upload_id: str, validation_errors: list[str])[source]

Bases: Exception

upload_id: str
validation_errors: list[str]
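The three exception types above can be distinguished when waiting on an upload. This is a sketch only: wait_for_upload stands in for whatever SDK operation raises these exceptions, and the attribute names used in the handlers come from the class definitions above.

```python
# Hypothetical sketch: handle each upload error type separately.
from chariot.datasets.exceptions import (
    UploadIncompleteError,
    UploadUnknownError,
    UploadValidationError,
)

try:
    upload = wait_for_upload("upload-123")  # placeholder operation and id
except UploadValidationError as e:
    print(f"Upload {e.upload_id} failed validation: {e.validation_errors}")
except UploadIncompleteError as e:
    print(f"Upload {e.upload_id} is still in status {e.status}")
except UploadUnknownError as e:
    print(f"Upload {e.upload_id} was not found")
```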

chariot.datasets.files module

chariot.datasets.files.create_dataset_file(*, dataset_id: str, file_format: FileFormat | None = None, file_type: FileType, manifest_type: ManifestType | None = None, split: SplitName | None = None) File[source]

Create or retrieve an archive file or manifest file for a dataset, returning the file object with its location if available. The function only starts the file creation process if the file does not already exist. Note: Creating archive files for datasets is not currently supported and will result in an error.

Parameters:
  • dataset_id (str) – Id of dataset to create file for

  • file_format (Optional[models.FileFormat]) – File format

  • file_type (models.FileType) – File type

  • manifest_type (Optional[models.ManifestType]) – Manifest type

  • split (Optional[models.SplitName]) – Split

Returns:

File detail for the newly created or existing file

Return type:

models.File

chariot.datasets.files.create_dataset_file_and_wait(*, dataset_id: str, file_format: FileFormat | None = None, file_type: FileType, manifest_type: ManifestType | None = None, split: SplitName | None = None, timeout: float = 120, wait_interval: float = 0.5) File[source]

Create or retrieve an archive file or manifest file for a dataset. Returns the file object with location. The function polls the API until the presigned url for the dataset file is populated or the timeout is reached. Note: Creating archive files for datasets is not currently supported and will result in an error.

Parameters:
  • dataset_id (str) – Id of dataset to create file for

  • file_format (Optional[models.FileFormat]) – File format

  • file_type (models.FileType) – File type

  • manifest_type (Optional[models.ManifestType]) – Manifest type

  • split (Optional[models.SplitName]) – Split

  • timeout (float) – Number of seconds to wait for file completion (default 120 seconds)

  • wait_interval (float) – Number of seconds between successive calls to check the file presigned url (default 0.5)

Returns:

File detail for the newly created or existing file

Return type:

models.File

Raises:

TimeoutError – If the timeout has been reached
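A sketch of exporting a dataset manifest and waiting for the presigned url, with the documented TimeoutError handled explicitly. The dataset id and the FileType member name ("MANIFEST") are assumptions; check the models module for the actual values. The location attribute follows from the "file object with location" description above.

```python
# Hypothetical sketch: create a manifest file and wait for its location.
from chariot.datasets.files import create_dataset_file_and_wait
from chariot.datasets.models import FileType

try:
    f = create_dataset_file_and_wait(
        dataset_id="dataset-123",  # placeholder dataset id
        file_type=FileType.MANIFEST,  # enum member name is an assumption
        timeout=300,  # allow up to five minutes
    )
    print(f.location)
except TimeoutError:
    print("File was not ready in time; poll again later with wait_for_file")
```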

chariot.datasets.files.create_snapshot_file(*, snapshot_id: str, file_format: FileFormat | None = None, file_type: FileType, manifest_type: ManifestType | None = None, split: SplitName | None = None) File[source]

Create or retrieve an archive file or manifest file for a snapshot, returning the file object with its location if available. The function only starts the file creation process if the file does not already exist.

Parameters:
  • snapshot_id (str) – Id of snapshot to create file for

  • file_format (Optional[models.FileFormat]) – File format

  • file_type (models.FileType) – File type

  • manifest_type (Optional[models.ManifestType]) – Manifest type

  • split (Optional[models.SplitName]) – Split

Returns:

File detail for the newly created or existing file

Return type:

models.File

chariot.datasets.files.create_snapshot_file_and_wait(*, snapshot_id: str, file_format: FileFormat | None = None, file_type: FileType, manifest_type: ManifestType | None = None, split: SplitName | None = None, timeout: float = 120, wait_interval: float = 0.5) File[source]

Create or retrieve an archive file or manifest file for a snapshot. Returns the file object with location. The function polls the API until the presigned url for the snapshot file is populated or the timeout is reached.

Parameters:
  • snapshot_id (str) – Id of snapshot to create file for

  • file_format (Optional[models.FileFormat]) – File format

  • file_type (models.FileType) – File type

  • manifest_type (Optional[models.ManifestType]) – Manifest type

  • split (Optional[models.SplitName]) – Split

  • timeout (float) – Number of seconds to wait for file completion (default 120 seconds)

  • wait_interval (float) – Number of seconds between successive calls to check the file presigned url (default 0.5)

Returns:

File detail for the newly created or existing file

Return type:

models.File

Raises:

TimeoutError – If the timeout has been reached

chariot.datasets.files.get_dataset_files(dataset_id: str) list[File][source]

Get files for a dataset

Parameters:

dataset_id (str) – Dataset ID to retrieve files for.

Returns:

File details for the dataset ID

Return type:

List[models.File]

chariot.datasets.files.get_file(id: str) File[source]
chariot.datasets.files.get_snapshot_files(snapshot_id: str) list[File][source]

Get files for a snapshot

Parameters:

snapshot_id (str) – Snapshot ID to retrieve files for.

Returns:

File details for the snapshot ID

Return type:

List[models.File]

chariot.datasets.files.wait_for_file(id: str, *, timeout: float = 120, wait_interval: float = 0.5) File[source]

Polls the given file until it has finished processing.

Parameters:
  • id (str) – Id of the file to wait for

  • timeout (float) – Number of seconds to wait for file to complete (default 120)

  • wait_interval (float) – Number of seconds between successive calls to check the file for completion (default 0.5)

Returns:

The file details

Return type:

models.File

Raises:

TimeoutError – If the timeout has been reached

chariot.datasets.models module

Datasets models.

class chariot.datasets.models.Annotation(id: str, datum_id: str | None, upload_id: str | None, task_type: chariot.datasets.models.TaskType, class_label: str | None, contour: list[list[chariot.datasets.models.Point]] | None, bbox: chariot.datasets.models.BoundingBox | None, oriented_bbox: chariot.datasets.models.OrientedBoundingBox | None, text_classification: chariot.datasets.models.TextClassification | None, token_classification: chariot.datasets.models.TokenClassification | None, text_generation: chariot.datasets.models.TextGeneration | None, created_at: datetime.datetime, updated_at: datetime.datetime, archived_at: datetime.datetime | None, archived_upload_id: str | None, size: int | None, approval_status: str, metadata: dict[str, Any] | None = None, previous_annotation_id: str | None = None, datum_annotation_updated_at: str | None = None, prev_datum_annotation_updated_at: str | None = None)[source]

Bases: Base

approval_status: str
archived_at: datetime | None
archived_upload_id: str | None
bbox: BoundingBox | None
class_label: str | None
contour: list[list[Point]] | None
created_at: datetime
datum_annotation_updated_at: str | None = None
datum_id: str | None
id: str
metadata: dict[str, Any] | None = None
oriented_bbox: OrientedBoundingBox | None
prev_datum_annotation_updated_at: str | None = None
previous_annotation_id: str | None = None
size: int | None
task_type: TaskType
text_classification: TextClassification | None
text_generation: TextGeneration | None
token_classification: TokenClassification | None
updated_at: datetime
upload_id: str | None
class chariot.datasets.models.ApprovalStatus(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Bases: Enum

NEEDS_REVIEW = 'needs_review'
NOT_REVIEWED = ''
REJECTED = 'rejected'
VERIFIED = 'verified'
class chariot.datasets.models.BoundingBox(xmin: float, xmax: float, ymin: float, ymax: float)[source]

Bases: Base

xmax: float
xmin: float
ymax: float
ymin: float
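`BoundingBox` stores axis-aligned extents as min/max coordinates rather than as corner points or width/height. A quick illustration using a local stand-in dataclass (the real class lives in `chariot.datasets.models`):

```python
from dataclasses import dataclass

@dataclass
class BoundingBox:  # local stand-in mirroring chariot.datasets.models.BoundingBox
    xmin: float
    xmax: float
    ymin: float
    ymax: float

box = BoundingBox(xmin=10.0, xmax=110.0, ymin=20.0, ymax=220.0)
width = box.xmax - box.xmin   # 100.0
height = box.ymax - box.ymin  # 200.0
```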
class chariot.datasets.models.Circle(center: chariot.datasets.models.GeoPoint, radius: float)[source]

Bases: Base

center: GeoPoint
radius: float
class chariot.datasets.models.ContextLabelFilter(context: str | None = None, labels: list[str] | None = None)[source]

Bases: Base

context: str | None = None
labels: list[str] | None = None
class chariot.datasets.models.Dataset(id: str, name: str, type: chariot.datasets.models.DatasetType, project_id: str, is_public: bool, is_test: bool, delete_lock: bool, created_at: datetime.datetime, updated_at: datetime.datetime, description: str | None = None, archived_at: datetime.datetime | None = None, archived_by: str | None = None, summary: chariot.datasets.models.DatasetSummary | None = None, migration_status: chariot.datasets.models.MigrationStatus | None = None)[source]

Bases: Base

archived_at: datetime | None = None
archived_by: str | None = None
created_at: datetime
delete_lock: bool
description: str | None = None
id: str
is_public: bool
is_test: bool
migration_status: MigrationStatus | None = None
name: str
project_id: str
summary: DatasetSummary | None = None
type: DatasetType
updated_at: datetime
class chariot.datasets.models.DatasetConfig(dataset_ids: list[str] | None = None, dataset_names: list[str] | None = None, exact_name_match: bool | None = None, limit_to_write_access: bool | None = None, dataset_type: str | None = None)[source]

Bases: Base

dataset_ids: list[str] | None = None
dataset_names: list[str] | None = None
dataset_type: str | None = None
exact_name_match: bool | None = None
limit_to_write_access: bool | None = None
class chariot.datasets.models.DatasetSortColumn(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Bases: Enum

CREATION_TIMESTAMP = 'creation timestamp'
DATUM_COUNT = 'datum count'
NAME = 'name'
UPDATED_TIMESTAMP = 'updated timestamp'
class chariot.datasets.models.DatasetStatistics(datum_count: int, available_datum_count: int, new_datum_count: int, annotation_count: int, class_label_count: int, bounding_box_count: int, oriented_bounding_box_count: int, contour_count: int, text_classification_count: int, token_classification_count: int, text_generation_count: int, class_label_distribution: dict[str, int] | None, text_classification_distribution: list[chariot.datasets.models.Distribution] | None, token_classification_distribution: dict[str, int] | None, text_generation_distribution: dict[str, int] | None, annotation_count_by_approval_status: dict[str, int] | None, dataset_count: int, total_datum_size: int, largest_datum_size: int, unannotated_datum_count: int)[source]

Bases: DatumStatistics

dataset_count: int
largest_datum_size: int
total_datum_size: int
unannotated_datum_count: int
class chariot.datasets.models.DatasetSummary(datum_count: int, available_datum_count: int, new_datum_count: int, annotation_count: int, class_label_count: int, bounding_box_count: int, oriented_bounding_box_count: int, contour_count: int, text_classification_count: int, token_classification_count: int, text_generation_count: int, class_label_distribution: dict[str, int] | None, text_classification_distribution: list[chariot.datasets.models.Distribution] | None, token_classification_distribution: dict[str, int] | None, text_generation_distribution: dict[str, int] | None, annotation_count_by_approval_status: dict[str, int] | None, total_datum_size: int, largest_datum_size: int, unannotated_datum_count: int)[source]

Bases: DatumStatistics

largest_datum_size: int
total_datum_size: int
unannotated_datum_count: int
class chariot.datasets.models.DatasetTimelineEvent(event_timestamp: datetime.datetime, dataset_id: str, event_associated_record_id: str | None, event_operation: str | None, event_user_id: str | None, datums_created: int | None, datums_deleted: str | None, datums_modified: str | None, annotations_created: str | None, annotations_deleted: str | None, annotations_modified: str | None, snapshots: list[chariot.datasets.models.Snapshot] | None, event_group_num_timestamps: int, event_group_num_users: int, event_group_start_timestamp: datetime.datetime, event_group_description: str, event_associated_task_id: str | None)[source]

Bases: Base

annotations_created: str | None
annotations_deleted: str | None
annotations_modified: str | None
dataset_id: str
datums_created: int | None
datums_deleted: str | None
datums_modified: str | None
event_associated_record_id: str | None
event_associated_task_id: str | None
event_group_description: str
event_group_num_timestamps: int
event_group_num_users: int
event_group_start_timestamp: datetime
event_operation: str | None
event_timestamp: datetime
event_user_id: str | None
snapshots: list[Snapshot] | None
class chariot.datasets.models.DatasetType(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Bases: Enum

IMAGE = 'image'
TEXT = 'text'
class chariot.datasets.models.Datum(id: str, coordinates: chariot.datasets.models.GeoPoint | None, timestamp: datetime.datetime | None, metadata: dict[str, Any] | None, created_at: datetime.datetime, archived_at: datetime.datetime | None, dataset: chariot.datasets.models.Dataset | None, annotations: list[chariot.datasets.models.Annotation] | None, presigned_url: str, signature: str, size: int, split: chariot.datasets.models.SplitName | None, datum_annotation_updated_at: str | None = None, task_lock_details: chariot.datasets.models.DatumTaskActivity | None = None, preview_presigned_urls: list[str] | None = None)[source]

Bases: Base

annotations: list[Annotation] | None
archived_at: datetime | None
coordinates: GeoPoint | None
created_at: datetime
dataset: Dataset | None
datum_annotation_updated_at: str | None = None
id: str
metadata: dict[str, Any] | None
presigned_url: str
preview_presigned_urls: list[str] | None = None
signature: str
size: int
split: SplitName | None
task_lock_details: DatumTaskActivity | None = None
timestamp: datetime | None
class chariot.datasets.models.DatumConfig(task_type_label_filters: list[chariot.datasets.models.TaskTypeLabelFilter] | None, gps_coordinates_circle: chariot.datasets.models.Circle | None, gps_coordinates_rectangle: chariot.datasets.models.Rectangle | None, gps_coordinates_polygon: list[chariot.datasets.models.GeoPoint] | None, capture_timestamp_range: chariot.datasets.models.TimestampRange | None, metadata: dict[str, str] | None, asof_timestamp: datetime.datetime | None, unannotated: bool | None, datum_ids: list[str] | None, approval_status: list[chariot.datasets.models.ApprovalStatus] | None, annotation_metadata: dict[str, str] | None)[source]

Bases: DatumFilter

classmethod from_partial(**kwargs)[source]
class chariot.datasets.models.DatumFilter(task_type_label_filters: list[chariot.datasets.models.TaskTypeLabelFilter] | None, gps_coordinates_circle: chariot.datasets.models.Circle | None, gps_coordinates_rectangle: chariot.datasets.models.Rectangle | None, gps_coordinates_polygon: list[chariot.datasets.models.GeoPoint] | None, capture_timestamp_range: chariot.datasets.models.TimestampRange | None, metadata: dict[str, str] | None, asof_timestamp: datetime.datetime | None, unannotated: bool | None, datum_ids: list[str] | None, approval_status: list[chariot.datasets.models.ApprovalStatus] | None, annotation_metadata: dict[str, str] | None)[source]

Bases: Base

annotation_metadata: dict[str, str] | None
approval_status: list[ApprovalStatus] | None
asof_timestamp: datetime | None
capture_timestamp_range: TimestampRange | None
datum_ids: list[str] | None
gps_coordinates_circle: Circle | None
gps_coordinates_polygon: list[GeoPoint] | None
gps_coordinates_rectangle: Rectangle | None
metadata: dict[str, str] | None
task_type_label_filters: list[TaskTypeLabelFilter] | None
unannotated: bool | None
class chariot.datasets.models.DatumSortColumn(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Bases: Enum

CREATION_TIMESTAMP = 'creation timestamp'
class chariot.datasets.models.DatumStatistics(datum_count: int, available_datum_count: int, new_datum_count: int, annotation_count: int, class_label_count: int, bounding_box_count: int, oriented_bounding_box_count: int, contour_count: int, text_classification_count: int, token_classification_count: int, text_generation_count: int, class_label_distribution: dict[str, int] | None, text_classification_distribution: list[chariot.datasets.models.Distribution] | None, token_classification_distribution: dict[str, int] | None, text_generation_distribution: dict[str, int] | None, annotation_count_by_approval_status: dict[str, int] | None)[source]

Bases: Base

annotation_count: int
annotation_count_by_approval_status: dict[str, int] | None
available_datum_count: int
bounding_box_count: int
class_label_count: int
class_label_distribution: dict[str, int] | None
contour_count: int
datum_count: int
new_datum_count: int
oriented_bounding_box_count: int
text_classification_count: int
text_classification_distribution: list[Distribution] | None
text_generation_count: int
text_generation_distribution: dict[str, int] | None
token_classification_count: int
token_classification_distribution: dict[str, int] | None
class chariot.datasets.models.DatumTask(id: str, name: str, description: str | None, created_at: datetime.datetime, updated_at: datetime.datetime, archived_at: datetime.datetime | None, created_by: str, updated_by: str, archived_by: str | None, project_id: str, dataset_config: chariot.datasets.models.DatasetConfig | None, datum_config: chariot.datasets.models.DatumConfig | None)[source]

Bases: Base

archived_at: datetime | None
archived_by: str | None
created_at: datetime
created_by: str
dataset_config: DatasetConfig | None
datum_config: DatumConfig | None
description: str | None
id: str
name: str
project_id: str
updated_at: datetime
updated_by: str
class chariot.datasets.models.DatumTaskActivity(dataset_id: str | None, datum_id: str | None, task_id: str, user_id: str, activity: chariot.datasets.models.DatumTaskActivityCode | None, activity_start_time: datetime.datetime | None, activity_end_time: datetime.datetime | None)[source]

Bases: Base

activity: DatumTaskActivityCode | None
activity_end_time: datetime | None
activity_start_time: datetime | None
dataset_id: str | None
datum_id: str | None
task_id: str
user_id: str
class chariot.datasets.models.DatumTaskActivityCode(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Bases: StrEnum

LOCKED = 'locked'
SKIPPED = 'skipped'
VIEWED = 'viewed'
class chariot.datasets.models.DatumTaskDetails(id: str, name: str, description: str | None, created_at: datetime.datetime, updated_at: datetime.datetime, archived_at: datetime.datetime | None, created_by: str, updated_by: str, archived_by: str | None, project_id: str, dataset_config: chariot.datasets.models.DatasetConfig | None, datum_config: chariot.datasets.models.DatumConfig | None, statistics: chariot.datasets.models.DatumTaskStatistics)[source]

Bases: DatumTask

statistics: DatumTaskStatistics
class chariot.datasets.models.DatumTaskStatistics(datum_count: int, available_datum_count: int, new_datum_count: int, annotation_count: int, class_label_count: int, bounding_box_count: int, oriented_bounding_box_count: int, contour_count: int, text_classification_count: int, token_classification_count: int, text_generation_count: int, class_label_distribution: dict[str, int] | None, text_classification_distribution: list[chariot.datasets.models.Distribution] | None, token_classification_distribution: dict[str, int] | None, text_generation_distribution: dict[str, int] | None, annotation_count_by_approval_status: dict[str, int] | None, task_count: int, dataset_count: int, user_count: int, datum_count_by_user_id: dict[str, int] | None, activity_count: int, datum_count_by_activity_status: dict[str, int] | None)[source]

Bases: DatumStatistics

activity_count: int
dataset_count: int
datum_count_by_activity_status: dict[str, int] | None
datum_count_by_user_id: dict[str, int] | None
task_count: int
user_count: int
class chariot.datasets.models.Distribution(context: str | None, distribution: dict[str, int])[source]

Bases: Base

context: str | None
distribution: dict[str, int]
class chariot.datasets.models.File(id: str, dataset: chariot.datasets.models.Dataset | None, dataset_timestamp: datetime.datetime | None, snapshot: chariot.datasets.models.Snapshot | None, split: chariot.datasets.models.SplitName | None, type: chariot.datasets.models.FileType, manifest_type: chariot.datasets.models.ManifestType | None, file_format: chariot.datasets.models.FileFormat, presigned_url: str | None, created_at: datetime.datetime, updated_at: datetime.datetime, archived_at: datetime.datetime | None, expires_at: datetime.datetime | None, job: chariot.datasets.models.Job | None, status: chariot.datasets.models.FileStatus | None)[source]

Bases: Base

archived_at: datetime | None
created_at: datetime
dataset: Dataset | None
dataset_timestamp: datetime | None
expires_at: datetime | None
file_format: FileFormat
id: str
job: Job | None
manifest_type: ManifestType | None
presigned_url: str | None
snapshot: Snapshot | None
split: SplitName | None
status: FileStatus | None
type: FileType
updated_at: datetime
class chariot.datasets.models.FileFormat(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Bases: Enum

GZ = 'gz'
TGZ = 'tgz'
ZIP = 'zip'
class chariot.datasets.models.FileStatus(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Bases: Enum

ARCHIVED = 'archived'
COMPLETE = 'complete'
ERROR = 'error'
PENDING = 'pending'
PROCESSING = 'processing'
class chariot.datasets.models.FileType(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Bases: Enum

ARCHIVE = 'archive'
MANIFEST = 'manifest'
class chariot.datasets.models.GeoPoint(latitude: float, longitude: float)[source]

Bases: Base

latitude: float
longitude: float
class chariot.datasets.models.Job(id: str, type: chariot.datasets.models.JobType, status: chariot.datasets.models.JobStatus, progress_message: str | None, dataset: chariot.datasets.models.Dataset | None, upload: Any | None, file: Any | None, view: chariot.datasets.models.View | None, execution_count: int, created_at: datetime.datetime, updated_at: datetime.datetime, start_after: datetime.datetime | None, schedule_cron: str | None)[source]

Bases: Base

created_at: datetime
dataset: Dataset | None
execution_count: int
file: Any | None
id: str
progress_message: str | None
schedule_cron: str | None
start_after: datetime | None
status: JobStatus
type: JobType
updated_at: datetime
upload: Any | None
view: View | None
class chariot.datasets.models.JobStatus(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Bases: Enum

IN_PROGRESS = 'in progress'
READY = 'ready'
class chariot.datasets.models.JobType(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Bases: Enum

DELETE_DATASET = 'delete_dataset'
DELETE_FILE = 'delete_file'
DELETE_UPLOAD = 'delete_upload'
FILE = 'file'
SNAPSHOT = 'snapshot'
UPLOAD = 'upload'
class chariot.datasets.models.ManifestType(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Bases: Enum

ALL = 'all'
ANNOTATED = 'annotated'
class chariot.datasets.models.MigrationStatus(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Bases: Enum

CLEANUP = 'cleanup'
COMPLETE = 'complete'
DOWNLOADING = 'downloading'
ERROR = 'error'
EXCEPTION = 'exception'
IDENTIFIED = 'identified'
PLANNED = 'planned'
UPLOADING_HORIZONTALS = 'uploading_horizontals'
UPLOADING_VERTICAL = 'uploading_vertical'
class chariot.datasets.models.OrientedBoundingBox(cx: float, cy: float, w: float, h: float, r: float)[source]

Bases: Base

cx: float
cy: float
h: float
r: float
w: float
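`OrientedBoundingBox` parameterizes a box by its center (`cx`, `cy`), size (`w`, `h`), and rotation `r`. A sketch of recovering the four corner points, assuming `r` is a rotation in radians about the center (the reference above does not specify units or direction):

```python
import math

def obb_corners(cx: float, cy: float, w: float, h: float, r: float):
    """Corner points of an oriented box; r in radians is an assumption."""
    cos_r, sin_r = math.cos(r), math.sin(r)
    corners = []
    for dx, dy in ((-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)):
        # rotate each half-extent offset about the center
        corners.append((cx + dx * cos_r - dy * sin_r, cy + dx * sin_r + dy * cos_r))
    return corners
```

With `r = 0` this degenerates to the axis-aligned case.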
class chariot.datasets.models.Point(x: float, y: float)[source]

Bases: Base

x: float
y: float
class chariot.datasets.models.PresignedUrl(method: str, url: str)[source]

Bases: Base

method: str
url: str
class chariot.datasets.models.Rectangle(p1: chariot.datasets.models.GeoPoint, p2: chariot.datasets.models.GeoPoint)[source]

Bases: Base

p1: GeoPoint
p2: GeoPoint
class chariot.datasets.models.Snapshot(id: str, view: chariot.datasets.models.View, name: str, timestamp: datetime.datetime, summary: chariot.datasets.models.DatasetSummary | None, split_summaries: dict[chariot.datasets.models.SplitName, chariot.datasets.models.DatasetSummary] | None, status: chariot.datasets.models.SnapshotStatus, created_at: datetime.datetime | None, updated_at: datetime.datetime | None)[source]

Bases: Base

created_at: datetime | None
id: str
name: str
split_summaries: dict[SplitName, DatasetSummary] | None
status: SnapshotStatus
summary: DatasetSummary | None
timestamp: datetime
updated_at: datetime | None
view: View
class chariot.datasets.models.SnapshotSortColumn(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Bases: Enum

CREATION_TIMESTAMP = 'creation timestamp'
ID = 'id'
NAME = 'name'
TIMESTAMP = 'timestamp'
class chariot.datasets.models.SnapshotStatus(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Bases: Enum

COMPLETE = 'complete'
ERROR = 'error'
PENDING = 'pending'
PREVIEW = 'preview'
class chariot.datasets.models.SortDirection(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Bases: Enum

ASCENDING = 'asc'
DESCENDING = 'desc'
class chariot.datasets.models.SplitAlgorithm(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Bases: Enum

RANDOM = 'random'
class chariot.datasets.models.SplitConfig(sample_count: int | None, split_algorithm: chariot.datasets.models.SplitAlgorithm | None, apply_default_split: bool | None, splits: dict[chariot.datasets.models.SplitName, float] | None)[source]

Bases: Base

apply_default_split: bool | None
sample_count: int | None
split_algorithm: SplitAlgorithm | None
splits: dict[SplitName, float] | None
class chariot.datasets.models.SplitName(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Bases: StrEnum

TEST = 'test'
TRAIN = 'train'
VAL = 'val'
class chariot.datasets.models.TaskActivitySortColumn(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Bases: Enum

ACTIVITY_END_TIME = 'activity end timestamp'
ACTIVITY_START_TIME = 'activity start timestamp'
class chariot.datasets.models.TaskSortColumn(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Bases: Enum

ID = 'id'
NAME = 'name'
class chariot.datasets.models.TaskType(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Bases: StrEnum

IMAGE_CLASSIFICATION = 'Image Classification'
IMAGE_SEGMENTATION = 'Image Segmentation'
OBJECT_DETECTION = 'Object Detection'
ORIENTED_OBJECT_DETECTION = 'Oriented Object Detection'
TEXT_CLASSIFICATION = 'Text Classification'
TEXT_GENERATION = 'Text Generation'
TOKEN_CLASSIFICATION = 'Token Classification'
class chariot.datasets.models.TaskTypeLabelFilter(task_type: chariot.datasets.models.TaskType, labels: list[str] | None = None, contexts: list[str | None] | None = None, context_labels: list[chariot.datasets.models.ContextLabelFilter] | None = None)[source]

Bases: Base

context_labels: list[ContextLabelFilter] | None = None
contexts: list[str | None] | None = None
labels: list[str] | None = None
task_type: TaskType
class chariot.datasets.models.TextClassification(context: str | None, label: str)[source]

Bases: Base

context: str | None
label: str
class chariot.datasets.models.TextGeneration(context: str | None, generated_text: str | None, generated_text_presigned_url: str | None)[source]

Bases: Base

context: str | None
generated_text: str | None
generated_text_presigned_url: str | None
class chariot.datasets.models.TimestampRange(start: datetime.datetime | None, end: datetime.datetime | None)[source]

Bases: Base

end: datetime | None
start: datetime | None
to_query_param() str[source]
class chariot.datasets.models.TokenClassification(label: str, start: int, end: int)[source]

Bases: Base

end: int
label: str
start: int
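`TokenClassification` labels a span of the datum's text via `start`/`end` character offsets. Assuming standard half-open Python slicing semantics (the reference above does not state whether `end` is inclusive), a hypothetical span would be read back like this:

```python
text = "The quick brown fox"
label, start, end = "ANIMAL", 16, 19  # hypothetical offsets for illustration
span = text[start:end]               # "fox" under the half-open assumption
```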
class chariot.datasets.models.Upload(id: str, job: chariot.datasets.models.Job | None, type: chariot.datasets.models.UploadType, is_gzipped: bool | None, split: chariot.datasets.models.SplitName | None, status: chariot.datasets.models.UploadStatus, name: str | None, size: int | None, delete_source: bool, max_validation_errors: int, image_validation: bool, validation_errors: list[str] | None, created_at: datetime.datetime, updated_at: datetime.datetime, data_created_at: datetime.datetime | None, presigned_urls: list[chariot.datasets.models.PresignedUrl] | None, source_urls: list[str] | None, datum_metadata: list[dict[str, Any]] | None, dataset: chariot.datasets.models.Dataset | None, video_options: chariot.datasets.models.VideoSamplingOptions | None)[source]

Bases: Base

created_at: datetime
data_created_at: datetime | None
dataset: Dataset | None
datum_metadata: list[dict[str, Any]] | None
delete_source: bool
id: str
image_validation: bool
is_gzipped: bool | None
job: Job | None
max_validation_errors: int
name: str | None
presigned_urls: list[PresignedUrl] | None
size: int | None
source_urls: list[str] | None
split: SplitName | None
status: UploadStatus
type: UploadType
updated_at: datetime
validation_errors: list[str] | None
video_options: VideoSamplingOptions | None
class chariot.datasets.models.UploadSortColumn(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Bases: Enum

CREATION_TIMESTAMP = 'creation timestamp'
STATUS = 'status'
TYPE = 'type'
class chariot.datasets.models.UploadStatistics(upload_count: int)[source]

Bases: Base

upload_count: int
class chariot.datasets.models.UploadStatus(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Bases: Enum

CLEANUP = 'cleanup'
COMPLETE = 'complete'
CREATED = 'created'
ERROR = 'error'
PROCESSING = 'processing'
class chariot.datasets.models.UploadType(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Bases: Enum

ANNOTATION = 'annotation'
ARCHIVE = 'archive'
DATUM = 'datum'
INFERENCE = 'inference'
RAIC = 'raic'
TEXT = 'text'
VIDEO = 'video'
class chariot.datasets.models.VideoSamplingOptions(sampling_type: chariot.datasets.models.VideoSamplingType, sampling_value: int, deinterlace: bool)[source]

Bases: object

deinterlace: bool
sampling_type: VideoSamplingType
sampling_value: int
class chariot.datasets.models.VideoSamplingType(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Bases: str, Enum

NONE = 'none'
RATE = 'rate'
RATIO = 'ratio'
class chariot.datasets.models.View(task_type_label_filters: list[chariot.datasets.models.TaskTypeLabelFilter] | None, gps_coordinates_circle: chariot.datasets.models.Circle | None, gps_coordinates_rectangle: chariot.datasets.models.Rectangle | None, gps_coordinates_polygon: list[chariot.datasets.models.GeoPoint] | None, capture_timestamp_range: chariot.datasets.models.TimestampRange | None, metadata: dict[str, str] | None, asof_timestamp: datetime.datetime | None, unannotated: bool | None, datum_ids: list[str] | None, approval_status: list[chariot.datasets.models.ApprovalStatus] | None, annotation_metadata: dict[str, str] | None, sample_count: int | None, split_algorithm: chariot.datasets.models.SplitAlgorithm | None, apply_default_split: bool | None, splits: dict[chariot.datasets.models.SplitName, float] | None, id: str, name: str, snapshot_count: int | None, created_at: datetime.datetime, updated_at: datetime.datetime, archived_at: datetime.datetime | None = None, archived_by: str | None = None, dataset: chariot.datasets.models.Dataset | None = None)[source]

Bases: SplitConfig, DatumFilter

archived_at: datetime | None = None
archived_by: str | None = None
created_at: datetime
dataset: Dataset | None = None
id: str
name: str
snapshot_count: int | None
updated_at: datetime
class chariot.datasets.models.ViewSortColumn(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Bases: Enum

CREATION_TIMESTAMP = 'creation timestamp'
ID = 'id'
NAME = 'name'
SAMPLE_COUNT = 'sample count'

chariot.datasets.snapshots module

chariot.datasets.snapshots.create_snapshot(*, view_id: str, name: str, timestamp: datetime, is_dry_run: bool = False) Snapshot[source]

Creates a new snapshot for a view at the specified event timestamp.

The newly created snapshot will be in status PENDING while datums are being assigned. You can call get_snapshot with the returned id to check whether the status is COMPLETE. To do this in one call, use create_snapshot_and_wait.

Parameters:
  • view_id (str) – Id of the view that the snapshot should belong to

  • name (str) – Snapshot name

  • timestamp (datetime) – Event timestamp that the snapshot should reflect

  • is_dry_run (bool) – If true, return a preview snapshot containing the datum count for each split of the most recent snapshot (if one exists), the expected datum count for each split of the new snapshot, and the available datum counts from unassigned datums with or without default splits. Defaults to false.

Returns:

The newly created snapshot in a PENDING status

Return type:

models.Snapshot

chariot.datasets.snapshots.create_snapshot_and_wait(*, view_id: str, name: str, timestamp: datetime, timeout: float = 5, wait_interval: float = 0.5) Snapshot[source]

Creates a new snapshot for a view at the specified event timestamp and polls the API until the snapshot is in a COMPLETE status or the timeout is reached.

Parameters:
  • view_id (str) – Id of the view that the snapshot should belong to

  • name (str) – Snapshot name

  • timestamp (datetime) – Event timestamp that the snapshot should reflect

  • timeout (float) – Number of seconds to wait for snapshot completion (default 5)

  • wait_interval (float) – Number of seconds between successive calls to check the snapshot for completion (default 0.5)

Returns:

The COMPLETE snapshot after datums have been assigned.

Return type:

models.Snapshot

Raises:

RuntimeError – If the timeout has been reached

chariot.datasets.snapshots.delete_snapshot(id: str) None[source]

Delete a snapshot by id. This can only be done if the snapshot’s status is still PENDING.

This only starts the deletion process on the backend. You can call get_snapshot with the snapshot’s id and confirm deletion once a NotFoundException is raised. To do this in one call, use delete_snapshot_and_wait.

Parameters:

id (str) – Id of the snapshot to delete

chariot.datasets.snapshots.delete_snapshot_and_wait(id: str, *, timeout: float = 5, wait_interval: float = 0.5) None[source]

Delete a snapshot by id and wait for the deletion to complete. This can only be done if the snapshot’s status is still PENDING. The call polls the snapshot to confirm deletion and returns once the snapshot is gone.

Parameters:
  • id (str) – Id of the snapshot to delete

  • timeout (float) – Number of seconds to wait for snapshot deletion (default 5)

  • wait_interval (float) – Number of seconds between successive calls to check the snapshot for deletion (default 0.5)

Raises:

RuntimeError – If the timeout has been reached

chariot.datasets.snapshots.get_all_snapshots(*, exact_name_match: bool | None = None, name: str | None = None, timestamp_interval: TimestampRange | None = None, snapshot_ids: list[str] | None = None, sort: SnapshotSortColumn | None = None, direction: SortDirection | None = None, max_items: int | None = None) Generator[Snapshot, None, None][source]

Get all snapshots with optional filters. Returns a generator over all matching snapshots. Only admin users can call this function.

Parameters:
  • exact_name_match (Optional[bool]) – Require name filter to match exactly (defaults to false)

  • name (Optional[str]) – Filter by snapshot name

  • timestamp_interval (Optional[models.TimestampRange]) – Filter by snapshots occurring during the interval

  • snapshot_ids (Optional[List[str]]) – Filter by snapshot ids

  • sort (Optional[models.SnapshotSortColumn]) – How to sort the returned snapshots

  • direction (Optional[models.SortDirection]) – Whether to sort in ascending or descending order

  • max_items (Optional[int]) – The maximum number of snapshots to return

Returns:

Snapshot details for snapshots matching the criteria

Return type:

Generator[models.Snapshot, None, None]

chariot.datasets.snapshots.get_dataset_snapshots(dataset_id: str, *, exact_name_match: bool | None = None, name: str | None = None, timestamp_interval: TimestampRange | None = None, snapshot_ids: list[str] | None = None, sort: SnapshotSortColumn | None = None, direction: SortDirection | None = None, max_items: int | None = None) Generator[Snapshot, None, None][source]

Get a dataset’s snapshots with optional filters. Returns a generator over all matching snapshots.

Parameters:
  • dataset_id (str) – Id of the dataset that the snapshots belong to

  • exact_name_match (Optional[bool]) – Require name filter to match exactly (defaults to false)

  • name (Optional[str]) – Filter by snapshot name

  • timestamp_interval (Optional[models.TimestampRange]) – Filter by snapshots occurring during the interval

  • snapshot_ids (Optional[List[str]]) – Filter by snapshot ids

  • sort (Optional[models.SnapshotSortColumn]) – How to sort the returned snapshots

  • direction (Optional[models.SortDirection]) – Whether to sort in ascending or descending order

  • max_items (Optional[int]) – The maximum number of snapshots to return

Returns:

Snapshot details for snapshots matching the criteria

Return type:

Generator[models.Snapshot, None, None]

chariot.datasets.snapshots.get_snapshot(id: str) Snapshot[source]

Get a snapshot by id

Parameters:

id (str) – Snapshot id

Returns:

Snapshot details

Return type:

models.Snapshot

chariot.datasets.snapshots.get_view_snapshot_count(view_id: str, *, exact_name_match: bool | None = None, name: str | None = None, timestamp_interval: TimestampRange | None = None, snapshot_ids: list[str] | None = None) int[source]

Get number of snapshots for the given view id with optional filters.

Parameters:
  • view_id (str) – Id of the view that the snapshots belong to

  • exact_name_match (Optional[bool]) – Require name filter to match exactly (defaults to false)

  • name (Optional[str]) – Filter by snapshot name

  • timestamp_interval (Optional[models.TimestampRange]) – Filter by snapshots occurring during the interval

  • snapshot_ids (Optional[List[str]]) – Filter by snapshot ids

Returns:

Number of snapshots matching the criteria

Return type:

int

chariot.datasets.snapshots.get_view_snapshots(view_id: str, *, exact_name_match: bool | None = None, name: str | None = None, timestamp_interval: TimestampRange | None = None, snapshot_ids: list[str] | None = None, sort: SnapshotSortColumn | None = None, direction: SortDirection | None = None, max_items: int | None = None) Generator[Snapshot, None, None][source]

Get a view’s snapshots with optional filters. Returns a generator over all matching snapshots.

Parameters:
  • view_id (str) – Id of the view that the snapshots belong to

  • exact_name_match (Optional[bool]) – Require name filter to match exactly (defaults to false)

  • name (Optional[str]) – Filter by snapshot name

  • timestamp_interval (Optional[models.TimestampRange]) – Filter by snapshots occurring during the interval

  • snapshot_ids (Optional[List[str]]) – Filter by snapshot ids

  • sort (Optional[models.SnapshotSortColumn]) – How to sort the returned snapshots

  • direction (Optional[models.SortDirection]) – Whether to sort in ascending or descending order

  • max_items (Optional[int]) – The maximum number of snapshots to return

Returns:

Snapshot details for snapshots matching the criteria

Return type:

Generator[models.Snapshot, None, None]
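The snapshot-listing functions above return lazy generators capped by max_items. The sketch below illustrates that consumption pattern with a stand-in generator; the function, snapshot fields, and names here are illustrative stand-ins, not the real chariot client.

```python
from itertools import islice

# Stand-in for chariot.datasets.snapshots.get_view_snapshots: yields snapshot
# records lazily; the real client pages through the API the same way.
def fake_get_view_snapshots(view_id, *, name=None, max_items=None):
    snapshots = [{"id": f"snap-{i}", "view_id": view_id, "name": f"nightly-{i}"}
                 for i in range(10)]
    if name is not None:
        # Substring match, mirroring the default (non-exact) name filter.
        snapshots = [s for s in snapshots if name in s["name"]]
    gen = iter(snapshots)
    return islice(gen, max_items) if max_items is not None else gen

# Only as many items as requested are materialized.
first_three = list(fake_get_view_snapshots("view-1", max_items=3))
```

Because the result is a generator, wrap it in `list()` only when the full result set is small enough to hold in memory.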

chariot.datasets.tasks module

chariot.datasets.tasks.archive_task(id: str) DatumTask[source]

Archive a datum annotation task.

Parameters:

id (str) – Datum annotation task id

Returns:

The datum annotation task that has been archived

Return type:

models.DatumTask

chariot.datasets.tasks.count_task_activity(task_id: str, *, activities: list[DatumTaskActivityCode] | None = None, dataset_ids: list[str] | None = None, user_ids: list[str] | None = None) int[source]

Count the activities for the provided task and filters.

Parameters:
  • task_id (str) – Id of the task

  • activities (Optional[List[models.DatumTaskActivityCode]]) – List of activity types to filter by

  • dataset_ids (Optional[List[str]]) – List of dataset ids to filter by

  • user_ids (Optional[List[str]]) – List of user ids to filter by

Returns:

Number of matching task activities

Return type:

int

chariot.datasets.tasks.count_tasks(*, search: str | None = None, exact_name_match: bool | None = None, include_archived: bool | None = None, project_ids: list[str] | None = None, task_ids: list[str] | None = None) int[source]

Get number of tasks that match given criteria.

Parameters:
  • search (Optional[str]) – Search string (full text search against name and description fields)

  • exact_name_match (Optional[bool]) – Require search to exactly match the task name (defaults to false)

  • include_archived (Optional[bool]) – If true, archived tasks will be included in the results (defaults to false)

  • project_ids (Optional[List[str]]) – Filter by project ids

  • task_ids (Optional[List[str]]) – Filter by task ids

Returns:

Number of tasks that match given criteria

Return type:

int
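The search/exact_name_match pair used by count_tasks and get_tasks can be pictured as follows. This is a toy model of the filter semantics described above, not the service's actual query; the task records and the helper name are invented for illustration.

```python
tasks = [
    {"name": "label cars", "description": "bounding boxes for vehicles"},
    {"name": "label cars v2", "description": "revised vehicle boxes"},
]

# Hypothetical local mirror of the filter: full-text search over name and
# description by default, or an exact name comparison when requested.
def count_matching_tasks(tasks, search, exact_name_match=False):
    if exact_name_match:
        return sum(t["name"] == search for t in tasks)
    s = search.lower()
    return sum(s in t["name"].lower() or s in t["description"].lower()
               for t in tasks)
```

With exact_name_match=True only a full name match counts, so a search string that is a prefix of several task names no longer matches all of them.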

chariot.datasets.tasks.count_tasks_activity(exact_name_match: bool | None = None, search: str | None = None, project_ids: list[str] | None = None, task_ids: list[str] | None = None, activities: list[DatumTaskActivityCode] | None = None, dataset_ids: list[str] | None = None, user_ids: list[str] | None = None) int[source]

Count the matching activities.

Parameters:
  • exact_name_match (Optional[bool]) – Require search filter to match exactly (defaults to false)

  • search (Optional[str]) – Search string (full text search against task name and description fields)

  • project_ids (Optional[List[str]]) – List of project ids to filter by

  • task_ids (Optional[List[str]]) – List of task ids to filter by

  • activities (Optional[List[models.DatumTaskActivityCode]]) – List of activity types to filter by

  • dataset_ids (Optional[List[str]]) – List of dataset ids to filter by

  • user_ids (Optional[List[str]]) – List of user ids to filter by

Returns:

Number of matching task activities

Return type:

int

chariot.datasets.tasks.create_task(*, name: str, project_id: str, description: str | None = None, dataset_config: DatasetConfig | None = None, datum_config: DatumConfig | None = None) DatumTask[source]

Create a new datum annotation task.

Parameters:
  • name (str) – Datum annotation task name

  • project_id (str) – Project id that datum annotation Task belongs to

  • description (Optional[str]) – Datum annotation task description

  • dataset_config (Optional[models.DatasetConfig]) – Dataset configuration for the task

  • datum_config (Optional[models.DatumConfig]) – Datum configuration for the task

Returns:

New datum annotation task detail

Return type:

models.DatumTask

chariot.datasets.tasks.delete_datum_lock_for_task(id: str, task_id: str, user_id: str | None = None) None[source]

Delete the specified datum’s lock for a given task.

Must be the current user holding the lock.

Parameters:
  • id (str) – The id of the datum

  • task_id (str) – The id of the task

  • user_id (Optional[str]) – The id of the user who holds the lock

Returns:

None

chariot.datasets.tasks.get_datum_for_task(task_id: str, *, unannotated: bool = False, random: bool = False, id_after: str | None = None, prev_datum_id: str | None = None, skip_prev: bool | None = None) Datum | None[source]

Get the next available datum for the given task. Returns None if there are no datums available.

Parameters:
  • task_id (str) – The id of the task

  • unannotated (Optional[bool]) – If true, only unannotated datums will be returned (defaults to false)

  • random (Optional[bool]) – If true, returns a random available datum instead of the next available datum (defaults to false)

  • id_after (Optional[str]) – If provided, will return a datum that is after the given datum id (can be used to resume a task from a specific point, or to skip a specific datum)

  • prev_datum_id (Optional[str]) – If specified, any lock held by the user on this datum will be released if a new datum is acquired

  • skip_prev (Optional[bool]) – If true, the datum specified by prev_datum_id will be marked as ‘skipped’ rather than ‘viewed’ when its lock is released, putting it at the end of the task queue

Returns:

The datum, or None if no datums matching the request are available

Return type:

Optional[models.Datum]
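The prev_datum_id/skip_prev semantics above can be modeled as a queue where skipped datums go to the back. This is a deliberately simplified local sketch of that behavior (no locking, no users); the class and method here are invented for illustration and do not exist in the chariot client.

```python
from collections import deque

class FakeTaskQueue:
    """Toy model of the task queue behind get_datum_for_task."""

    def __init__(self, datum_ids):
        self.queue = deque(datum_ids)

    def get_datum(self, *, prev_datum_id=None, skip_prev=False):
        if prev_datum_id is not None and skip_prev:
            # Skipped datums are requeued at the end of the task queue.
            self.queue.append(prev_datum_id)
        # Returns None when no datums are available, like the real call.
        return self.queue.popleft() if self.queue else None

q = FakeTaskQueue(["d1", "d2", "d3"])
first = q.get_datum()                                      # take "d1"
second = q.get_datum(prev_datum_id=first, skip_prev=True)  # "d1" requeued
```

An annotation loop would keep calling get_datum, passing the previous id each time, until None signals the queue is exhausted.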

chariot.datasets.tasks.get_datum_for_task_by_id(id: str, task_id: str) Datum | None[source]

Get the specific datum designated by id.

Parameters:
  • id (str) – The id of the datum

  • task_id (str) – The id of the task

Returns:

The datum, or None if no datum is found or the datum does not apply to the specified task.

Return type:

Optional[models.Datum]

chariot.datasets.tasks.get_task(id: str) DatumTaskDetails[source]

Get a datum annotation task by id.

Parameters:

id (str) – Datum annotation task id

Returns:

The datum annotation task details

Return type:

models.DatumTaskDetails

chariot.datasets.tasks.get_task_activity(task_id: str, *, activities: list[DatumTaskActivityCode] | None = None, dataset_ids: list[str] | None = None, user_ids: list[str] | None = None, direction: SortDirection | None = None, sort: TaskActivitySortColumn | None = None, max_items: int | None = None) Generator[DatumTaskActivity, None, None][source]

Get the activities for the provided task and filters.

Parameters:
  • task_id (str) – Id of the task

  • activities (Optional[List[models.DatumTaskActivityCode]]) – List of activity types to filter by

  • dataset_ids (Optional[List[str]]) – List of dataset ids to filter by

  • user_ids (Optional[List[str]]) – List of user ids to filter by

  • direction (Optional[models.SortDirection]) – Sort direction

  • sort (Optional[models.TaskActivitySortColumn]) – Sort column

  • max_items (Optional[int]) – Limit the returned generator to only produce this many items

Returns:

Generator over the matching task activities

Return type:

Generator[models.DatumTaskActivity, None, None]

chariot.datasets.tasks.get_task_datum_count(task_id: str) int[source]

Get the number of datums in the provided task.

Parameters:

task_id (str) – The id of the task

Returns:

The datum count

Return type:

int

chariot.datasets.tasks.get_task_statistics(id: str, *, task_type_label_filters: list[TaskTypeLabelFilter] | None = None, gps_coordinates_circle: Circle | None = None, gps_coordinates_rectangle: Rectangle | None = None, gps_coordinates_polygon: list[GeoPoint] | None = None, capture_timestamp_range: TimestampRange | None = None, metadata: dict[str, str] | None = None, asof_timestamp: datetime | None = None, unannotated: bool | None = None, datum_ids: list[str] | None = None, approval_status: list[str] | None = None, annotation_metadata: dict[str, str] | None = None) DatumTaskStatistics[source]

Get datum task statistics with various criteria

Parameters:
  • id (str) – Id of datum task to get statistics for

  • task_type_label_filters (Optional[List[models.TaskTypeLabelFilter]]) – Filter by task types and associated labels

  • gps_coordinates_circle (Optional[models.Circle]) – Filter datums within the given circle

  • gps_coordinates_rectangle (Optional[models.Rectangle]) – Filter datums within the given rectangle

  • gps_coordinates_polygon (Optional[List[models.GeoPoint]]) – Filter datums within the given polygon

  • capture_timestamp_range (Optional[models.TimestampRange]) – Filter by datum capture timestamp

  • metadata (Optional[Dict[str, str]]) – Filter by datum metadata values

  • asof_timestamp (Optional[datetime]) – Compute statistics as of the given timestamp

  • unannotated (Optional[bool]) – If true, only consider unannotated datums

  • datum_ids (Optional[List[str]]) – Filter datums with a list of datum ids

  • approval_status (Optional[List[str]]) – Filter by annotation approval status

  • annotation_metadata (Optional[Dict[str, str]]) – Filter by annotation metadata values

Returns:

Datum task statistics

Return type:

models.DatumTaskStatistics

chariot.datasets.tasks.get_tasks(*, search: str | None = None, exact_name_match: bool | None = None, include_archived: bool | None = None, project_ids: list[str] | None = None, task_ids: list[str] | None = None, sort: TaskSortColumn | None = None, direction: SortDirection | None = None, max_items: int | None = None) Generator[DatumTask, None, None][source]

Get datum annotation tasks that match various criteria. Returns a generator over all matching tasks.

Parameters:
  • search (Optional[str]) – Search string (full text search against name and description fields)

  • exact_name_match (Optional[bool]) – Require search to exactly match the task name (defaults to false)

  • include_archived (Optional[bool]) – If true, archived tasks will be included in the results (defaults to false)

  • project_ids (Optional[List[str]]) – Filter by project ids

  • task_ids (Optional[List[str]]) – Filter by task ids

  • sort (Optional[models.TaskSortColumn]) – What column to sort the tasks by (defaults to name)

  • direction (Optional[models.SortDirection]) – Whether to sort in ascending or descending order

  • max_items (Optional[int]) – Limit the returned generator to only produce this many items

Returns:

Task definitions for tasks matching the criteria

Return type:

Generator[models.DatumTask, None, None]

chariot.datasets.tasks.get_tasks_activity(exact_name_match: bool | None = None, search: str | None = None, project_ids: list[str] | None = None, task_ids: list[str] | None = None, activities: list[DatumTaskActivityCode] | None = None, dataset_ids: list[str] | None = None, user_ids: list[str] | None = None, direction: SortDirection | None = None, sort: TaskActivitySortColumn | None = None, max_items: int | None = None) Generator[DatumTaskActivity, None, None][source]

Get the matching activities.

Parameters:
  • exact_name_match (Optional[bool]) – Require search filter to match exactly (defaults to false)

  • search (Optional[str]) – Search string (full text search against task name and description fields)

  • project_ids (Optional[List[str]]) – List of project ids to filter by

  • task_ids (Optional[List[str]]) – List of task ids to filter by

  • activities (Optional[List[models.DatumTaskActivityCode]]) – List of activity types to filter by

  • dataset_ids (Optional[List[str]]) – List of dataset ids to filter by

  • user_ids (Optional[List[str]]) – List of user ids to filter by

  • direction (Optional[models.SortDirection]) – Sort direction

  • sort (Optional[models.TaskActivitySortColumn]) – Sort column

  • max_items (Optional[int]) – Limit the returned generator to only produce this many items

Returns:

Generator over the matching task activities

Return type:

Generator[models.DatumTaskActivity, None, None]

chariot.datasets.uploads module

chariot.datasets.uploads.delete_upload(id: str) Upload[source]

Delete an upload by id. This can only be done if the upload’s status is not COMPLETE or CLEANUP.

Parameters:

id (str) – Id of the upload to delete

Returns:

The upload details

Return type:

models.Upload

chariot.datasets.uploads.delete_upload_and_wait(id: str, *, timeout: float = 5, wait_interval: float = 0.5) None[source]

Delete an upload by id. This can only be done if the upload’s status is not COMPLETE or CLEANUP. Polls for the upload, blocking until the upload has been deleted or the timeout has been reached.

Parameters:
  • id (str) – Id of the upload to delete

  • timeout (float) – Number of seconds to wait for upload deletion (default 5)

  • wait_interval (float) – Number of seconds between successive calls to check the upload for deletion (default 0.5)

Raises:

TimeoutError – If the timeout has been reached
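The *_and_wait helpers in this module all follow the same poll-until-done-or-timeout shape. The generic sketch below shows that loop under the same timeout/wait_interval contract; the helper name and the simulated check are invented for illustration.

```python
import time

def wait_until(check, *, timeout=5.0, wait_interval=0.5):
    """Call `check` every `wait_interval` seconds until it returns a
    non-None result; raise TimeoutError once `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        result = check()
        if result is not None:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met before timeout")
        time.sleep(wait_interval)

# Simulated status check that succeeds on the third poll.
attempts = {"n": 0}
def check_deleted():
    attempts["n"] += 1
    return "deleted" if attempts["n"] >= 3 else None

status = wait_until(check_deleted, timeout=2.0, wait_interval=0.01)
```

Using time.monotonic() rather than time.time() keeps the deadline immune to wall-clock adjustments, which matters for long default timeouts like the 3600-second upload waits.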

chariot.datasets.uploads.get_upload(id: str) Upload[source]

Get an upload by id.

Parameters:

id (str) – Id of the upload

Returns:

The upload details

Return type:

models.Upload
chariot.datasets.uploads.get_upload_statistics(*, dataset_id: str, type: list[UploadType] | None = None, status: list[UploadStatus] | None = None) UploadStatistics[source]

Get upload statistics with various criteria.

Parameters:
  • dataset_id (str) – Id of the dataset to get uploads for

  • type (Optional[List[models.UploadType]]) – Filter uploads by upload type

  • status (Optional[List[models.UploadStatus]]) – Filter uploads by upload status

Returns:

Statistics of uploads matching the criteria

Return type:

models.UploadStatistics

chariot.datasets.uploads.get_uploads(dataset_id: str, *, type: list[UploadType] | None = None, status: list[UploadStatus] | None = None, sort: UploadSortColumn | None = None, direction: SortDirection | None = None, max_items: int | None = None) Generator[Upload, None, None][source]

Get uploads for a dataset

Parameters:
  • dataset_id (str) – Id of the dataset to get uploads for

  • type (Optional[List[models.UploadType]]) – Filter uploads by upload type

  • status (Optional[List[models.UploadStatus]]) – Filter uploads by upload status

  • sort (Optional[models.UploadSortColumn]) – How to sort the uploads

  • direction (Optional[models.SortDirection]) – Whether to sort in ascending or descending order

  • max_items (Optional[int]) – The maximum number of uploads to return

Returns:

Upload details for uploads matching the criteria

Return type:

Generator[models.Upload, None, None]

chariot.datasets.uploads.retry_upload(id: str) Upload[source]

Retry processing of an upload that previously did not succeed.

Parameters:

id (str) – Id of the upload to retry

Returns:

The upload details

Return type:

models.Upload

chariot.datasets.uploads.retry_upload_and_wait(id: str, *, timeout: float = 5, wait_interval: float = 0.5) Upload[source]

Retry processing of an upload that previously did not succeed. Polls for the upload, blocking until the upload has finished processing or the timeout has been reached.

Parameters:
  • id (str) – Id of the upload to retry

  • timeout (float) – Number of seconds to wait for the upload to finish processing (default 5)

  • wait_interval (float) – Number of seconds between successive calls to check the upload for completion (default 0.5)

Returns:

The upload details

Return type:

models.Upload

Raises:

TimeoutError – If the timeout has been reached

chariot.datasets.uploads.upload_bytes(dataset_id: str, *, type: UploadType, data: bytes, max_validation_errors: int | None = None, image_validation: bool | None = None, split: SplitName | None = None, datum_metadata: dict[str, Any] | None = None, video_sampling_type: VideoSamplingType | None = None, video_sampling_value: float | None = None, video_deinterlace: bool | None = None) Upload[source]

Uploads a set of bytes as a single file. Does not wait for the upload to complete processing.

Parameters:
  • dataset_id (str) – Id of the dataset to upload to

  • type (models.UploadType) – The type of file being uploaded.

  • data (bytes) – Bytes to upload

  • max_validation_errors (Optional[int]) – Maximum number of validation errors to tolerate before failing the upload

  • image_validation (Optional[bool]) – Whether or not to perform extra validations on image datums

  • split (Optional[models.SplitName]) – Name of split to upload datums to.

  • datum_metadata (Optional[Dict[str, Any]]) – When uploading a single datum (type=models.UploadType.DATUM), include custom metadata on this datum

  • video_sampling_type (Optional[models.VideoSamplingType]) – When uploading a video, optionally control how frames are sampled (at a constant rate, by a ratio of the video’s frame rate, or none [all frames are extracted])

  • video_sampling_value (Optional[float]) – When uploading a video with a video_sampling_type of VideoSamplingType.RATE or VideoSamplingType.RATIO, this value controls the rate or ratio of sampling (either an FPS value or a multiplier for the video’s FPS, respectively)

  • video_deinterlace (Optional[bool]) – When uploading a video, optionally have a deinterlacing filter applied prior to extracting frames

Returns:

The upload details

Return type:

models.Upload
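The interaction between video_sampling_type and video_sampling_value can be made concrete with a small calculation. The enum below is a local stand-in mirroring the three documented modes, not the actual models.VideoSamplingType class, and effective_fps is a hypothetical helper, not part of the chariot API.

```python
from enum import Enum

class VideoSamplingType(Enum):
    NONE = "none"    # all frames are extracted
    RATE = "rate"    # value is an absolute FPS
    RATIO = "ratio"  # value is a multiplier of the source FPS

def effective_fps(source_fps, sampling_type, value=None):
    """Frames per second actually extracted, per the documented modes."""
    if sampling_type is VideoSamplingType.NONE:
        return source_fps          # every frame
    if sampling_type is VideoSamplingType.RATE:
        return value               # constant sampling rate
    return source_fps * value      # RATIO: scale the source frame rate
```

So for a 30 FPS source, RATE with value 5 extracts 5 frames per second, while RATIO with value 0.5 extracts 15.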

chariot.datasets.uploads.upload_bytes_and_wait(dataset_id: str, *, type: UploadType, data: bytes, max_validation_errors: int | None = None, image_validation: bool | None = None, split: SplitName | None = None, datum_metadata: dict[str, Any] | None = None, video_sampling_type: VideoSamplingType | None = None, video_sampling_value: float | None = None, video_deinterlace: bool | None = None, timeout: float = 3600, wait_interval: float = 0.5) Upload[source]

Uploads a set of bytes as a single file, and waits for the upload to complete processing.

Parameters:
  • dataset_id (str) – Id of the dataset to upload to

  • type (models.UploadType) – The type of file being uploaded.

  • data (bytes) – Bytes to upload

  • max_validation_errors (Optional[int]) – Maximum number of validation errors to tolerate before failing the upload

  • image_validation (Optional[bool]) – Whether or not to perform extra validations on image datums

  • split (Optional[models.SplitName]) – Name of split to upload datums to.

  • datum_metadata (Optional[Dict[str, Any]]) – When uploading a single datum (type=models.UploadType.DATUM), include custom metadata on this datum

  • video_sampling_type (Optional[models.VideoSamplingType]) – When uploading a video, optionally control how frames are sampled (at a constant rate, by a ratio of the video’s frame rate, or none [all frames are extracted])

  • video_sampling_value (Optional[float]) – When uploading a video with a video_sampling_type of VideoSamplingType.RATE or VideoSamplingType.RATIO, this value controls the rate or ratio of sampling (either an FPS value or a multiplier for the video’s FPS, respectively)

  • video_deinterlace (Optional[bool]) – When uploading a video, optionally have a deinterlacing filter applied prior to extracting frames

  • timeout (float) – Number of seconds to wait for upload to complete (default 3600)

  • wait_interval (float) – Number of seconds between successive calls to check the upload for completion (default 0.5)

Returns:

The upload details

Return type:

models.Upload

Raises:

TimeoutError – If the timeout has been reached

chariot.datasets.uploads.upload_file(dataset_id: str, *, type: UploadType, path: str, max_validation_errors: int | None = None, image_validation: bool | None = None, split: SplitName | None = None, datum_metadata: dict[str, Any] | None = None, video_sampling_type: VideoSamplingType | None = None, video_sampling_value: float | None = None, video_deinterlace: bool | None = None) Upload[source]

Uploads a single file. Does not wait for the upload to complete processing.

Parameters:
  • dataset_id (str) – Id of the dataset to upload to

  • type (models.UploadType) – The type of file being uploaded.

  • path (str) – Path of file to upload

  • max_validation_errors (Optional[int]) – Maximum number of validation errors to tolerate before failing the upload

  • image_validation (Optional[bool]) – Whether or not to perform extra validations on image datums

  • split (Optional[models.SplitName]) – Name of split to upload datums to.

  • datum_metadata (Optional[Dict[str, Any]]) – When uploading a single datum (type=models.UploadType.DATUM), include custom metadata on this datum

  • video_sampling_type (Optional[models.VideoSamplingType]) – When uploading a video, optionally control how frames are sampled (at a constant rate, by a ratio of the video’s frame rate, or none [all frames are extracted])

  • video_sampling_value (Optional[float]) – When uploading a video with a video_sampling_type of VideoSamplingType.RATE or VideoSamplingType.RATIO, this value controls the rate or ratio of sampling (either an FPS value or a multiplier for the video’s FPS, respectively)

  • video_deinterlace (Optional[bool]) – When uploading a video, optionally have a deinterlacing filter applied prior to extracting frames

Returns:

The upload details

Return type:

models.Upload

chariot.datasets.uploads.upload_file_and_wait(dataset_id: str, *, type: UploadType, path: str, max_validation_errors: int | None = None, image_validation: bool | None = None, split: SplitName | None = None, datum_metadata: dict[str, Any] | None = None, video_sampling_type: VideoSamplingType | None = None, video_sampling_value: float | None = None, video_deinterlace: bool | None = None, timeout: float = 3600, wait_interval: float = 0.5) Upload[source]

Uploads a single file, and waits for the upload to complete processing.

Parameters:
  • dataset_id (str) – Id of the dataset to upload to

  • type (models.UploadType) – The type of file being uploaded.

  • path (str) – Path of file to upload

  • max_validation_errors (Optional[int]) – Maximum number of validation errors to tolerate before failing the upload

  • image_validation (Optional[bool]) – Whether or not to perform extra validations on image datums

  • split (Optional[models.SplitName]) – Name of split to upload datums to.

  • datum_metadata (Optional[Dict[str, Any]]) – When uploading a single datum (type=models.UploadType.DATUM), include custom metadata on this datum

  • video_sampling_type (Optional[models.VideoSamplingType]) – When uploading a video, optionally control how frames are sampled (at a constant rate, by a ratio of the video’s frame rate, or none [all frames are extracted])

  • video_sampling_value (Optional[float]) – When uploading a video with a video_sampling_type of VideoSamplingType.RATE or VideoSamplingType.RATIO, this value controls the rate or ratio of sampling (either an FPS value or a multiplier for the video’s FPS, respectively)

  • video_deinterlace (Optional[bool]) – When uploading a video, optionally have a deinterlacing filter applied prior to extracting frames

  • timeout (float) – Number of seconds to wait for upload to complete (default 3600)

  • wait_interval (float) – Number of seconds between successive calls to check the upload for completion (default 0.5)

Returns:

The upload details

Return type:

models.Upload

Raises:

TimeoutError – If the timeout has been reached

chariot.datasets.uploads.upload_files_from_urls(dataset_id: str, *, type: UploadType, source_urls: list[str], source_urls_datum_metadata: list[dict[str, Any]] | None = None, annotations_url: str | None = None, max_validation_errors: int | None = None, image_validation: bool | None = None, split: SplitName | None = None) Upload[source]

Uploads a list of urls to a dataset as individual datums. Does not wait for the upload to complete processing.

Parameters:
  • dataset_id (str) – Id of the dataset to upload to

  • type (models.UploadType) – The type of file being uploaded. Must be one of models.UploadType.{ARCHIVE|DATUM}

  • source_urls (List[str]) – List of URLs from which the datums are read. len() must be equal to 1 for ARCHIVE upload type.

  • source_urls_datum_metadata (Optional[List[Dict[str, Any]]]) – When uploading individual datums (type=models.UploadType.DATUM), include custom metadata for the datums created from each URL. Each list index should match the corresponding source_urls index; use an empty dict for URLs without metadata.

  • annotations_url (Optional[str]) – URL from which a gzipped annotations file in jsonl format will be downloaded and processed along with the datums from source_urls. The path attribute in the annotations file should be the datum’s index in source_urls.

  • max_validation_errors (Optional[int]) – Maximum number of validation errors to tolerate before failing the upload

  • image_validation (Optional[bool]) – Whether or not to perform extra validations on image datums

  • split (Optional[models.SplitName]) – Name of split to upload datums to.

Returns:

The upload details

Return type:

models.Upload
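The source_urls_datum_metadata parameter must line up index-for-index with source_urls. A minimal sketch of building that aligned list, using placeholder URLs and a hypothetical metadata key:

```python
# Placeholder URLs; in practice these point at the datums to ingest.
source_urls = [
    "https://example.com/images/cat.jpg",
    "https://example.com/images/dog.jpg",
    "https://example.com/images/bird.jpg",
]

# Custom metadata for only some URLs ("camera" is an invented example key).
per_url_metadata = {
    "https://example.com/images/dog.jpg": {"camera": "north-gate"},
}

# Align metadata with source_urls; URLs without metadata get an empty dict,
# as the parameter contract requires.
source_urls_datum_metadata = [per_url_metadata.get(url, {}) for url in source_urls]
```

The aligned list can then be passed alongside source_urls with type=models.UploadType.DATUM.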

chariot.datasets.uploads.upload_files_from_urls_and_wait(dataset_id: str, *, type: UploadType, source_urls: list[str], source_urls_datum_metadata: list[dict[str, Any]] | None = None, annotations_url: str | None = None, max_validation_errors: int | None = None, image_validation: bool | None = None, split: SplitName | None = None, timeout: float = 3600, wait_interval: float = 0.5) Upload[source]

Uploads a list of urls to a dataset as individual datums, and waits for the upload to complete processing.

Parameters:
  • dataset_id (str) – Id of the dataset to upload to

  • type (models.UploadType) – The type of file being uploaded. Must be one of models.UploadType.{ARCHIVE|DATUM}

  • source_urls (List[str]) – List of URLs from which the datums are read. len() must be equal to 1 for ARCHIVE upload type.

  • source_urls_datum_metadata (Optional[List[Dict[str, Any]]]) – When uploading individual datums (type=models.UploadType.DATUM), include custom metadata for the datums created from each URL. Each list index should match the corresponding source_urls index; use an empty dict for URLs without metadata.

  • annotations_url (Optional[str]) – URL from which a gzipped annotations file in jsonl format will be downloaded and processed along with the datums from source_urls. The path attribute in the annotations file should be the datum’s index in source_urls.

  • max_validation_errors (Optional[int]) – Maximum number of validation errors to tolerate before failing the upload

  • image_validation (Optional[bool]) – Whether or not to perform extra validations on image datums

  • split (Optional[models.SplitName]) – Name of split to upload datums to.

  • timeout (float) – Number of seconds to wait for upload to complete (default 3600)

  • wait_interval (float) – Number of seconds between successive calls to check the upload for completion (default 0.5)

Returns:

The upload details

Return type:

models.Upload

Raises:

TimeoutError – If the timeout has been reached

chariot.datasets.uploads.upload_folder(dataset_id: str, *, path: str, max_validation_errors: int | None = None, image_validation: bool | None = None, split: SplitName | None = None) Upload[source]

Uploads the contents of a folder. Equivalent to creating an archive from that folder and then uploading that archive with type=UploadType.ARCHIVE. Does not wait for the upload to complete processing.

Parameters:
  • dataset_id (str) – Id of the dataset to upload to

  • path (str) – Path of folder to upload

  • max_validation_errors (Optional[int]) – Maximum number of validation errors to tolerate before failing the upload

  • image_validation (Optional[bool]) – Whether or not to perform extra validations on image datums

  • split (Optional[models.SplitName]) – Name of split to upload datums to.

Returns:

The upload details

Return type:

models.Upload
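upload_folder is described above as equivalent to archiving the folder and uploading the archive with type=UploadType.ARCHIVE. The sketch below performs just the local archiving half of that equivalence with the standard library; the folder contents and names are placeholders.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    # Build a throwaway folder standing in for a folder of datums.
    folder = os.path.join(workdir, "datums")
    os.makedirs(folder)
    with open(os.path.join(folder, "sample.txt"), "w") as f:
        f.write("datum placeholder")

    # Pack the folder into a gzipped tarball next to it; make_archive
    # returns the full path of the archive it created.
    archive_path = shutil.make_archive(
        os.path.join(workdir, "datums_archive"), "gztar", root_dir=folder
    )
    archive_name = os.path.basename(archive_path)
```

The resulting archive is what would then be handed to upload_file with type=UploadType.ARCHIVE.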

chariot.datasets.uploads.upload_folder_and_wait(dataset_id: str, *, path: str, max_validation_errors: int | None = None, image_validation: bool | None = None, split: SplitName | None = None, timeout: float = 3600, wait_interval: float = 0.5) Upload[source]

Uploads the contents of a folder. Equivalent to creating an archive from that folder and then uploading that archive with type=UploadType.ARCHIVE. Waits for the upload to complete processing.

Parameters:
  • dataset_id (str) – Id of the dataset to upload to

  • path (str) – Path of folder to upload

  • max_validation_errors (Optional[int]) – Maximum number of validation errors to tolerate before failing the upload

  • image_validation (Optional[bool]) – Whether or not to perform extra validations on image datums

  • split (Optional[models.SplitName]) – Name of split to upload datums to.

  • timeout (float) – Number of seconds to wait for upload to complete (default 3600)

  • wait_interval (float) – Number of seconds between successive calls to check the upload for completion (default 0.5)

Returns:

The upload details

Return type:

models.Upload

Raises:

TimeoutError – If the timeout has been reached

chariot.datasets.uploads.wait_for_upload(id: str, *, timeout: float = 3600, wait_interval: float = 0.5) Upload[source]

Polls the given upload until it has finished processing.

Parameters:
  • id (str) – Id of the upload to wait for

  • timeout (float) – Number of seconds to wait for upload to complete (default 3600)

  • wait_interval (float) – Number of seconds between successive calls to check the upload for completion (default 0.5)

Returns:

The upload details

Return type:

models.Upload

Raises:

TimeoutError – If the timeout has been reached

chariot.datasets.views module

chariot.datasets.views.create_view(*, dataset_id: str, name: str, split_algorithm: SplitAlgorithm | None = None, apply_default_split: bool | None = None, splits: dict[SplitName, float] | None = None, metadata: dict[str, str] | None = None, capture_timestamp_range: TimestampRange | None = None, gps_coordinates_circle: Circle | None = None, gps_coordinates_polygon: list[GeoPoint] | None = None, gps_coordinates_rectangle: Rectangle | None = None, task_type_label_filters: list[TaskTypeLabelFilter] | None = None, approval_status: list[str] | None = None, annotation_metadata: dict[str, str] | None = None, sample_count: int | None = None) View[source]

Create a new view in the given dataset.

Parameters:
  • dataset_id (str) – Id of dataset to create new view in

  • name (str) – View name

  • split_algorithm (Optional[models.SplitAlgorithm]) – Splitting algorithm for the view (defaults to Random)

  • apply_default_split (Optional[bool]) – Whether default splits are used when splitting (defaults to true)

  • splits (Optional[Dict[models.SplitName, float]]) – Split weights for splitting datums in the view

  • metadata (Optional[Dict[str, str]]) – Add metadata filter to view

  • capture_timestamp_range (Optional[models.TimestampRange]) – Add capture timestamp range filter to view

  • gps_coordinates_circle (Optional[models.Circle]) – Add circle filter to view

  • gps_coordinates_polygon (Optional[List[models.GeoPoint]]) – Add polygon filter to view

  • gps_coordinates_rectangle (Optional[models.Rectangle]) – Add rectangle filter to view

  • task_type_label_filters (Optional[List[models.TaskTypeLabelFilter]]) – Add filter for task types and associated labels to view

  • approval_status (Optional[List[str]]) – Filter by annotation approval status

  • annotation_metadata (Optional[Dict[str, str]]) – Filter by annotation metadata values

  • sample_count (Optional[int]) – Sample count for the view

Returns:

View details for the newly created view

Return type:

models.View
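The splits parameter of create_view maps split names to weights. Treating the weights as relative and normalizing them makes the expected datum distribution explicit; the train/val/test names and 8/1/1 weighting below are assumptions for illustration, not values mandated by the API.

```python
# Relative split weights, keyed by split name (stand-ins for models.SplitName).
raw_splits = {"train": 8.0, "val": 1.0, "test": 1.0}

# Normalize to fractions so the intended distribution is explicit:
# an 8/1/1 weighting yields an 80/10/10 percent split of the datums.
total = sum(raw_splits.values())
fractions = {name: weight / total for name, weight in raw_splits.items()}
```

A dict like raw_splits is what would be passed as the splits argument, with apply_default_split left unset or False when custom weights should take effect.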

chariot.datasets.views.delete_view(id: str) View[source]

Delete a view by id. The artifacts for the view will be deleted as well.

Parameters:

id (str) – Id of view to delete

Returns:

View that was deleted

Return type:

models.View

chariot.datasets.views.get_all_view_count(*, name: str | None = None, exact_name_match: bool | None = None, view_ids: list[str] | None = None) int[source]

Get the number of views across all datasets. Only admin users can access this function.

Parameters:
  • name (Optional[str]) – Filter views counted by name

  • exact_name_match (Optional[bool]) – Require name filter to match exactly (defaults to false)

  • view_ids (Optional[List[str]]) – Filter by view ids

Returns:

Number of views in all datasets

Return type:

int

chariot.datasets.views.get_all_views(*, name: str | None = None, exact_name_match: bool | None = None, view_ids: list[str] | None = None, sort: ViewSortColumn | None = None, direction: SortDirection | None = None, max_items: int | None = None) Generator[View, None, None][source]

Get views across all datasets matching various criteria. Returns a generator over all matching views. Only admin users can access this function.

Parameters:
  • name (Optional[str]) – Filter by view name

  • exact_name_match (Optional[bool]) – Require name filter to match exactly (defaults to false)

  • view_ids (Optional[List[str]]) – Filter by view ids

  • sort (Optional[models.ViewSortColumn]) – What column to sort the views by (defaults to name)

  • direction (Optional[models.SortDirection]) – Whether to sort in ascending or descending order

  • max_items (Optional[int]) – Limit the returned generator to only produce this many items

Returns:

View details for views matching the criteria

Return type:

Generator[models.View, None, None]
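`get_all_views` returns a lazy generator, so results can be consumed incrementally: `max_items` bounds how many items the generator produces, while `itertools.islice` bounds consumption on the client side. The sketch below uses a stand-in generator of strings in place of `models.View` objects, since real calls require admin access and a live service.

```python
from itertools import islice

def take(view_gen, n):
    """Pull at most n items from a view generator without exhausting it."""
    return list(islice(view_gen, n))

# Stand-in for the generator returned by get_all_views(...); strings replace
# models.View objects purely for illustration.
all_views = (f"view-{i:03d}" for i in range(1000))
first_three = take(all_views, 3)  # the remaining items are never produced
```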

chariot.datasets.views.get_dataset_view_count(dataset_id: str, *, name: str | None = None, exact_name_match: bool | None = None, view_ids: list[str] | None = None) int[source]

Get the number of views in the given dataset.

Parameters:
  • dataset_id (str) – Id of dataset to get number of views in

  • name (Optional[str]) – Filter views counted by name

  • exact_name_match (Optional[bool]) – Require name filter to match exactly (defaults to false)

  • view_ids (Optional[List[str]]) – Filter by view ids

Returns:

Number of views in the given dataset

Return type:

int

chariot.datasets.views.get_dataset_views(dataset_id: str, *, name: str | None = None, exact_name_match: bool | None = None, view_ids: list[str] | None = None, sort: ViewSortColumn | None = None, direction: SortDirection | None = None, max_items: int | None = None) Generator[View, None, None][source]

Get views in the given dataset matching various criteria. Returns a generator over all matching views.

Parameters:
  • dataset_id (str) – Id of dataset to search for views in

  • name (Optional[str]) – Filter by view name

  • exact_name_match (Optional[bool]) – Require name filter to match exactly (defaults to false)

  • view_ids (Optional[List[str]]) – Filter by view ids

  • sort (Optional[models.ViewSortColumn]) – What column to sort the views by (defaults to name)

  • direction (Optional[models.SortDirection]) – Whether to sort in ascending or descending order

  • max_items (Optional[int]) – Limit the returned generator to only produce this many items

Returns:

View details for views matching the criteria

Return type:

Generator[models.View, None, None]
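A common pattern is to materialize the returned generator into a name-keyed lookup. In the sketch below, plain dicts stand in for `models.View` instances (an illustration-only substitution; that each view record carries a name is inferred from the name filter and `update_view` documented on this page).

```python
def index_views_by_name(view_gen):
    """Build a name -> view lookup from a generator of view records."""
    return {view["name"]: view for view in view_gen}

# Dicts stand in for the models.View instances that
# get_dataset_views(dataset_id, ...) would yield.
records = iter([{"name": "train-v1", "id": "a"}, {"name": "holdout", "id": "b"}])
by_name = index_views_by_name(records)
```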

chariot.datasets.views.get_view(id: str) View[source]

Get a view by id.

Parameters:

id (str) – Id of view to get

Returns:

View details

Return type:

models.View

chariot.datasets.views.get_view_timeline(id: str, *, max_items: int | None = None, direction: SortDirection | None = None, since_last_snapshot: bool | None = None, min_groups: int | None = None, max_ungrouped_events: int | None = None) Iterator[DatasetTimelineEvent][source]

Get a series of dataset change events affecting the given view ordered by time and grouped by event type.

Parameters:
  • id (str) – Id of view to get events for

  • max_items (Optional[int]) – Limit the returned generator to only produce this many items

  • direction (Optional[models.SortDirection]) – Whether to sort in ascending or descending order

  • since_last_snapshot (Optional[bool]) – Whether to return only events since the last snapshot for this view (defaults to false)

  • min_groups (Optional[int]) – How many groups are required before grouping behavior is turned on

  • max_ungrouped_events (Optional[int]) – The maximum number of events allowed before grouping behavior is turned on

Returns:

Events for the view

Return type:

Iterator[models.DatasetTimelineEvent]
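Since timeline events arrive ordered by time and grouped by event type, a quick client-side summary is a per-type tally. In this sketch the event type names are placeholders and dicts stand in for `models.DatasetTimelineEvent` instances, whose actual fields are not documented here.

```python
from collections import Counter

def count_event_types(events):
    """Tally timeline events by their type field."""
    return Counter(event["type"] for event in events)

# Placeholder events; real ones would come from get_view_timeline(view_id, ...).
events = [
    {"type": "datum_added"},
    {"type": "datum_added"},
    {"type": "annotation_created"},
]
summary = count_event_types(events)  # summary["datum_added"] == 2
```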

chariot.datasets.views.update_view(*, id: str, name: str) View[source]

Update the name of a view by id.

Parameters:
  • id (str) – Id of view to update the name of

  • name (str) – New name for the view

Returns:

View with same id but new name

Return type:

models.View

Module contents