Annotation Format

Annotations for a dataset can be specified by including an annotations.json or annotations.jsonl file in the root of your archive. An annotations.json file contains an array of annotation JSON objects, whereas an annotations.jsonl file contains one annotation JSON object per line. An annotation JSON object has the following fields:

note

In each annotation object, path must precede annotations; otherwise the path will be ignored and the system will report that some of your annotations are missing a valid path.

note

Annotations may also be added by uploading an individual annotations.jsonl file using the SDK; see the SDK documentation for more information. For annotation files uploaded outside of an archive, replace the path attribute described in this section with a datum_id attribute. The datum ID can be retrieved from the datum's URL in the UI or when interacting with datums via the SDK.

  • path: Absolute path to the image or text file within the compressed file structure; see the example below.

  • annotations: A list of objects; one for each annotated object in the image file or text file.

    • For images, there are four annotation entry options:
      • Image Classification: class_label
      • Image Segmentation: class_label and contour
      • Object Detection: class_label and bbox
      • Oriented Object Detection: class_label and oriented_bbox
    • For text, there are three annotation entry options:
      • Text Classification: context and label
      • Text Token Classification: start, end, and label
      • Text Generation: context and generated_text
note

All class_label values in your annotation files must be strings (e.g., "cat", "dog", "1", "2"). Do not use integers or other types for class labels. Using integers will cause your dataset upload to fail.
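
As an illustration, an archive matching the example paths used below might be laid out like this (hypothetical archive name and contents):

my_dataset.tar.gz
├── annotations.jsonl
└── a/
    └── b/
        └── c/
            ├── img1.png
            └── img2.png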

Annotation Examples

Task Type Field

All annotations specify a task_type field to categorize their purpose. This field is not required for uploads; the task_type is inferred from the fields included with each annotation. However, it will be present in the file when a dataset archive is downloaded from Chariot.

While not required, it is recommended that users include a task type so that uploaded files match what they will see in a downloaded archive of the dataset.

The following task types are valid:

  • Image Classification
  • Object Detection
  • Oriented Object Detection
  • Image Segmentation
  • Token Classification
  • Text Classification
  • Text Generation
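
For example, an image classification annotation that includes its task type explicitly might look like the following line (adapted from the examples below):

{"path": "a/b/c/img1.png", "annotations": [{"task_type": "Image Classification", "class_label": "dog"}]}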

Image Classification

A dataset supporting image classification may have any number of Image Classification annotations. For example, the following .jsonl file defines annotations for a dataset consisting of two images, the first of which is a dog and the second of which is a cat.

{"path": "a/b/c/img1.png", "annotations": [{"class_label": "dog"}]}
{"path": "a/b/c/img2.png", "annotations": [{"class_label": "cat"}]}

Object Detection

For a dataset to support object detection, each annotation should have a bbox field, with the keys xmin, ymin, xmax, and ymax specifying the bounding box. The example below defines three images: the first contains a dog and a person, the second contains a single object (a cat), and the third contains no objects of interest.

{"path": "a/b/d/img1.png", "annotations": [{"class_label": "dog", "bbox": {"xmin": 16, "ymin": 130, "xmax": 70, "ymax": 150}}, {"class_label": "person", "bbox": {"xmin": 89, "ymin": 10, "xmax": 97, "ymax": 110}}]}
{"path": "a/b/d/img2.png", "annotations": [{"class_label": "cat", "bbox": {"xmin": 500, "ymin": 220, "xmax": 530, "ymax": 260}}]}
{"path": "a/b/d/img3.png", "annotations": []}

Oriented Object Detection

For oriented object detection, each annotation should have an oriented_bbox field, with the keys cx, cy, w, h, and r specifying the oriented bounding box.

The keys are defined as:

  • cx - The center X coordinate of the bounding box, as a fraction of the image's width
  • cy - The center Y coordinate of the bounding box, as a fraction of the image's height
  • w - The width of the bounding box, as a fraction of the image's width
  • h - The height of the bounding box, as a fraction of the image's height
  • r - The rotation of the bounding box, in radians. Defined as the clockwise angle between the w edge of the bounding box and the image's width axis

The example below defines two images: the first contains a dog and a person, and the second contains a single object (a cat).

{"path": "a/b/d/img1.png", "annotations": [{"class_label": "dog", "oriented_bbox": {"cx": 0.52, "cy": 0.82, "w": 0.07, "h": 0.02, "r": 0.17}}, {"class_label": "person", "oriented_bbox": {"cx": 0.13, "cy": 0.43, "w": 0.18, "h": 0.08, "r": 0.06}}]}
{"path": "a/b/d/img2.png", "annotations": [{"class_label": "cat", "oriented_bbox": {"cx": 0.85, "cy": 0.15, "w": 0.38, "h": 0.28, "r": 0.97}}]}

Image Segmentation

For segmentation tasks, each annotation must include a polygon contour describing the object's boundary. A contour is a list of polygons, each given as a list of x/y points, so that a single object whose view is partially occluded can be described by multiple regions.

For example, in an image of a car parked behind a telephone pole, the annotator can specify two regions that together describe the car: the contour is an outer list containing two inner lists, each holding the points of one visible region.

{"path": "a/b/c/img1.png", "annotations": [{"class_label": "dog", "contour": [[{"x": 10.0, "y": 15.5}, {"x": 20.9, "y": 50.2}, {"x": 25.9, "y": 28.4}]]}]}
{"path": "a/b/c/img2.png", "annotations": [{"class_label": "cat", "contour": [[{"x": 97.2, "y": 40.2}, {"x": 33.33, "y": 44.3}, {"x": 10.9, "y": 18.7}]]}]}

Text Classification

Text classification tasks apply a global label to a selection of text. For example, text within a dataset might say that "the economy in the USA has grown at twice the rate of that of the UK." The classification task might have the context "Is the content of this text pro-America?" In this case, an annotator could label this text as positive.

{"path": "a/b/c/text1.text","annotations":[{"task_type":"Text Classification","text_classification":{"context":"Pro-America?","label":"positive"}}]}

Text Token Classification

Token classification is a more granular version of text classification that labels the characters, words, or phrases within the selection of text. Using the above example, an annotator could label the word "USA" as a "place." This is a common form of token classification called named-entity recognition, in which words are labeled.

{
  "path": "a/b/c/text1.text",
  "annotations": [
    {
      "token_classification": {
        "start_position": 19,
        "end_position": 22,
        "label": "place"
      }
    }
  ]
}
note

For effective classification, the start and end positions should be specified at the character level, not the word level. Characters in the text string begin at index 0, which means that the character in position 19 (the "U" in "USA") would be the 20th character in the string. Characters included in the string begin at the start_position (19 in the above example) and do not include the character at the end_position. In the above example, the characters in "USA" occupy positions 19, 20, and 21.
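
In Python terms, the positions behave like a half-open slice over the text, as this minimal sketch shows:

text = "the economy in the USA has grown at twice the rate of that of the UK."
# start_position is inclusive and end_position is exclusive, like Python slicing.
print(text[19:22])  # prints "USA"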

Text Generation

Text generation is an annotation type for text data that covers all forms of generated text, such as text summarization and text translation.

Following the example above, a text summarization annotation of the text file could produce the summary "USA economy is growing."

{
  "path": "a/b/c/text1.text",
  "annotations": [
    {
      "task_type": "Text Generation",
      "text_generation": {
        "context": "summary",
        "generated_text": "USA economy is growing."
      }
    }
  ]
}

Running a text translation annotation to translate the above example into French would produce the result "L'économie américaine en croissance."

{
  "path": "a/b/c/text1.text",
  "annotations": [
    {
      "task_type": "Text Generation",
      "text_generation": {
        "context": "translation-english-french",
        "generated_text": "L'économie américaine en croissance"
      }
    }
  ]
}

Annotation Approval Status and Metadata

All annotations may optionally specify an approval_status field with one of three values (needs_review, verified, or rejected) and a metadata field containing arbitrary data related to the annotation. Neither is required for uploads; both will appear in the downloaded archive of the dataset if provided.

{"path": "a/b/c/img1.png", "annotations": [{"class_label": "dog", "approval_status": "verified", "metadata": {"annotator": "alice", "confidence": 0.9}}]}

Example Image Annotation File

{
  "annotations": [
    {
      "path": "data_folder/a.jpg",
      "annotations": [
        {
          "contour": [
            [
              {"x": 10.0, "y": 15.5},
              {"x": 20.9, "y": 50.2},
              {"x": 25.9, "y": 28.4}
            ],
            [
              {"x": 60.0, "y": 15.5},
              {"x": 70.9, "y": 50.2},
              {"x": 75.9, "y": 28.4}
            ]
          ],
          "class_label": "standing",
          "approval_status": "needs_review",
          "metadata": {"annotator": "bryan", "confidence": 0.6}
        }
      ]
    },
    {
      "path": "data_folder/b.jpg",
      "annotations": [
        {
          "bbox": {"ymin": 174.02, "xmin": 25.89, "ymax": 448.72, "xmax": 289.08},
          "class_label": "other",
          "approval_status": "verified"
        }
      ]
    },
    {
      "path": "data_folder/c.jpg",
      "annotations": [
        {
          "oriented_bbox": {"cx": 0.52, "cy": 0.83, "w": 0.07, "h": 0.02, "r": 0.17},
          "class_label": "lying",
          "metadata": {"annotator": "felix", "note": "between lying and sitting"}
        }
      ]
    },
    {
      "path": "data_folder/d.jpg",
      "annotations": [
        {
          "class_label": "sitting",
          "approval_status": "rejected"
        }
      ]
    }
  ]
}

Example Text Annotation File

{
  "annotations": [
    {
      "path": "data_folder/a.txt",
      "annotations": [
        {
          "token_classification": {
            "start_position": 0,
            "end_position": 5,
            "label": "animal"
          },
          "approval_status": "rejected",
          "metadata": {"annotator": "bryan", "note": "invalid label value"}
        },
        {
          "token_classification": {
            "start_position": 16,
            "end_position": 22,
            "label": "noun"
          },
          "approval_status": "verified"
        }
      ]
    },
    {
      "path": "data_folder/b.txt",
      "annotations": [
        {
          "translation": {
            "origin_language": "english",
            "target_language": "spanish",
            "translation": "son las gatas malas"
          },
          "approval_status": "verified",
          "metadata": {"annotator": "bryan", "note": "straight"}
        }
      ]
    },
    {
      "path": "data_folder/c.txt",
      "annotations": [
        {
          "sentiment": {
            "context": "Can this sport be dangerous?",
            "label": "positive"
          },
          "approval_status": "rejected"
        }
      ]
    },
    {
      "path": "data_folder/d.txt",
      "annotations": [
        {
          "summarization": {
            "summary": "A notoriously bad football team lately."
          }
        }
      ]
    }
  ]
}

Example Conversion to Chariot Dataset

A Chariot dataset may be populated with datums and annotations using an archive as discussed above, or the SDK may be used to upload datums and/or annotations. A code example converting a COCO formatted archive (an example archive may be downloaded here) into a Chariot archive is provided below. Note that there are additional possible implementations, e.g., uploading one datum at a time or uploading a bulk set of annotations by themselves. Depending on the scenario, different implementations may be more appropriate; for example, a compressed archive will upload faster but will fail entirely if there are any issues such as formatting errors, whereas uploading individual datums and/or annotations is slower, but a single incorrectly formatted upload will not cause the remaining properly formatted ones to fail.

import json
import os
import shutil

import cv2
import numpy as np

COCO_IMG_ROOT_PATH, COCO_ANN_PATH = "./coco/val2017/val2017", "./coco/anns.json"
DST_PATH = "./coco/coco_converted_to_chariot_archive"
os.makedirs(DST_PATH, exist_ok=True)

with open(COCO_ANN_PATH, "r") as f:
    coco_annotations = json.load(f)

# Copy each image into the destination folder and create an empty annotation entry for it.
chariot_anns = {}
for i in coco_annotations["images"]:
    chariot_anns[i["id"]] = {"path": i["file_name"], "annotations": []}
    shutil.copy(os.path.join(COCO_IMG_ROOT_PATH, i["file_name"]), os.path.join(DST_PATH, i["file_name"]))

cat_id_to_label = {c["id"]: c["name"] for c in coco_annotations["categories"]}
for a in coco_annotations["annotations"]:
    # COCO boxes are (x, y, width, height); Chariot expects corner coordinates.
    chariot_anns[a["image_id"]]["annotations"].append(
        {
            "class_label": cat_id_to_label[a["category_id"]],
            "bbox": {
                "xmin": a["bbox"][0],
                "ymin": a["bbox"][1],
                "xmax": a["bbox"][0] + a["bbox"][2],
                "ymax": a["bbox"][1] + a["bbox"][3],
            },
        }
    )
    if not a["iscrowd"]:
        # COCO polygon segmentations are flat [x1, y1, x2, y2, ...] lists.
        chariot_anns[a["image_id"]]["annotations"].append(
            {
                "class_label": cat_id_to_label[a["category_id"]],
                "contour": [[{"x": x, "y": y} for x, y in zip(a["segmentation"][0][::2], a["segmentation"][0][1::2])]],
            }
        )
    else:  # In this case, COCO segmentation is a binary mask formatted as Run Length Encoding
        height, width = a["segmentation"]["size"]
        mask = np.zeros(height * width, dtype=np.uint8)

        idx = 0
        # Binary mask in RLE format is represented by a sequence of counts of 0's and counts
        # of 1's (this assumes 'counts' is an uncompressed list of integers).
        for neg_count, pos_count in zip(a["segmentation"]["counts"][::2], a["segmentation"]["counts"][1::2]):
            mask[idx + neg_count : idx + neg_count + pos_count] = 1
            idx += neg_count + pos_count
        mask = mask.reshape((height, width), order="F")

        # Pad the mask to avoid contour artifacts at the image border
        padded_mask = np.pad(mask, pad_width=1, mode="constant", constant_values=0)
        contours, _ = cv2.findContours(padded_mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

        for contour in contours:
            contour -= 1  # Remove the padding offset
            chariot_anns[a["image_id"]]["annotations"].append(
                {
                    "class_label": cat_id_to_label[a["category_id"]],
                    "contour": [[{"x": int(vertex[0][0]), "y": int(vertex[0][1])} for vertex in contour]],
                }
            )

with open(os.path.join(DST_PATH, "annotations.jsonl"), "w") as f:
    for v in chariot_anns.values():
        f.write(json.dumps(v) + "\n")
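
Once the destination folder is populated, it can be compressed for upload. A minimal sketch using the standard library (assuming a gzipped tarball is an acceptable archive format):

# Creates ./coco/coco_converted_to_chariot_archive.tar.gz with
# annotations.jsonl and the images at the archive root.
shutil.make_archive(DST_PATH, "gztar", root_dir=DST_PATH)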