Annotation Best Practices
Accurate data annotations are critical to building reliable machine learning (ML) models with supervised training. Inaccurate annotations degrade model performance and lead to misleading evaluation metrics.
This guide provides best practices for creating accurate and consistent annotations as efficiently as possible.
Different types of ML models require different types of annotations. For best practices specific to sub-datum annotations, see the Extra Guidance for Sub-Datum Annotations section of this guide.
Create an Annotation Guide
Before you start annotating, create an annotation guide tailored to the scope and complexity of the annotation task. This will help ensure alignment and consistency across your team on the classes you’ll be labeling, how to treat partially obscured objects, and other important considerations. Spending time on this guide up front will set you and your fellow annotators up to make good decisions later.
Your annotation guide should include the sections outlined below.
Description of the Use Case
Include a detailed description of the use case for the ML model. This will help inform decisions about the target level of accuracy and how to handle edge cases and ambiguities.
Be sure to include details about:
- Whether the model’s predictions will be used by a human or an automated system
- How the ML model’s predictions will contribute value to the customer
- The relative costs of false positive and false negative predictions made by the model
Description of the Data
Include a description of the data to be annotated: the type of data, how it was collected, and relevant details like image resolution.
Defined Labels
List the labels that you will be using for your annotations, and include detailed definitions and examples for each label. This information should mostly come from the use case and customer requirements.
Label definitions should make it clear how to determine which label is correct. Label names should be helpful shorthand for the definitions.
Use examples to show the expected variation in the data for each label. Examples can be expanded upon as annotators find new variations in the data.
If the annotation tool doesn’t support a way to flag invalid or unusable datums, include a label for that scenario.
Ideally, labels should be mutually exclusive, but in some cases, hierarchical labels may be necessary. When hierarchical labels are used, be sure that the structure is clear in the annotation guide. An example of hierarchical labels for classifying images of ships would be a “Military” label with sub-labels “Aircraft Carrier,” “Submarine,” and “Surface Combatant.” The “Military” label would be used for any military ship that does not fit one of the sub-labels, like support ships and personnel carriers. The “Surface Combatant” label could also have sub-labels such as “Destroyer,” “Frigate,” and “Corvette.”
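If your workflow processes labels programmatically, a nested mapping can make the hierarchy explicit. A minimal Python sketch (the structure and helper below are illustrative, not part of any annotation tool):

```python
# Illustrative label hierarchy for classifying ships; a parent label is
# used whenever no child label applies (e.g., military support ships).
LABEL_HIERARCHY = {
    "Military": {
        "Aircraft Carrier": {},
        "Submarine": {},
        "Surface Combatant": {
            "Destroyer": {},
            "Frigate": {},
            "Corvette": {},
        },
    },
}

def ancestors(label, tree=LABEL_HIERARCHY, path=()):
    """Return the chain of parent labels above `label` (outermost first),
    or None if the label is not in the hierarchy."""
    for name, children in tree.items():
        if name == label:
            return path
        found = ancestors(label, children, path + (name,))
        if found is not None:
            return found
    return None

ancestors("Destroyer")  # ("Military", "Surface Combatant")
```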
In some cases, finer-grained labels than what the use case calls for can be useful. For example, if the task involves detecting cats in images, and specific labels like “house cat,” “lynx,” “tiger,” and “lion” are used during annotation, these labels can be automatically changed to “cat” for model training. Then, model performance metrics can be computed for the more specific categories to see which types of cats the model does not perform well on. Ultimately, consider what might be valuable for the use case beyond the immediate prediction problem.
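A relabeling step like this is straightforward to script. A minimal sketch, assuming annotations are available as Python dictionaries with a "label" field (the field names are assumptions, not a specific platform's format):

```python
# Map fine-grained annotation labels onto the coarser label the use case
# needs; keep the fine label so per-category metrics can still be computed.
FINE_TO_COARSE = {
    "house cat": "cat",
    "lynx": "cat",
    "tiger": "cat",
    "lion": "cat",
}

def coarsen(annotation: dict) -> dict:
    """Return a copy of the annotation relabeled for model training."""
    out = dict(annotation)
    out["fine_label"] = annotation["label"]  # preserved for later analysis
    out["label"] = FINE_TO_COARSE.get(annotation["label"], annotation["label"])
    return out

train_ready = [coarsen(a) for a in [{"label": "lynx"}, {"label": "tiger"}]]
# [{'label': 'cat', 'fine_label': 'lynx'}, {'label': 'cat', 'fine_label': 'tiger'}]
```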
Note that changing the label set during annotation can be costly, because existing annotations might need to be revisited. But sometimes it’s necessary, particularly if there are unanticipated cases in the data that the labels don’t cover.
Guidance on Edge Cases
Include guidance on how annotators should handle edge cases, ambiguities, and unusable datums, along with examples.
First, describe some expected cases in which the data might not be sufficient to be certain about the label. These can arise from natural characteristics of the data (e.g., occluded objects in object detection) or from low data quality (e.g., low-resolution or corrupted images, or misspellings in text).
Then describe what annotators should do in these situations, using the details of the use case to inform your decisions. Some options are:
- Add metadata to the datum to indicate that the correct annotation is not known.
- Use a special label, if the annotation platform does not support metadata.
- Choose a particular label from the label set, using your best judgment.
- Err on the side of a particular label, if the ambiguity is between two or more specific labels.
Decide how your team will discuss and track these cases. Some options are:
- Discuss them in the chat channel
- Track and record them in a shared document
Note that these datums may need to be excluded from training, validation, and testing splits.
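As an illustration of the metadata option and the split exclusion above, assuming annotations are exported as Python records (the "label_uncertain" key is hypothetical, not a fixed schema):

```python
# A datum whose correct label could not be determined gets a metadata flag.
datum = {
    "id": "img_0042",
    "label": "cat",  # best-judgment label
    "metadata": {"label_uncertain": True},  # hypothetical metadata key
}

def usable_for_splits(d: dict) -> bool:
    """Exclude flagged datums from training, validation, and test splits."""
    return not d.get("metadata", {}).get("label_uncertain", False)

clean = [d for d in [datum] if usable_for_splits(d)]  # [] -- datum excluded
```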
Accuracy Targets
Include the target level of accuracy for the annotations. This target should balance the costs of incorrect model predictions against the annotation resources available.
Perfect accuracy is often not necessary for creating ML models that deliver value. In fact, perfect accuracy in annotations can be extremely costly, and practices for achieving this require multiple experts independently annotating the entire dataset.[1] Instead, strive for consistency.
Quality Control Plan
Determine how accuracy will be measured when initial annotations are completed. Some options are:
- Review a random sample of the annotations and record any errors.
- Annotate a random sample of the data again from scratch, and compare your results to the existing annotations.
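For the second option, agreement between the original annotations and the independent re-annotation pass gives an accuracy estimate and shows which label pairs are being confused. A minimal sketch, assuming classification labels keyed by datum ID:

```python
from collections import Counter

def annotation_agreement(original: dict, reannotated: dict):
    """Compare two independent passes over the same sample, keyed by datum
    ID; return the agreement rate and a tally of disagreeing label pairs."""
    shared = original.keys() & reannotated.keys()
    disagreements = Counter(
        (original[i], reannotated[i]) for i in shared if original[i] != reannotated[i]
    )
    rate = 1 - sum(disagreements.values()) / len(shared)
    return rate, disagreements

rate, errors = annotation_agreement(
    {"d1": "cat", "d2": "dog", "d3": "cat"},
    {"d1": "cat", "d2": "cat", "d3": "cat"},
)
# rate == 2/3, errors == Counter({('dog', 'cat'): 1})
```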
If the measured accuracy doesn’t meet your targets, locate and correct the errors. Options for locating errors include:
- Using errors discovered while measuring accuracy
- Reviewing a random sample of datums not included in the accuracy measurement
- Training a model with the annotated data, running predictions with that model on the annotated data, and reviewing the mistakes made in the predictions, which are likely to be annotation errors
Specify whether the final decision on corrections will be made by a single expert or a committee of annotators.
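As a sketch of the last error-location option above, confident model predictions that disagree with the annotation are strong candidates for review (the record fields here are assumptions, not a specific platform's output format):

```python
def likely_label_errors(records: list, min_confidence: float = 0.9) -> list:
    """Surface datums where a confident model prediction contradicts the
    annotation; each record is assumed to carry 'label', 'predicted', and
    'confidence' fields. Review the most confident disagreements first."""
    suspects = [
        r for r in records
        if r["predicted"] != r["label"] and r["confidence"] >= min_confidence
    ]
    return sorted(suspects, key=lambda r: r["confidence"], reverse=True)
```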
Description of the Annotation Tool
Include a description of the annotation tool you will be using. Include links to the tool’s user guide and any other relevant resources.
If existing ML models will be used to provide annotation hints, specify which models.
Datum Metadata
Include keys and values to use for metadata associated with datums and annotations, if the annotation platform supports them. Metadata can provide valuable contextual information to help understand dataset characteristics and enable detailed performance analysis.
Examples of datum metadata for imagery are location information, image quality, and lighting conditions.
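For example, datum metadata for overhead imagery might be recorded as simple key-value pairs like these (the keys and values are illustrative, not a fixed schema):

```python
datum_metadata = {
    "location": "34.05N,118.25W",  # where the image was collected
    "image_quality": "good",       # e.g., good / blurry / corrupted
    "lighting": "overcast",        # e.g., daylight / overcast / night
}
```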
Supplemental Information
If necessary and available, provide links to supplemental sources of information. This can include:
- For overhead (satellite or aerial) imagery data, higher-resolution images of the area, ideally captured around the same time, that show more detail
- For imagery data, other images of the broader area that show context or images from different perspectives that provide more information
- Reference material about the items of interest, such as technical documents that describe terms in scientific text data or Wikipedia pages about military assets to be detected in imagery
Extra Guidance for Sub-Datum Annotations
Some annotation tasks involve assigning a label to an entire datum, like the type of animal an image contains or the topic of a document. Other annotation tasks involve identifying subsets of datums and assigning labels to those, such as drawing labeled regions around objects of interest in images (object detection and image segmentation) or identifying sentences in a document that express opinions. For these other types of tasks that involve sub-datum annotation, there can also be ambiguity about whether and where an item of interest is present in a datum. Give clear guidance about what to do in these situations, informed by the details of the use case.
For object detection and image segmentation, clearly describe when to annotate partially visible objects (whether they are occluded by another object or on the edge of the image). For example, your guidance might be to only annotate partial objects that are at least 50% visible, or it might be to annotate partial objects only when you are unambiguously certain of the object’s correct label.
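If you adopt a visibility threshold, defining it precisely helps annotators apply it consistently. As an illustration, the visible fraction of an edge-truncated object can be estimated by comparing the annotator's estimate of the object's full extent with the portion inside the image (a hypothetical helper, not part of any annotation tool):

```python
def fraction_inside_image(box, width, height):
    """Fraction of a bounding box (x1, y1, x2, y2) that lies within the
    image bounds; the box may extend past the edge to estimate the full
    extent of a truncated object."""
    x1, y1, x2, y2 = box
    inter_w = max(0.0, min(x2, width) - max(x1, 0.0))
    inter_h = max(0.0, min(y2, height) - max(y1, 0.0))
    area = (x2 - x1) * (y2 - y1)
    return (inter_w * inter_h) / area if area > 0 else 0.0

# Annotate an edge-truncated object only if at least 50% is in view.
annotate = fraction_inside_image((-20, 10, 80, 110), 640, 480) >= 0.5  # True (80%)
```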
If the annotation platform supports it, include a “None” label to indicate that there are no items of interest in the datum to be annotated. This allows these datums to be distinguished from datums that have not yet been annotated, which should be excluded from model training and evaluation. Some ML platforms assume that datums without annotations contain no items of interest. Chariot uses a negative sample label to indicate a datum has been reviewed and contains no items of interest.
Use annotation metadata, if your annotation tool supports them, to record when the annotated data contains characteristics of interest. For example, metadata can be used in image data annotations to indicate objects that are partially visible or affected by image corruption. These can provide valuable information about how well models perform under different conditions in the data.
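For instance, an object-detection annotation might carry metadata flags like these (the keys are illustrative):

```python
annotation = {
    "label": "Frigate",
    "bbox": [412, 238, 506, 301],  # x1, y1, x2, y2 in pixels
    "metadata": {
        "occluded": True,           # partially hidden by another object
        "truncated": False,         # not cut off by the image edge
        "image_corruption": False,
    },
}
```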
Best Practices Checklist
Before Annotating
- Read the annotation guide.
  - Ask for clarification if something is unclear.
- Get to know your annotation tool.
  - Read the user guide.
  - Understand the pitfalls and common mistakes.
  - Be aware of any limitations the tool has on your ability to go back and change annotations.
- Familiarize yourself with the data.
  - Look at several datums, including those that have already been annotated, if available.
  - Look at examples of edge cases, if available, and note any ambiguities in the data.
- Know where to go for help, such as an expert on the annotation task, the project lead, or other annotators.
- Set up an ergonomic workspace. Make sure that it’s comfortable and as distraction-free as possible.
While Annotating
- Efficiency is important, but don’t go too fast. Check that your annotations are correct before moving on to the next datum.
- Use the annotation tool’s shortcuts and hotkeys for efficiency.
- If you’re using an existing ML model to provide annotation hints, be sure that you are using the latest and best model. The annotation guide should indicate which model to use.
- Base your annotations only on the information in the datum; supplemental information referenced by the annotation guide should only be used to clarify what is present in each datum. For example, when working with images from full-motion video, don’t let your knowledge of previous or subsequent frames influence your annotations. Use only what is visible within the image to determine which label to use and, for object detection and image segmentation tasks, whether objects are present.
- Raise ambiguous or difficult examples that are not covered by the annotation guide.
- Record these cases and any decisions about how to handle them in the agreed-upon location (annotation guide, chat channel, other shared document, etc.).
- Be mindful of your energy level and ability to focus. Take short breaks when needed.
- For sub-datum annotations:
  - Include all visible parts of the object.
  - When annotating partially visible objects, include only the parts that are visible.
  - Be careful not to focus only on the center of the image; items of interest near the edges are easy to miss.
  - Use image zoom and annotation refinement features to align the bounding geometry closely with the edges of the object. Annotations do not have to be pixel-perfect, but loose bounding annotations will degrade model performance.[2]
  - Do not duplicate annotations.
  - Make sure all items of interest in each datum are annotated.