Training machine learning models to recognize and distinguish particular objects from each other, for example, to recognize that an object is a car or a person, requires a large number of examples, each depicting a car or a person. To generate the large data sets required for training and building machine learning models, existing techniques have often required human operators to manually annotate objects in each frame of a video. While typically accurate, this process of manually annotating each object in a video on a frame-by-frame basis is laborious, time-consuming, and costly. With manual processes, the cost and time required to annotate individual frames of a video are prohibitive, making artificial intelligence (AI) applications that need to understand objects moving through time and space untenable.
Accordingly, it would be advantageous to provide a solution to the problem of annotating large volumes of images to generate training data for machine learning models for various applications, one that improves throughput and efficiency without sacrificing accuracy. In particular, a technique for annotating an object in a video and automatically tracking the annotated object through subsequent frames of the video would provide accurate annotations usable as training data for machine learning models, with increased throughput and reduced cost as compared with manual annotation, and would therefore provide advantages and benefits over existing techniques. Moreover, to ensure accurate annotation of video content, it is critical to incorporate human feedback into any technique that provides automated tracking of annotated objects.