Technical Field
The present invention relates to video processing and, more particularly, to efficient video annotation with optical-flow-based estimation and suggestion.
Description of the Related Art
Recent approaches to object annotation are primarily implemented as web-based systems in order to crowd-source the annotation jobs. For instance, the popular ImageNet dataset is annotated using a web-based annotation system deployed on Amazon Mechanical Turk®, where many workers around the world can contribute to the annotation process. The MIT Labelme dataset is also annotated using a similar web-based annotation system. However, both systems focus on single-image annotation, where no temporal consistency or motion information is available. On the other hand, the Video Annotation Tool from Irvine, Calif. (VATIC) focuses on the problem of bounding box annotation in videos. The system is implemented as a GUI-based annotation tool in which workers can specify the target object type, draw boxes to annotate an object, and provide properties of the object if requested. In order to expedite the annotation process, the tool provides automatic box annotations between manually annotated frames by using a simple linear interpolation. For example, if an object 1 has a manual box annotation at time frame 1 and at time frame 10, the boxes in frames 2 through 9 are automatically generated by a linear interpolation process. However, VATIC at least suffers from being overly simplistic in its use of linear interpolation, which does not take the actual motion of the object between the manually annotated frames into account.
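The linear interpolation described above can be illustrated with a minimal sketch. It is not taken from VATIC itself; the function name `interpolate_boxes`, the `(x_min, y_min, x_max, y_max)` box layout, and the sample coordinates are assumptions made only for illustration.

```python
from typing import Dict, Tuple

# Assumed box layout for this sketch: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def interpolate_boxes(
    start_frame: int, start_box: Box, end_frame: int, end_box: Box
) -> Dict[int, Box]:
    """Linearly interpolate boxes for frames strictly between two keyframes."""
    if end_frame <= start_frame:
        raise ValueError("end_frame must be greater than start_frame")
    span = end_frame - start_frame
    boxes: Dict[int, Box] = {}
    for frame in range(start_frame + 1, end_frame):
        # Fraction of the way from the first keyframe to the second.
        t = (frame - start_frame) / span
        # Each coordinate moves along a straight line between the keyframes.
        boxes[frame] = tuple(
            a + t * (b - a) for a, b in zip(start_box, end_box)
        )
    return boxes


# Manual annotations at frame 1 and frame 10 (hypothetical coordinates).
boxes = interpolate_boxes(
    1, (10.0, 20.0, 50.0, 80.0), 10, (100.0, 20.0, 140.0, 80.0)
)
for frame in sorted(boxes):
    print(frame, boxes[frame])
```

Because every coordinate is assumed to change at a constant rate, an object that accelerates, stops, or changes direction between the two manually annotated frames will be boxed inaccurately in the intermediate frames.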
Thus, there is a need for an efficient video annotation system.