Combining deep learning with computer vision has become a major development trend in artificial intelligence (AI). However, a deep learning network needs a large number of annotated training images to improve its accuracy.
At present, most image annotation is done manually. An operator must select the objects one by one in each image frame of the video data and key in the associated annotation. When the video data contains a large number of target objects, this manual annotation method is time-consuming and labor-intensive.