Video object segmentation is a technique for detecting and segmenting, from a video, the region of an object belonging to a given semantic category. It is a fundamental technique in the field of computer vision and multimedia analysis, and plays an important role in applications such as object retrieval, video editing and video-based three-dimensional modeling. A weakly-labeled method for segmenting video objects refers to a method in which a user labels only the semantic category of the video object, and an algorithm detects and segments the objects belonging to the category specified by the user. Because most Internet videos carry user-annotated semantic tags relevant to their content, the weakly-labeled method for segmenting video objects is of great importance for analyzing and processing the ever-increasing volume of Internet video data.
A weakly-labeled video object is characterized in that the input video is known to contain an object belonging to a specified semantic category, while the specific location of that object remains unknown. Currently, the most widely adopted solution is based on weakly supervised learning, and proceeds as follows: firstly, positive videos and negative videos are collected, wherein the positive videos contain objects from a pre-specified semantic category while the negative videos do not contain any objects from this category; secondly, each video is segmented into spatio-temporal segments, and a semantic category of each spatio-temporal segment is determined according to the association between the positive videos and the negative videos; finally, the positive videos and the negative videos are all co-segmented with a multi-graph optimization model to obtain a segmentation result, that is, the objects belonging to the specified semantic category in the input video.
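The second step of the solution described above, in which a semantic category is assigned to each spatio-temporal segment according to its association with the positive and negative videos, may be illustrated by the following minimal sketch. The sketch is an assumption made purely for illustration and is not the method of any particular prior-art system: each segment is assumed to be represented by a feature vector, the association is assumed to be a nearest-neighbor comparison, and the names `nearest_distance` and `label_segments` are hypothetical. The multi-graph co-segmentation of the final step is not shown.

```python
import math


def nearest_distance(feature, pool):
    """Smallest Euclidean distance from `feature` to any vector in `pool`."""
    return min(math.dist(feature, other) for other in pool)


def label_segments(positive_videos, negative_videos):
    """Label every segment of every positive video as object or background.

    Each video is a list of segment feature vectors (a hypothetical
    representation). A segment is labeled True (object) when it lies closer
    to a segment of another positive video than to any segment of a
    negative video, and False (background) otherwise.
    """
    # Pool all segments of the negative videos together.
    negative_pool = [seg for video in negative_videos for seg in video]
    labels = []
    for i, video in enumerate(positive_videos):
        # Segments of all positive videos other than the current one.
        other_positive = [
            seg
            for j, other in enumerate(positive_videos)
            if j != i
            for seg in other
        ]
        video_labels = []
        for seg in video:
            d_pos = nearest_distance(seg, other_positive)
            d_neg = nearest_distance(seg, negative_pool)
            video_labels.append(d_pos < d_neg)
        labels.append(video_labels)
    return labels
```

Under these assumptions, the sketch requires at least two positive videos and at least one negative video as input; with a single input video there is nothing to compare against, and the nearest-neighbor search fails on an empty pool.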
The above method for segmenting video objects based on weakly-supervised learning can, in some cases, effectively solve the problem of segmenting a video object under a weakly-labeled condition. However, because a weakly-labeled video lacks location information for the object of the specified semantic category, the classification of the positive sample videos and the negative sample videos is inaccurate, and thus an incorrect segmentation result is likely to occur. In addition, multiple videos are needed simultaneously as the input for segmentation, which renders these methods inapplicable to segmenting an object of a semantic category from a single input video.