Object identification systems have become popular in today's commercial and entertainment businesses. Object identification in video is a computer vision problem that aims to locate and identify objects (i.e., give their exact identities) in a video sequence using a given set of images that contain objects with known identities. Video object identification has been driven by its large potential for applications in many domains, including video surveillance security, augmented reality, automatic video tagging, medical analysis, quality control, and video-lecture assessment. Even though object identification is a relatively easy task for the human brain, it is challenging for machines because of large variations in the appearance of the objects to be identified in terms of orientation, illumination, expression, and occlusion.
Object identification typically involves at least object detection and object recognition. For either detection or recognition, existing methods in this domain generally consist of two stages: a learning stage and a recognition stage. In the learning stage, a database of static images containing different objects is typically collected as training data. Depending on the specific category of objects, features with high discriminative power are extracted. These features are then combined with a learning scheme to develop a model. In the recognition stage, newly given objects are detected and classified as particular objects by the learned model.
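The two stages described above can be illustrated with a minimal sketch. The feature extractor (mean intensity and intensity range of a flat pixel list) and the nearest-centroid learning scheme are assumptions chosen only to keep the example short; the text does not specify which features or learning scheme existing methods use, and the function names `extract_features`, `learn`, and `recognize` are hypothetical.

```python
from collections import defaultdict
from math import dist


def extract_features(image):
    """Hypothetical feature extractor: (mean intensity, intensity range)."""
    mean = sum(image) / len(image)
    return (mean, max(image) - min(image))


def learn(training_images):
    """Learning stage: build one feature centroid per label.

    training_images is a list of (image, label) pairs, where each image
    is a flat list of pixel intensities (an assumption for this sketch).
    """
    grouped = defaultdict(list)
    for image, label in training_images:
        grouped[label].append(extract_features(image))
    model = {}
    for label, feats in grouped.items():
        # Average each feature dimension over all samples of this label.
        model[label] = tuple(sum(col) / len(feats) for col in zip(*feats))
    return model


def recognize(model, image):
    """Recognition stage: return the label whose centroid is nearest."""
    feat = extract_features(image)
    return min(model, key=lambda label: dist(model[label], feat))


# Toy static database of labeled images.
training = [
    ([10, 12, 11, 9], "dark"),
    ([8, 10, 9, 11], "dark"),
    ([200, 210, 190, 205], "bright"),
    ([220, 215, 225, 210], "bright"),
]
model = learn(training)
label = recognize(model, [15, 14, 16, 13])
```

In this sketch the model is fixed once the learning stage finishes, so a new object is classified only by its similarity to what the static database contained.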
FIG. 1 shows a typical object identification system. As shown in FIG. 1, an object detection module is applied to the input video sequence. Then, an object tracking and recognition module is applied to the detected objects, using a database of labeled objects as training data. After the tracking and recognition process is performed, the final labeled objects are output.
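The data flow of such a system (detection, then tracking and recognition against a labeled database, then labeled output) can be sketched as follows. This is an illustrative toy and not the system of FIG. 1: the detector is a stub that assumes each frame already lists its candidate objects, tracking is a simple nearest-position association, and recognition is a nearest-neighbour majority vote. All names (`Detection`, `LabeledObject`, `detect`, `track_objects`, `recognize_track`, `identify`) and the `max_jump` parameter are hypothetical.

```python
from dataclasses import dataclass
from math import dist


@dataclass
class Detection:
    frame_index: int
    center: tuple   # (x, y) position of the detected object
    feature: tuple  # hypothetical appearance descriptor


@dataclass
class LabeledObject:
    track_id: int
    label: str
    detections: list


def detect(frames):
    """Detection module (stub): each toy frame is a list of (center, feature)."""
    return [Detection(i, c, f) for i, frame in enumerate(frames) for c, f in frame]


def track_objects(detections, max_jump=20.0):
    """Link each detection to the nearest track ending in an earlier frame."""
    tracks = []
    for det in sorted(detections, key=lambda d: d.frame_index):
        candidates = [
            t for t in tracks
            if t[-1].frame_index < det.frame_index
            and dist(t[-1].center, det.center) <= max_jump
        ]
        if candidates:
            nearest = min(candidates, key=lambda t: dist(t[-1].center, det.center))
            nearest.append(det)
        else:
            tracks.append([det])  # no plausible match: start a new track
    return tracks


def recognize_track(track_dets, database):
    """Majority vote of nearest-neighbour labels over a track's detections."""
    votes = {}
    for det in track_dets:
        label = min(database, key=lambda item: dist(item[0], det.feature))[1]
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)


def identify(frames, database):
    """Full pipeline: detect, track, recognize, and emit labeled objects."""
    tracks = track_objects(detect(frames))
    return [LabeledObject(i, recognize_track(t, database), t)
            for i, t in enumerate(tracks)]


# Toy labeled database: (feature, label) pairs.
database = [((1.0, 0.0), "cup"), ((0.0, 1.0), "book")]
frames = [
    [((10, 10), (0.9, 0.1)), ((100, 100), (0.1, 0.8))],
    [((12, 11), (0.8, 0.2)), ((103, 98), (0.2, 0.9))],
    [((15, 13), (0.95, 0.0))],
]
result = identify(frames, database)
```

Voting over a whole track rather than labeling each frame independently is a design choice of this sketch; it shows one simple way that tracking and recognition can be coupled in a single module.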
However, a common difficulty in object recognition is that the static database used for training usually contains objects that differ greatly from the objects in the testing images or video in terms of orientation, illumination, expression, and occlusion, which leads to low recognition accuracy. According to the disclosed embodiments, a video sequence contains a large number of frames that include intrinsic spatio-temporal information, which can be used to extract hint information to help object identification. Effectively extracting useful and compact information from video as a hint to help with object identification is a challenging problem that has not been deeply explored.
The disclosed methods and systems are directed to solve one or more problems set forth above and other problems.