In modern society, video surveillance cameras can be found everywhere, and abnormal events in video are easy to miss when observation and detection rely on human eyes alone. With the rapid development of computer networks, communications and semiconductor technologies, computer vision is increasingly used in place of human observation to analyze video images obtained by sensors and to extract useful information from them. Video tracking, which follows objects of interest captured by image sensors, is one of the focuses of computer vision research. It underlies multiple video applications, such as traffic monitoring, smart robots and human-computer interaction; it plays an important role in smart city management and in combating illegal and criminal activities; and it remains a focus, and a difficulty, of current video processing research.
Studies on video tracking systems have long focused on single-target tracking, which follows a single object of interest in a surveillance scene. Single-target tracking is of great significance for handling abnormal events once they occur. A multi-target tracking method, however, can assist regulatory authorities in many more respects, such as early warning, surveillance and management before abnormal events occur.
At present, multi-target tracking methods fall mainly into three categories: prediction-based, matching-based and detection-based.
The prediction-based method regards the tracking problem as a state estimation problem: given previous data, it optimally estimates the state (e.g., position, color, shape) of the target in the next frame by signal processing. This category mainly includes filtering-based tracking algorithms and subspace-learning-based algorithms. Filtering-based algorithms such as Kalman filtering, mean-shift filtering and particle filtering learn a feature space of the target from previous data, and then locate the target according to the distribution of the current frame's image blocks in that space. The prediction method has the advantage of speed in multi-target tracking, but the state of the current frame depends entirely on the tracking result of the previous frame, so tracking cannot be initiated automatically and tracking errors are difficult to correct.
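As a minimal sketch of the prediction-based approach, the following constant-velocity Kalman filter estimates a target's 2-D position in the next frame from previous results. All matrix values (process and measurement noise, frame interval) are illustrative assumptions, not parameters from the source.

```python
import numpy as np

class KalmanTracker:
    """Toy constant-velocity Kalman filter; state is [x, y, vx, vy]."""

    def __init__(self, x, y):
        self.state = np.array([x, y, 0.0, 0.0])  # start with zero velocity
        self.P = np.eye(4)                       # state covariance
        dt = 1.0                                 # one frame per step (assumed)
        self.F = np.array([[1, 0, dt, 0],        # constant-velocity motion model
                           [0, 1, 0, dt],
                           [0, 0, 1,  0],
                           [0, 0, 0,  1]], dtype=float)
        self.H = np.array([[1, 0, 0, 0],         # only position is observed
                           [0, 1, 0, 0]], dtype=float)
        self.Q = np.eye(4) * 0.01                # process noise (assumed)
        self.R = np.eye(2)                       # measurement noise (assumed)

    def predict(self):
        # Estimate the state of the next frame from the previous result --
        # this is exactly the dependence on the previous frame noted above.
        self.state = self.F @ self.state
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.state[:2]

    def update(self, zx, zy):
        # Correct the prediction with a measured position.
        z = np.array([zx, zy])
        y = z - self.H @ self.state
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.state = self.state + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P
```

In practice one such filter would be maintained per target; if a frame's measurement is wrong, the corrected state carries that error forward, which is why tracking errors are hard to recover from.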
The matching-based algorithm regards the multi-target tracking problem as a template matching problem: a template represents the target to be tracked, and the best match for that template is sought in the next frame. The template may be one image block or a group of image blocks, or a global or local feature representation of the target image. Such methods improve tracking performance by learning while tracking, but they still struggle to achieve fully automatic multi-target tracking and to track accurately under occlusion and in complex environments.
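A toy sketch of the matching idea, assuming the simplest case where the template is a single grayscale image block: slide the template over the next frame and score each position by normalized cross-correlation, taking the highest-scoring position as the match. The brute-force loop and the correlation score are illustrative choices.

```python
import numpy as np

def match_template(frame, template):
    """Return ((row, col), score) of the best template match in frame,
    scored by normalized cross-correlation (NCC, in [-1, 1])."""
    fh, fw = frame.shape
    th, tw = template.shape
    t = template - template.mean()            # zero-mean template
    t_norm = np.sqrt((t ** 2).sum())
    best_score, best_pos = -np.inf, (0, 0)
    for r in range(fh - th + 1):
        for c in range(fw - tw + 1):
            patch = frame[r:r + th, c:c + tw]
            p = patch - patch.mean()
            denom = np.sqrt((p ** 2).sum()) * t_norm
            if denom == 0:                    # flat patch: no correlation
                continue
            score = (p * t).sum() / denom
            if score > best_score:
                best_score, best_pos = score, (r, c)
    return best_pos, best_score
```

The weaknesses noted above are visible here: when the target is occluded, no patch resembles the template, yet the search still returns some best-scoring position, and a fixed template cannot follow appearance changes without a learning-while-tracking update.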
The detection-based algorithm regards the tracking problem as a target detection problem: it separates the target from the background, trains a classifier on the obtained data, and automatically performs target detection on the current frame, where the image block with the highest score is taken as the target position. Detection-based algorithms include offline and online methods. The former learns the classifier from pre-training data or from the initial frame(s), while the latter retrains the classifier with data sampled from the current frame. The offline learning method tracks dynamically changing targets poorly, while the online learning method tends to accumulate errors, since each update may introduce new ones, eventually causing drift or even target loss. How to track multiple targets automatically, quickly and accurately, that is, considering the results of the current frame while drawing on different features of the target, still requires further study.
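The detection-based scheme can be sketched as follows, assuming a plain linear classifier and a toy pixel feature (both illustrative, not the source's method): every candidate block is scored by the classifier, the highest-scoring block is taken as the target position, and an online step nudges the classifier toward the newly detected sample.

```python
import numpy as np

def features(block):
    # Toy feature: flattened, mean-normalized pixel values (assumed).
    f = block.astype(float).ravel()
    return f - f.mean()

def detect(frame, w, size):
    """Score every size x size block with linear weights w; the block with
    the highest score is considered the target position."""
    h, fw = frame.shape
    best, best_pos = -np.inf, (0, 0)
    for r in range(h - size + 1):
        for c in range(fw - size + 1):
            s = w @ features(frame[r:r + size, c:c + size])
            if s > best:
                best, best_pos = s, (r, c)
    return best_pos

def online_update(w, positive_block, lr=0.1):
    # Online learning step: move w toward the detected target's features.
    # If the detection was wrong, this same step pulls the error into the
    # classifier -- the error-accumulation / drift problem noted above.
    return w + lr * features(positive_block)
```

An offline method would fix `w` after initial training (robust but blind to appearance change); the online variant calls `online_update` every frame, trading adaptivity for the risk of drift.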