With the arrival of the 4G era, the information delivered to mobile terminals is no longer limited to text and images; increasingly, it comes in the form of video. Many internet companies have launched related application interfaces, and techniques for acquiring video information have become a recent research hotspot.
Existing multi-target tracking techniques are mostly applied in the radar and aviation fields, and include space tracking methods and time-space tracking methods. A space tracking method processes each frame of an image signal individually and tracks a moving target by using features of the target signal in two-dimensional space. A time-space tracking method uses both the features of the target in the space domain and its motion features in the time domain; such methods are divided into contrast tracking and image-related tracking. Other techniques include methods based on a particle filter, methods based on mean shift, and the like.
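As an illustration of the mean-shift approach mentioned above, the following is a minimal sketch in Python, not the implementation of any particular prior-art system. It assumes that a per-pixel weight map (for example, a back-projection of the target's color histogram) has already been computed for the current frame. The function name `mean_shift`, its parameters, and the `(x, y, width, height)` window layout are hypothetical choices made for this sketch.

```python
import math


def mean_shift(weights, window, max_iter=20):
    """Shift a search window toward the weighted centroid of the pixels it covers.

    weights: 2D list of non-negative floats, indexed as weights[row][col]
             (assumed to be a precomputed target-likelihood map).
    window:  (x, y, width, height) of the initial search window.
    Returns the window after convergence or after max_iter iterations.
    """
    rows, cols = len(weights), len(weights[0])
    x, y, w, h = window
    for _ in range(max_iter):
        total = sum_x = sum_y = 0.0
        # Accumulate the weighted centroid over the part of the window
        # that lies inside the image.
        for r in range(max(0, y), min(rows, y + h)):
            for c in range(max(0, x), min(cols, x + w)):
                wt = weights[r][c]
                total += wt
                sum_x += wt * c
                sum_y += wt * r
        if total == 0:
            break  # no target evidence under the window; stop shifting
        cx, cy = sum_x / total, sum_y / total
        # Re-center the window on the centroid (round half up).
        new_x = math.floor(cx - (w - 1) / 2 + 0.5)
        new_y = math.floor(cy - (h - 1) / 2 + 0.5)
        if (new_x, new_y) == (x, y):
            break  # converged
        x, y = new_x, new_y
    return (x, y, w, h)


# Synthetic 10x10 weight map with a 3x3 "target" centered at (6, 6).
demo_weights = [[0.0] * 10 for _ in range(10)]
for r in range(5, 8):
    for c in range(5, 8):
        demo_weights[r][c] = 1.0

demo_result = mean_shift(demo_weights, (2, 2, 5, 5))
print(demo_result)  # (4, 4, 5, 5): the 5x5 window is now centered on (6, 6)
```

In a tracker of this kind, the window returned for one frame would typically serve as the initial window for the next frame, which is why the method handles only the target whose model produced the weight map.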
Existing multi-target tracking techniques are usually limited to a single application scenario and a single type of tracking target. On the one hand, existing multi-class classifiers have low classification precision, and complex classification algorithms such as deep neural networks (DNN) cannot be used because of their impact on operating efficiency. On the other hand, multi-target tracking must distinguish not only between the target and the background, but also between the targets themselves.
As for target tracking algorithms, for a simple single target the existing OpenTLD achieves stable results and is open source; however, it handles only a single target. An existing multi-target solution establishes a universal multi-class target model through a DNN, performs multi-target detection on the first frame of the video to obtain the position of each target, and then applies a conventional target tracking method to implement the tracking. This solution requires a large amount of computation during multi-target detection and a very large model that must be trained offline, which leads to heavy consumption of computation and storage and makes it difficult to meet the real-time requirements of video applications.
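The control flow of the detect-then-track solution described above can be sketched as follows. This is only a structural illustration under stated assumptions: `detect_then_track`, `stub_detect`, and `stub_update` are hypothetical names, and the stubs stand in for the DNN detector and the conventional per-target tracker, neither of which is specified here.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height)


def detect_then_track(
    frames: Iterable[object],
    detect: Callable[[object], List[Box]],
    update: Callable[[object, Box], Box],
) -> List[Dict[int, Box]]:
    """Detect all targets once on the first frame, then track each one.

    detect: placeholder for the (expensive) multi-class detector.
    update: placeholder for a conventional single-target tracking step.
    Returns, for every frame, a mapping from target id to its box.
    """
    history: List[Dict[int, Box]] = []
    boxes: Dict[int, Box] = {}
    for index, frame in enumerate(frames):
        if index == 0:
            # Detection runs only here; each detection gets a target id.
            boxes = dict(enumerate(detect(frame)))
        else:
            # Every target is tracked independently from its previous box.
            boxes = {tid: update(frame, box) for tid, box in boxes.items()}
        history.append(dict(boxes))
    return history


def stub_detect(frame: object) -> List[Box]:
    """Hypothetical detector: always reports two fixed targets."""
    return [(0, 0, 4, 4), (10, 10, 4, 4)]


def stub_update(frame: object, box: Box) -> Box:
    """Hypothetical tracker step: moves the box one pixel to the right."""
    x, y, w, h = box
    return (x + 1, y, w, h)


demo_history = detect_then_track([0, 1, 2], stub_detect, stub_update)
print(demo_history[2])  # {0: (2, 0, 4, 4), 1: (12, 10, 4, 4)}
```

The sketch makes the cost structure visible: the detector is invoked once but must cover every target class, while the tracking step runs once per target per frame.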