Video tracking is the process of locating one or more moving objects over time using one or more cameras. An algorithm analyses the video frames and outputs the object locations, optionally in real time.
Visual tracking of multiple moving targets is a challenging problem. Tracking each body independently is a simple solution, but it fails in the presence of occlusions, where the disappearance of a target can be explained only in relation to the other targets. (Occlusion is the event in which the light emitted or reflected by an object is blocked by another object before it reaches the eye or camera where the image is taken.)
On the other hand, principled modeling of the occlusion process is possible when considering the joint configuration of all involved targets, and enables a single tracker in charge of estimating the joint dynamics of the different bodies to interpret images correctly during occlusion. This solution, however, requires a representation size that grows exponentially with the number of bodies, thus leading to an estimation algorithm whose computational complexity grows exponentially as well.
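The exponential growth mentioned above can be made concrete with a small counting sketch. It assumes, purely for illustration, that each target's position is discretized into a fixed number of cells; the function names and the cell count are hypothetical and are not taken from the cited literature.

```python
def joint_state_count(cells_per_target: int, num_targets: int) -> int:
    """Number of joint configurations when all targets are represented together.

    Every combination of per-target cells is a distinct joint state,
    so the count is cells_per_target raised to the number of targets.
    """
    return cells_per_target ** num_targets


def separate_state_count(cells_per_target: int, num_targets: int) -> int:
    """Number of states when each target keeps its own representation.

    One table of cells per target, so the count grows linearly.
    """
    return cells_per_target * num_targets


if __name__ == "__main__":
    cells = 100  # assumed discretization, chosen only for the example
    for n in range(1, 6):
        print(n, joint_state_count(cells, n), separate_state_count(cells, n))
```

With 100 cells per target, three targets already give 1,000,000 joint configurations against 300 separate ones, which is the gap a joint tracker has to pay for.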
However, the problem of tracking the position and velocity of a single target is clearly distinct from that of tracking the positions of two or more different targets. Although both tasks can be formalized as a joint estimation problem, in the first case physical constraints impose a strong correlation between position and velocity, while in the second case the components, namely the locations of the different objects, may depend only weakly on each other, if at all. Their measurements, however, may still be strongly correlated due to occlusions. This is the basic observation that has motivated the invention. In our method we handle the estimates separately but analyze the images jointly.
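The following toy sketch illustrates why measurements can be coupled by occlusion even when the target positions themselves are independent. It is not the method of the invention: it assumes a one-dimensional image in which each target is an interval and the nearer target hides whatever it overlaps; the function name and the interval representation are hypothetical.

```python
from typing import Tuple

Interval = Tuple[float, float]  # (start, end) along a 1-D image, start <= end


def visible_length(far: Interval, near: Interval) -> float:
    """Length of the far target that remains visible behind the near target.

    The far target's own position is fixed by `far`, yet what the camera
    measures of it also depends on `near`, the position of the other target.
    """
    far_start, far_end = far
    near_start, near_end = near
    # Overlap of the two intervals; zero when they do not intersect.
    overlap = max(0.0, min(far_end, near_end) - max(far_start, near_start))
    return (far_end - far_start) - overlap


if __name__ == "__main__":
    far_target = (0.0, 2.0)
    # Same far target, three different positions of the near target.
    for near_target in [(5.0, 7.0), (1.0, 3.0), (-1.0, 3.0)]:
        print(near_target, visible_length(far_target, near_target))
```

In this sketch the far target never moves, but its visible extent goes from fully visible to partly hidden to fully hidden as the near target moves, so an image measurement of one target cannot be interpreted without reference to the other.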
There are a number of acknowledged approaches described in the literature which address the multi-target/multi-part tracking problem.
In particular, the article by M. Isard and J. MacCormick, “BraMBLe: A Bayesian Multiple-Blob Tracker,” in Int. Conf. Computer Vision, 2003, appears to be a point of reference for the kind of probabilistic approach that this proposal addresses.
Other articles address similar problems, such as T. Zhao and R. Nevatia, “Tracking Multiple Humans in Crowded Environment,” IEEE Conf. on Computer Vision and Pattern Recognition, 2004; and K. Otsuka and N. Mukawa, “Multiview occlusion analysis for tracking densely populated objects based on 2-D visual angles,” in Int. Conf. Computer Vision and Pattern Recognition, 2004.
The above references disclose implementations of principled occlusion reasoning, but they suffer from the problem of dimensionality: the computational burden is heavy because complexity increases exponentially with the number of targets.