Tracking people or objects over time can be achieved by first running detectors that compute probabilities of presence in individual images and then linking high probabilities of detections into complete trajectories. This can be done recursively, using dynamic programming, or using Linear Programming.
Most of these approaches focus on one kind of object, such as pedestrians or cars, and only model simple interactions, such as the fact that different instances may repel each other to avoid bumping into each other or synchronize their motions to move in groups.
Multiple target tracking has a long tradition, going back many years for applications such as radar tracking. These early approaches to data association usually relied on gating and Kalman filtering, which have later made their way into our community.
Because of their recursive nature, they are prone to errors that are difficult to recover from by using a post processing step. Particle-based approaches partially address this issue by simultaneously exploring multiple hypotheses. However, they can handle only relatively small batches of temporal frames without their state space becoming unmanageably large, and often require careful parameter setting to converge.
In recent years, techniques that optimize a global objective function over many frames have emerged as powerful alternatives. They rely on Conditional Random Fields, belief Propagation, Dynamic Programming, or Linear Programming Among the latter, some operate on graphs whose nodes can either be all the spatial locations of potential people presence, or only those where a detector has fired.
On average, these more global techniques are more robust than the earlier ones but, especially among those that focus on tracking people, do not handle complex interactions between people and other scene objects. In some techniques, the trajectories of people are assumed to be given. In others, group behavior is considered during the tracking process by including priors that account for the fact that people tend to avoid hitting each other and sometimes walk in groups.
In some techniques, there is also a mechanism for guessing where entrances and exits may be by recording where tracklets start and end. However, this is very different from having objects that may move, thereby allowing objects of a different nature to appear or disappear at varying locations. In some techniques, person-to-person and person-to-object interactions are exploited to more reliably track all of them. This approach relies on a Bayesian Network model to enforce frame-to-frame temporal coherence, and on training data to learn object types and appearances. Furthermore, this approach requires the objects to be at least occasionally visible during the interaction.