1. Field of the Invention
The present invention generally relates to an image tracking system, and more specifically to a system and method for determining whether image regions correspond to objects to be tracked in a scene, such as persons.
2. Description of the Related Art
Basic video tracking systems are well known in the art. However, the video tracking systems heretofore known lack certain functional capabilities required for generating accurate and comprehensive tracking information.
Celenk et al., in a 1988 IEEE article entitled "Moving Object Tracking Using Local Windows," disclose a simple tracking mechanism that employs frame differencing and centroid generation to track objects in a non-cluttered scene. This method is unlikely to be successful because it cannot process information from complex scenes and cannot handle the movement of objects that split and merge.
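For illustration only, frame differencing followed by centroid generation can be sketched as below. This is a minimal sketch and not the implementation disclosed by Celenk et al.; the function names, the list-of-rows grayscale frame representation, and the threshold value of 30 are all assumptions introduced here.

```python
def difference_mask(prev_frame, curr_frame, threshold=30):
    """Mark pixels whose intensity changed by more than `threshold`.

    Frames are lists of rows of grayscale values (an assumed representation).
    """
    return [
        [1 if abs(c - p) > threshold else 0 for p, c in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev_frame, curr_frame)
    ]


def centroid(mask):
    """Return the (row, col) centroid of the marked pixels, or None if none changed."""
    count = row_sum = col_sum = 0
    for r, row in enumerate(mask):
        for c, value in enumerate(row):
            if value:
                count += 1
                row_sum += r
                col_sum += c
    if count == 0:
        return None
    return (row_sum / count, col_sum / count)


# Hypothetical 4x4 frames: an empty background, then a bright 2x2 object.
previous = [[0] * 4 for _ in range(4)]
current = [[0, 0, 0, 0], [0, 200, 200, 0], [0, 200, 200, 0], [0, 0, 0, 0]]
object_centroid = centroid(difference_mask(previous, current))
```

Because every changed pixel contributes to a single centroid, two objects moving at once collapse into one averaged location, which is consistent with the limitation noted above for objects that split and merge.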
Tsai et al., in IEEE articles published in 1981 entitled "Estimating Three-Dimensional Motion Parameters Of A Rigid Planar Patch, and Uniqueness" and "Estimation Of Three-Dimensional Motion Parameters Of Rigid Objects With Curved Surfaces," disclose that only seven points on a rigid object are needed to uniquely find the motion parameters of the object from two images. The articles provide the constraints that the seven points must satisfy in order to yield a unique solution. While this method provides localized motion information, it is not a robust tracking solution.
Liao in a 1994 article entitled "Tracking Human Movements Using Finite Element Methods" discloses the use of a class of contours called Snakes with Finite Element Methods to extract and model the contour of a person as they walk through an environment. The method, though accurate, is not very efficient, and techniques for automatically initializing the algorithm must still be determined. Although this method might be used as part of a tracking system, it is not sufficiently robust to form a complete solution by itself.
Montera et al., in a 1993 SPIE article entitled "Object Tracking Through Adaptive Correlation," disclose the use of correlation templates to identify the location of objects in a scene and to track each object from frame to frame. The correlation template can adapt to changing image conditions over time. However, the object must maintain a fairly fixed, rigid form in order for correlation techniques to work, and the technique is therefore limited in its application to the general tracking problem.
Burt et al., in a 1989 article entitled "Object Tracking With A Moving Camera," provide a detailed, informative review of the use of "optical flow" for detection and analysis of motion. This particular technique is slow and computationally expensive.
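By way of a hedged illustration, template tracking with an adapting template can be sketched as follows. This sketch assumes normalized cross-correlation as the matching score and an exponential blend as the update rule; both choices, along with the function names and the blend factor `alpha`, are assumptions made here and are not taken from the Montera et al. article.

```python
import math


def ncc(patch, template):
    """Normalized cross-correlation of two equally sized 2-D lists (range -1 to 1)."""
    p = [v for row in patch for v in row]
    t = [v for row in template for v in row]
    mean_p = sum(p) / len(p)
    mean_t = sum(t) / len(t)
    num = sum((a - mean_p) * (b - mean_t) for a, b in zip(p, t))
    den = math.sqrt(
        sum((a - mean_p) ** 2 for a in p) * sum((b - mean_t) ** 2 for b in t)
    )
    return num / den if den else 0.0  # flat patches carry no correlation signal


def best_match(frame, template):
    """Slide the template over the frame; return ((row, col), score) of the best fit."""
    th, tw = len(template), len(template[0])
    best_pos, best_score = None, -2.0
    for r in range(len(frame) - th + 1):
        for c in range(len(frame[0]) - tw + 1):
            patch = [row[c:c + tw] for row in frame[r:r + th]]
            score = ncc(patch, template)
            if score > best_score:
                best_pos, best_score = (r, c), score
    return best_pos, best_score


def adapt(template, patch, alpha=0.2):
    """Blend the matched patch into the template (assumed update rule)."""
    return [
        [(1 - alpha) * t + alpha * p for t, p in zip(t_row, p_row)]
        for t_row, p_row in zip(template, patch)
    ]


# Hypothetical data: a 2x2 object embedded in a 4x4 frame.
frame = [[0, 0, 0, 0], [0, 10, 50, 0], [0, 50, 200, 0], [0, 0, 0, 0]]
template = [[10, 50], [50, 200]]
position, score = best_match(frame, template)
```

The score depends on a pixel-by-pixel comparison against a fixed-shape template, so an object that deforms between frames, such as a walking person, correlates poorly even when the template is updated gradually.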
Sethi et al. in a 1987 article entitled "Finding Trajectories Of Feature Points In A Monocular Image Sequence" describe the use of path coherence and smoothness of motion as a cost measure for corresponding feature points on an object across image sequences. The cost measure is optimized on a sequence of frames using a technique called the Greedy Algorithm, which exchanges possible correspondences in order to optimize the cost measure. It is likely to work well in scenes in which feature points are easily extracted and maintained. However, it will not work well in cases of complex objects, object occlusion, object split and merge, and poor segmentation.
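As a rough sketch of the idea, a path coherence cost can penalize changes in direction and speed across three consecutive positions of a feature point, and an exchange step can swap two candidate correspondences when the swap lowers the total cost. The particular formula, the weights 0.1 and 0.9, the handling of zero-length displacements, and all function names below are assumptions made for illustration and are not necessarily those of the cited article.

```python
import math


def path_coherence(p0, p1, p2, w_dir=0.1, w_speed=0.9):
    """Cost of continuing the path p0 -> p1 with p2; 0 means perfectly smooth motion."""
    ax, ay = p1[0] - p0[0], p1[1] - p0[1]
    bx, by = p2[0] - p1[0], p2[1] - p1[1]
    len_a, len_b = math.hypot(ax, ay), math.hypot(bx, by)
    # Direction term: 1 - cos(angle between displacements).
    # Assumption: treated as 0 when either displacement has zero length.
    direction = 1 - (ax * bx + ay * by) / (len_a * len_b) if len_a and len_b else 0.0
    # Speed term: 1 - (geometric mean / arithmetic mean) of the two step lengths.
    total = len_a + len_b
    speed = 1 - 2 * math.sqrt(len_a * len_b) / total if total else 0.0
    return w_dir * direction + w_speed * speed


def assign_pair(track_a, track_b, cand_1, cand_2):
    """Exchange step for two tracks, each given as (previous point, current point).

    Returns (next point for track_a, next point for track_b), swapping the
    candidates when that gives a lower summed cost.
    """
    keep = path_coherence(*track_a, cand_1) + path_coherence(*track_b, cand_2)
    swap = path_coherence(*track_a, cand_2) + path_coherence(*track_b, cand_1)
    return (cand_1, cand_2) if keep <= swap else (cand_2, cand_1)


# Hypothetical tracks moving right along y = 0 and y = 5.
track_a = ((0, 0), (1, 0))
track_b = ((0, 5), (1, 5))
next_a, next_b = assign_pair(track_a, track_b, (2, 5), (2, 0))
```

The cost is defined only on extracted point positions, so it presupposes that the same feature points can be found reliably in every frame.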
Salari et al., in a 1990 article entitled "Feature Point Correspondence In The Presence Of Occlusion," expand upon the previous work of Sethi and Jain by considering objects that are occluded. Specifically, the article discloses a set of phantom points which are constructed to represent the feature points missing due to occlusion. The Greedy Algorithm is updated to handle the phantom points. This method cannot handle complex objects.
There is a need for a sophisticated, yet cost effective, tracking system that can be used in many applications. For example, it has become desirable to acquire information concerning the activity of people within a scene, such as a retail establishment, a bank, an automatic teller machine, or a bank teller window, to name a few, using data gathered from analysis of video information acquired from the scene.
It is desirable to monitor the behavior of consumers in various locations of a retail establishment in order to provide information concerning the sequence of events and decisions that a consumer makes. This information is useful in many situations, such as to adjust the location and features of services provided in a bank, or to change merchandising strategies and display arrangements. Consequently, it is necessary for the system to differentiate between people in the scene, and between people and other stationary and moving objects in the scene.
A video tracking system is needed which can track the movement of complex objects, such as people, through a scene which may itself include complex objects. Moreover, a video tracking system which can function on an inexpensive computation platform offers significant advantages over the tracking systems heretofore known.