Tracking is a key component for many areas of real-time computer vision such as Human-Computer Interaction (“HCI”). One example of an HCI application is driver monitoring. In this area, work has been done to determine head pose and body lean using various techniques. This information can be used, for example, to assist an air-bag deployment system, or for attention and fatigue monitoring in a safety system. Tracking the spatial location of the driver's head or other body parts facilitates the operation of these and other HCI systems.
Conventional tracking systems are typically based on two-dimensional (2D) gray or color images. In some situations, methods using templates or probabilistic frameworks do not function robustly. In particular, conventional tracking algorithms often fail when the environment is cluttered, because the hypothesis being examined often cannot distinguish the real target. Distractions by edges caused by non-target environmental objects are major contributors to this problem, even though the target object may have a distinct depth difference compared to its surroundings (e.g., background). This effect is common in many important tracking applications, including head tracking, human tracking, and hand gesture recognition. Accordingly, using conventional systems, distinguishing a target object in the foreground from other objects in the background is not a trivial task.
The problem common to conventional tracking methods is that the target tends to get lost when the environment has a cluttered background. A similar problem arises when changing lighting conditions disturb the contours or patterns on which tracking algorithms are based. Many different approaches have been taken to solve these tracking problems. Some conventional systems use contour information, while others use depth from stereo imaging systems, intensity and color distribution, or a combination of these features.
Some systems have attempted to use depth characteristics of the target to aid in the tracking functions. For example, stereo (dual camera) systems are used to track heads with a model-fitting approach. Some of these systems use stereoscopic images but still rely on other intensity-based information; those that use stereoscopic images alone do so with a computationally intensive algorithm.
Thus, there is a need for tracking methods and systems that (1) are based on real-time image data, (2) use algorithms that are not computationally intensive, and (3) use simple single-camera systems.