Detection and tracking of a person or other object of interest is an important aspect of video-camera-based systems such as video conferencing systems, video surveillance and monitoring systems, and human-machine interfaces. For example, in a video conferencing system, it is often desirable to frame the head and shoulders of a particular conference participant in the resultant output video signal, while in a video surveillance system, it may be desirable to frame the entire body of, e.g., a person entering or leaving a restricted area monitored by the system.
Conventional techniques for detecting persons in the above-noted applications include background subtraction, face detection and skin tone detection. A significant problem with these and other conventional detection techniques is that many of them use general models or scene assumptions that make the techniques difficult to adapt for use with typical home or office scenes. For example, since moving persons in such home or office scenes often interact with objects such as chairs, sofas, and items on tables, a background subtraction technique would need to either keep track of a large number of foreground objects or update the background model frequently. In addition, face detection and skin tone detection techniques usually handle only a limited number of head poses, e.g., faces close to the camera. Another problem with these techniques is that the models used may be unable to accurately account for varying lighting conditions. Moreover, given the degree of clutter in typical home or office scenes, these and other conventional techniques that make use of edge detection are generally not suitable for real-time implementation.
Other detection techniques make use of motion as a visual cue for scene analysis. Motion-based detection is particularly useful for cluttered scenes in which frequent movements of people and objects are common, such as the above-noted typical home or office environments. One known motion-based detection technique uses optical flow fields to group connected pixels undergoing similar motion into a single object. Image differencing techniques utilize the differences between consecutive frames to locate moving objects in a scene.
Such techniques are generally well suited for use in applications in which the number of moving objects in the scene is small and the interactions between them are limited. Although image differencing techniques can in some cases be adaptive to dynamic environments, such techniques nonetheless generally do a poor job of extracting all relevant feature pixels. Another significant problem with image differencing and other conventional motion-based detection techniques is that it can be hard to separate multiple moving objects occupying the same image region.
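The image differencing approach discussed above can be illustrated with a minimal sketch. It is not taken from any particular system: the function names `difference_mask` and `bounding_box`, the threshold value of 25, and the tiny synthetic frames are all assumptions made for illustration. Frames are modeled as plain lists of grayscale rows so that the example needs no external libraries.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]  # grayscale image: rows of 0-255 intensities


def difference_mask(prev: Frame, curr: Frame, threshold: int = 25) -> List[List[bool]]:
    """Mark pixels whose intensity changed by more than `threshold` (assumed value)."""
    return [
        [abs(c - p) > threshold for p, c in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev, curr)
    ]


def bounding_box(mask: List[List[bool]]) -> Optional[Tuple[int, int, int, int]]:
    """Return (top, left, bottom, right) enclosing all changed pixels, or None."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for row in mask for c, changed in enumerate(row) if changed]
    if not rows:
        return None
    return rows[0], min(cols), rows[-1], max(cols)


# Two synthetic 4x6 frames: a bright 2x2 block moves one pixel to the right.
prev_frame = [[0] * 6 for _ in range(4)]
curr_frame = [[0] * 6 for _ in range(4)]
for r in (1, 2):
    for c in (1, 2):
        prev_frame[r][c] = 200
    for c in (2, 3):
        curr_frame[r][c] = 200

mask = difference_mask(prev_frame, curr_frame)
box = bounding_box(mask)
print(box)
```

In this sketch only the leading and trailing edges of the moving block are flagged. The column where the block overlaps itself in both frames shows no change, which is one way in which differencing fails to extract all relevant feature pixels. Likewise, a single bounding box around every changed pixel cannot tell apart two objects that move within the same image region.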
An example of a conventional motion-based detection technique is described in European Patent Application No. 635983 A2, entitled “Method and Means for Detecting People in Image Sequences.” This technique starts by analyzing a difference image, obtained by subtracting a current frame from a previous frame, and attempts to locate the head positions of people in the image. However, this technique relies on the computation of continuous curve and curvature extrema, and is therefore both sensitive to noise and computationally expensive.
As is apparent from the above, a need exists for improved techniques for detecting persons in image processing systems such as video conferencing systems, video surveillance and monitoring systems, and human-machine interfaces.