Automated monitoring of the presence, location, and activities of people is a fundamental technology that enables many new, context-aware applications in domains ranging from "intelligent environments" to security and surveillance. Achieving this with video cameras is especially appealing because it requires no special behavior, awareness, or instrumentation on the part of those being observed. In addition, the cameras may be shared with other applications, such as teleconferencing, and may give human observers a means to record and verify the automated analysis. Currently, vision-based perception of people and objects faces many difficult challenges, including segmenting people from the background, discriminating people from other foreground objects, tracking people through occlusions and close interactions, and modeling the highly articulated human form.
One class of current camera-based methods for object recognition and pose recognition typically does not use explicitly computed depth data. As a result, these methods have great difficulty separating objects from the scene background, gauging the true physical size of the objects, and determining accurate three-dimensional (3D) shape and orientation information about the objects. Because depth must be inferred implicitly, many object poses are harder to distinguish from one another in some camera views, and it is typically harder to construct recognition algorithms that are invariant to the location of the camera relative to the observed objects. These methods also tend to be highly error-prone.
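The size ambiguity mentioned above follows from perspective projection. As an illustration only, assuming an ideal pinhole camera (a standard model, not necessarily the one used by any particular system described here), an object of physical height H at depth Z appears with image height h = f·H/Z, where f is the focal length in pixels. Without Z, the same h is consistent with many physical heights. The function name and parameter values below are hypothetical.

```python
def physical_height_m(pixel_height: float, depth_m: float, focal_length_px: float) -> float:
    """Recover physical height from image height under an ideal pinhole model.

    Inverts h = f * H / Z, giving H = h * Z / f. Assumes an undistorted image
    and an object roughly parallel to the image plane (illustrative assumptions).
    """
    if depth_m <= 0 or focal_length_px <= 0:
        raise ValueError("depth and focal length must be positive")
    return pixel_height * depth_m / focal_length_px


# Hypothetical example: the same 100-pixel-tall silhouette seen by a camera
# with a 500-pixel focal length could be a small object nearby or a person far away.
near = physical_height_m(100.0, depth_m=2.0, focal_length_px=500.0)
far = physical_height_m(100.0, depth_m=8.0, focal_length_px=500.0)
print(f"{near:.2f} m at 2 m, {far:.2f} m at 8 m")
```

With explicitly measured depth, the ambiguity disappears: the 100-pixel silhouette corresponds to about 0.40 m at 2 m but about 1.60 m at 8 m.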
Another class of current camera-based methods for object recognition attempts to match image data to 3D models. These methods rely on extensive computation with the 3D models, fitting the data to the models and tracking the model parameters over time. Such processes, particularly for articulated human bodies, are typically complex and sensitive to noise. They must therefore employ extensive, often iterative calculations to avoid being highly error-prone. As a result, these methods are computationally expensive, requiring substantial computational resources, and are time-consuming.
As described above, automated monitoring of people and objects is useful in many applications, such as security and surveillance. For example, automated monitoring of customers may be relevant to retail store managers who wish to improve the layout of their stores through a better understanding of shopper behavior. Because of the shortcomings of existing object recognition methods, retail stores often rely on employees or consultants, rather than automated systems, to monitor shopper activity. Human monitoring has its own shortcomings, such as human error and the cost of additional personnel. Furthermore, security applications typically require automated monitoring to provide highly accurate and prompt analysis to maximize safety. Current automated monitoring methods, however, may fail to provide the necessary accuracy and/or response time, which reduces their effectiveness and the safety they afford.