Estimating and tracking poses of a human body is useful for various applications including action recognition, surveillance, and man-machine interaction. Estimating and tracking an arbitrary pose from an image or a video sequence remains a challenging problem because it often involves capturing subtle nuances in human poses. The problem is further complicated by background distractions, changes in surrounding lighting conditions, and other disturbances.
There are multiple approaches for human pose estimation and tracking based on visual images or video sequences. Some are bottom-up approaches in which components of the body are detected, and the detected components are then used to infer the configuration of the whole body. The bottom-up approach is problematic in that it does not accurately and reliably detect the various components in a cluttered scene.
Another group of approaches uses machine learning techniques. These approaches are also problematic because a large number of poses cannot be addressed.
Some approaches use silhouettes of the human body to estimate and track the poses of the human body. Using silhouettes has the advantage that ambiguity present in the images is reduced. This approach, however, is problematic because details necessary for reconstructing 3D human poses may be lost.
Some of the recent developments use a stream of depth images. The depth images contain a depth profile of a contour representing the human silhouette; therefore, more information is available for pose estimation. Iterative Closest Point (ICP) is often used with the depth images as a method for fitting a 3D model to 3D data points generated from the depth images. For example, J. Ziegler et al., “Tracking of the articulated upper body on multi-view stereo image sequences,” CVPR 2006, discloses using unscented Kalman filters together with the ICP approach to reconstruct the poses of the upper human body based on 3D data points obtained from four stereo image streams. A common issue with the ICP approaches, however, is that the model may drift away from the data or that the ICP may converge to a local minimum. To avoid drifting or converging to a local minimum, the initial configuration is critical for the ICP. However, it is difficult to set the initial configuration appropriately for all images, especially when the changes of motion between the images are large.
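For illustration only, the basic ICP loop referred to above can be sketched as follows. This is a minimal rigid, point-to-point variant written for this discussion, not the method of the cited reference, which fits an articulated body model. The function names `best_rigid_transform` and `icp`, the brute-force nearest-neighbor search, and the fixed iteration count are all assumptions made for the sketch.

```python
import numpy as np

def best_rigid_transform(src, dst):
    """Least-squares rotation R and translation t such that R @ src_i + t ~ dst_i.

    Uses the SVD-based (Kabsch) solution; src and dst are (N, 3) arrays
    whose rows are already paired.
    """
    src_c = src.mean(axis=0)
    dst_c = dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)          # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    # Guard against a reflection: force det(R) = +1.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t

def icp(model, data, iterations=30):
    """Hypothetical minimal ICP: align model points (N, 3) to data points (M, 3).

    Returns the accumulated rotation, translation, and the moved model points.
    The result depends on the starting pose of `model`; a poor start can
    leave the loop in a local minimum, as noted in the text.
    """
    R_total = np.eye(3)
    t_total = np.zeros(3)
    current = model.copy()
    for _ in range(iterations):
        # Step 1: pair each model point with its closest data point (brute force).
        d2 = ((current[:, None, :] - data[None, :, :]) ** 2).sum(axis=2)
        nearest = data[d2.argmin(axis=1)]
        # Step 2: solve for the rigid motion that best fits the current pairing.
        R, t = best_rigid_transform(current, nearest)
        # Step 3: move the model and accumulate the total transform.
        current = current @ R.T + t
        R_total = R @ R_total
        t_total = R @ t_total + t
    return R_total, t_total, current
```

Because each iteration only re-pairs points with their current nearest neighbors, a large motion between consecutive images can produce wrong pairings from the start, which is one way the dependence on the initial configuration arises.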
In the above approaches, the computing requirements for tracking and estimating the poses may be demanding. If the tracking and estimating algorithm is too slow, some images may have to be skipped in order to perform the tracking and estimation within the time constraint. Skipping images, however, is problematic because it reduces the accuracy of the tracking and estimation of the poses.
What is needed is an improved method and apparatus for estimating and tracking human poses that accurately tracks and detects various human poses. There is also a need for a method and apparatus for estimating and tracking human poses that avoids the local minima problem. There is also a need for estimating and tracking human poses with less demanding computation so as to provide real-time estimation and tracking of human poses.