1. Field of Art
The invention generally relates to computer vision, and more specifically, to fast human pose estimation for motion tracking.
2. Description of the Related Art
An important problem in modern computer vision is full body tracking of humans in video sequences. Applications for human tracking include video surveillance, gesture analysis, human-computer interfaces, and computer animation. For example, in creating a sports video game it may be desirable to track the three-dimensional (3D) motions of an athlete in order to realistically animate the game's characters. In biomedical applications, 3D motion tracking is important in analyzing and solving problems relating to the movement of human joints. In traditional 3D motion tracking, subjects wear suits with special markers and perform motions recorded by complex 3D capture systems. However, such motion capture systems are expensive due to the special equipment and significant studio time required. Furthermore, conventional 3D motion capture systems require considerable post-processing work, which adds to the time and cost associated with traditional 3D tracking methods.
There have been significant efforts to solve the problem of tracking 3D human motion from a 2D input image sequence without the need for special markers on the subject or special motion capture equipment. However, the problem presents considerable challenges for several reasons. First, there exist multiple plausible solutions to any given input since 3D pose information is being extrapolated from 2D images. This is especially true in the presence of partial occlusions. Second, humans are articulated objects with a significant number of parts whose shape and appearance change in the images due to various nuisance factors such as illumination, clothing, viewpoint, and pose. Third, the space of admissible solutions (i.e., all possible positions and orientations of all body parts) is extremely large, and the search for the optimal configuration in this space is a combinatorial problem that requires significant computational power to solve directly.
Due to the significant challenges presented by the human tracking problem, conventional trackers are inherently imperfect and conditions will exist where the tracker either provides an inaccurate estimate or loses track altogether. This is particularly true for fast motions, where the body limbs undergo large displacements from one frame to the next. In order to re-initialize the tracker when tracking is lost, a pose estimator is typically used to provide the tracker with an initial pose configuration from which tracking can begin.
However, estimating pose from a single image without any prior knowledge is in itself a challenging problem. In previous work, the problem has been cast as deterministic optimization, as inference over a generative model, as segmentation and grouping of image regions, or as a sampling problem. Previously proposed solutions either assume very restrictive appearance models or make use of cues, such as skin color and face position, which are not reliable and can be found only in specific classes of images (e.g., sports players or athletes). A large body of work in pose estimation focuses on the simpler problem of estimating the 3D pose from human body silhouettes. These approaches attempt to learn a map from silhouettes to poses, either as a direct mapping, as a one-to-many mapping, or as a probabilistic mixture.
However, the conventional solutions each fail to provide a pose estimator that is both sufficiently accurate and sufficiently fast to be used effectively in real-time human tracking. Furthermore, conventional pose estimators fail to take advantage of both appearance and motion information provided by the input image sequence. Therefore, what is needed is an improved system and method for fast pose estimation using appearance and motion features.