Tracking articulated human motion is of interest in numerous applications including video surveillance, gesture analysis, human-computer interfaces, and computer animation. For example, in creating a sports video game it may be desirable to track the three-dimensional (3D) motions of an athlete in order to realistically animate the game's characters. In biomedical applications, 3D motion tracking is important in analyzing and solving problems relating to the movement of human joints. In traditional 3D motion tracking, subjects wear suits with special markers and perform motions that are recorded by complex 3D capture systems. However, such motion capture systems are expensive due to the special equipment and significant studio time they require. Further, conventional 3D motion capture systems require considerable post-processing work, which adds to the time and cost associated with traditional 3D tracking methods.
Various tracking algorithms have been proposed that require neither special clothing nor markers. A number of algorithms track body motion in the two-dimensional (2D) image plane, thereby avoiding the need for complex 3D models or camera calibration information. However, many conventional methods are only able to infer 2D joint locations and angles. As a result, many traditional 2D methods have difficulty handling occlusions and are unsuitable for applications where accurate 3D information is required.
3D tracking algorithms based on 2D image sequences have also been proposed, but they depend on detailed 3D articulated models that have significantly more degrees of freedom than 2D models. In particular, particle filtering methods have been widely applied in tracking applications. However, these algorithms have conventionally been inefficient due to the high dimensionality of the pose state space. The number of particles needed to sufficiently approximate the state posterior distribution means that significant memory and processing power are required for implementation.
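The effect of state dimensionality on particle count can be illustrated with a minimal sketch. The example below is not any particular tracking algorithm; it assumes a toy setting in which particles are drawn from a standard normal prior and weighted by a Gaussian likelihood centered at the origin. The function name `effective_sample_size` and the parameters `dim`, `n_particles`, `sigma`, and `seed` are hypothetical and chosen only for illustration. The sketch computes the effective sample size (ESS), a common measure of how many particles carry meaningful weight after a single weighting step.

```python
import math
import random


def effective_sample_size(dim, n_particles=2000, sigma=1.0, seed=0):
    """Return the effective sample size (ESS) after one weighting step.

    Toy assumption: standard normal prior over a dim-dimensional state and
    a Gaussian likelihood, with standard deviation sigma, centered at the origin.
    """
    rng = random.Random(seed)
    log_weights = []
    for _ in range(n_particles):
        # Draw one particle from the prior.
        particle = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        # Gaussian log-likelihood of the particle, up to an additive constant.
        sq_dist = sum(x * x for x in particle)
        log_weights.append(-0.5 * sq_dist / (sigma * sigma))
    # Subtract the maximum log-weight before exponentiating for numerical stability.
    max_lw = max(log_weights)
    weights = [math.exp(lw - max_lw) for lw in log_weights]
    total = sum(weights)
    total_sq = sum(w * w for w in weights)
    # ESS = (sum of weights)^2 / (sum of squared weights); it lies between 1 and n_particles.
    return (total * total) / total_sq


if __name__ == "__main__":
    for dim in (2, 10, 30):
        print(dim, round(effective_sample_size(dim), 1))
```

Under these assumptions, the ESS for a fixed number of particles falls sharply as `dim` grows, so maintaining a comparable approximation of the posterior in a high-dimensional pose space requires far more particles, and correspondingly more memory and processing power.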
Several attempts have previously been made to develop particle filtering techniques in a reduced state space to ease memory and processing requirements. These efforts have largely failed to result in accurate tracking methods. Specifically, the proposed algorithms tend to fail when large limb movements occur over time.
What is needed is an efficient and accurate algorithm for tracking 3D articulated human motion from monocular video sequences.