As one watches a film or video of a person moving, one can easily estimate the person's 3-dimensional motions from the 2-d projected images over time. A dancer could repeat the motions depicted in the film. Yet such 3-d motion is hard for a computer to estimate.
Many applications would follow from a computer with the same ability to infer 3-d motions. There are applications to public safety for elevators and escalators, as well as to interactive games and virtual reality. In computer graphics, a growing industry is devoted to "motion capture", where digitized human figure motion drives computer graphic characters. The human's 3-d motion information is digitized either by magnetic sensors or by optical techniques with multiple calibrated cameras and a special suit of markers. Unfortunately, both techniques are expensive and cumbersome. Obtaining 3-d figure motion information from single-camera video would allow motion capture driven by ordinary monocular video cameras, and could be applied to archival film or video.
As described by L. Goncalves, E. D. Bernardo, E. Ursella, and P. Perona, Monocular tracking of the human arm in 3d, Proc. 5th Intl. Conf. on Computer Vision, pages 764-770, IEEE, 1995, Goncalves and collaborators tracked the motion of an arm in 3-d under constrained viewing and motion conditions. In an article by J. M. Rehg and T. Kanade entitled Model-based tracking of self-occluding articulated objects, Proc. 5th Intl. Conf. on Computer Vision, pages 612-617, IEEE, 1995, some hand motions are tracked in 3-d, allowing significant occlusions. However, this approach requires 3-d model initialization and controlled viewing conditions. Work on recovering body pose from more than one camera has met with more success, as discussed by D. M. Gavrila and L. S. Davis, 3-d model-based tracking of humans in action: a multi-view approach, in Proc. IEEE CVPR, pages 73-80, 1996. Despite research attention, as illustrated in a book edited by I. Essa entitled International Workshop on Automatic Face- and Gesture-Recognition, IEEE Computer Society, Killington, Vt., 1997, the problem of recovering 3-d figure motion from single-camera video has not been solved satisfactorily.