3-D figure and hand motion tracking technology has many applications in computer graphics, athletic performance analysis, and user interfaces. In motion capture for computer graphics, human motion is tracked in 3-D using a kinematic model. The kinematic model can then imbue the graphical models with realistic motion and dynamics.
Having 3-D tracking output is critical in this application as it may be necessary to view the motion from any direction. Similarly, in sports applications the ability to track an athlete's body motion during a complex task is an important tool for diagnosing medical problems and improving task execution.
In current figure tracking systems, measurements of figure motion are used to estimate state parameters such as joint angles in a previously specified kinematic model. A wide variety of measurements has been employed, including optical, magnetic, and sonar features. The optical approach embraces both target-based systems, in which the user wears a special suit covered with retro-reflective targets, and non-invasive systems, which employ image features extracted from an ordinary video sequence. In the magnetic and sonar approaches, targets attached to each link of a person's body can be tracked in 3-D using special emitter-receiver hardware.
What all of the above approaches have in common is the use of a previously specified kinematic model in order to correctly register the measurement data. This model must include, among other things, the distances between the joint centers, the 3-D locations where kinematic chains such as arms and legs attach to the torso, and the orientation of the joint axes. This is a significant amount of information which can be difficult to obtain accurately.
There has been a great deal of work on 3-D human body tracking using 3-D kinematic models. Most of these methods employ gradient-based estimation schemes and are therefore vulnerable to the effects of kinematic singularities. Methods that do not use gradient techniques usually employ an ad-hoc generate-and-test strategy to search through state space. The high dimensionality of the state space for an articulated figure makes these methods dramatically slower than gradient-based techniques, which use the local error surface gradient to quickly identify good search directions. As a result, generate-and-test strategies are not a compelling option for practical applications, such as those that demand results in real time.
Gradient-based 3-D tracking methods exhibit poor performance in the vicinity of kinematic singularities. This effect can be illustrated using a simple one-link object 100 depicted in FIG. 1a. There, the link 100 has one degree of freedom (DOF) due to joint 101 movably fixed to some arbitrary base. The joint 101 has an axis of rotation perpendicular to the plane of FIG. 1a. The joint 101 allows the object 100 to rotate by the angle $\theta$ in the plane of the figure.
Consider a point feature 102 at the distal end of the link 100. As the angle $\theta$ varies, the feature 102 will trace out a circle in the image plane, and any instantaneous changes in state will produce an immediate change in the position of the feature 102. Another way to state this is that the velocity vector for the feature 102, $V_{\theta}$, is never parallel to the viewing direction, which in this case is perpendicular to the page.
In FIG. 1b, the object 100 has an additional DOF. The extra DOF is provided by a mechanism that allows the plane in which the point feature 102 travels to "tilt" relative to the plane of the page. The Cartesian position $(x, y)$ of the point feature 102 is a function of the two state variables $\theta$ and $\phi$ given by:

$$x = \cos(\phi)\,\sin(\theta), \qquad y = \cos(\theta).$$
This is simply a spherical coordinate system of unit radius with the camera viewpoint along the z axis.
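The partial derivative (velocity) of any point feature position with respect to the state, also called the "Jacobian," can be expressed, for the state $q = [\phi \;\; \theta]^{T}$, as:

$$J = \begin{bmatrix} \partial x / \partial \phi & \partial x / \partial \theta \\ \partial y / \partial \phi & \partial y / \partial \theta \end{bmatrix} = \begin{bmatrix} -\sin(\phi)\,\sin(\theta) & \cos(\phi)\,\cos(\theta) \\ 0 & -\sin(\theta) \end{bmatrix}.$$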
Singularities arise when the Jacobian matrix $J$ loses rank. In this case, rank is lost when either $\sin(\phi)$ or $\sin(\theta)$ is equal to zero. In both cases, $J_{\mathrm{sing}}\,dq = 0$ for state changes $dq = [1 \;\; 0]^{T}$, implying that changes in $\phi$ cannot be recovered from point feature measurements in these configurations.
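The rank loss can be checked numerically. The following is a minimal sketch, assuming the state ordering q = [phi, theta] implied by the text and a Jacobian obtained by differentiating the position equations given earlier. The function names `feature_position`, `jacobian`, `determinant`, and `apply_jacobian` are illustrative and do not come from the original disclosure.

```python
import math

def feature_position(phi, theta):
    """Image-plane position (x, y) of the point feature for state q = [phi, theta]."""
    return (math.cos(phi) * math.sin(theta), math.cos(theta))

def jacobian(phi, theta):
    """2x2 Jacobian of (x, y) with respect to (phi, theta).

    Rows correspond to x and y, columns to phi and theta (assumed ordering).
    """
    return [
        [-math.sin(phi) * math.sin(theta), math.cos(phi) * math.cos(theta)],
        [0.0, -math.sin(theta)],
    ]

def determinant(J):
    """Determinant of a 2x2 matrix; zero means the matrix has lost rank."""
    return J[0][0] * J[1][1] - J[0][1] * J[1][0]

def apply_jacobian(J, dq):
    """Image-plane velocity J * dq produced by a small state change dq."""
    return [J[0][0] * dq[0] + J[0][1] * dq[1],
            J[1][0] * dq[0] + J[1][1] * dq[1]]

if __name__ == "__main__":
    # Away from the singularity the determinant is nonzero.
    print("det at phi=0.7, theta=1.1:", determinant(jacobian(0.7, 1.1)))
    # At sin(phi) = 0 the first column vanishes, so a change in phi is invisible.
    singular = jacobian(0.0, 1.1)
    print("det at phi=0.0, theta=1.1:", determinant(singular))
    print("J * [1, 0] at the singularity:", apply_jacobian(singular, [1.0, 0.0]))
```

The determinant of this matrix is sin(phi) sin(theta)^2, which is why either sine vanishing is enough to lose rank.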
Singularities impact visual tracking by their effect on state estimation using error minimization. Consider tracking the object 100 of FIG. 1b using the well-known Levenberg-Marquardt update step:

$$q_{k} = q_{k-1} + dq_{k} = q_{k-1} - (J^{T} J + \Lambda)^{-1} J^{T} R,$$

where $\Lambda$ is a stabilizing matrix with diagonal entries. See Dennis et al., "Numerical Methods for Unconstrained Optimization and Nonlinear Equations," Prentice-Hall, Englewood Cliffs, N.J., 1983 for details.
At the singularity $\sin(\phi) = 0$, the update step for all trajectories has the form $dq = [0 \;\; C]^{T}$, implying that no updates to $\phi$ will occur regardless of the measured motion of the point feature 102. This singularity occurs, for example, when the link rotates through a plane parallel to the image plane, resulting in a point velocity $V_{\phi}$ which is parallel to the camera or viewing axis.
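The vanishing update can be reproduced with a few lines of code. This is a hedged sketch rather than the tracker itself: it assumes that R is the residual vector (predicted minus measured feature position), that the stabilizing matrix is a scalar multiple `lam` of the identity, and that the state is ordered as [phi, theta]. The names `jacobian` and `lm_step`, and the residual values, are hypothetical.

```python
import math

def jacobian(phi, theta):
    """Jacobian of x = cos(phi) sin(theta), y = cos(theta) w.r.t. (phi, theta)."""
    return [
        [-math.sin(phi) * math.sin(theta), math.cos(phi) * math.cos(theta)],
        [0.0, -math.sin(theta)],
    ]

def lm_step(phi, theta, residual, lam=1e-2):
    """One update dq = -(J^T J + Lambda)^-1 J^T R, with Lambda = lam * I (assumed)."""
    J = jacobian(phi, theta)
    # A = J^T J + lam * I, a symmetric 2x2 matrix.
    a11 = J[0][0] ** 2 + J[1][0] ** 2 + lam
    a12 = J[0][0] * J[0][1] + J[1][0] * J[1][1]
    a22 = J[0][1] ** 2 + J[1][1] ** 2 + lam
    # g = J^T R, the gradient direction of the squared residual.
    g1 = J[0][0] * residual[0] + J[1][0] * residual[1]
    g2 = J[0][1] * residual[0] + J[1][1] * residual[1]
    # Solve A dq = -g using the closed-form inverse of a 2x2 matrix.
    det = a11 * a22 - a12 * a12
    dphi = -(a22 * g1 - a12 * g2) / det
    dtheta = -(-a12 * g1 + a11 * g2) / det
    return [dphi, dtheta]

if __name__ == "__main__":
    # Arbitrary residuals at the singular configuration sin(phi) = 0.
    for residual in ([0.3, -0.2], [-0.5, 0.1], [0.0, 0.4]):
        dphi, dtheta = lm_step(0.0, 1.1, residual)
        print(f"R = {residual}: dphi = {dphi}, dtheta = {dtheta}")
```

Whatever residual is supplied, the first component of the step is zero at the singular configuration, while the second component generally is not.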
FIG. 2 graphically illustrates the practical implications of singularities on tracker performance. In FIG. 2, the x-axis plots iterations, and the y-axis plots the angle $\phi$ in radians. The stair-stepped solid line 201 corresponds to discrete steps in $\phi$ of a simulation of the two-DOF object 100 of FIG. 1b. The solid line 201 shows the state estimates produced by the update equation as a function of the number of iterations of the solver.
The increased "damping" in the estimator as the trajectory approaches the point where $\phi = 0$, shown by the dotted line 202, is symptomatic of tracking near singularities. In this example, the singular state was never reached. In fact, at point 204, the tracker makes a serious error and continues in a downward direction opposite the true motion as a consequence of the usual reflective ambiguity under orthographic projection. This is shown by the dashed line 203. A correct tracker would follow the upward portion of the solid line 201.
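The damping effect can be illustrated with a simplified calculation. The sketch below is not the simulation behind FIG. 2: it assumes theta is held fixed at pi/2, so that y is constant and the feature position reduces to x = cos(phi), and it uses an assumed scalar damping value `lam` and an assumed true step `delta`. It reports what fraction of a small true change in phi a single damped update recovers as the estimate nears the singular state. It does not model the reflective ambiguity.

```python
import math

def phi_update(phi_est, phi_true, lam=1e-2):
    """Scalar damped update in phi with theta fixed at pi/2, so x = cos(phi)."""
    j = -math.sin(phi_est)                      # dx/dphi at the current estimate
    r = math.cos(phi_est) - math.cos(phi_true)  # residual: predicted minus measured x
    return -(j * r) / (j * j + lam)

def recovered_fraction(phi_est, delta=1e-3, lam=1e-2):
    """Fraction of a true change of -delta in phi recovered by one update."""
    return phi_update(phi_est, phi_est - delta, lam) / (-delta)

if __name__ == "__main__":
    # The recovered fraction shrinks as the estimate approaches phi = 0.
    for phi in (1.0, 0.3, 0.1, 0.03, 0.01):
        print(f"phi = {phi:5.2f}  recovered fraction = {recovered_fraction(phi):.3f}")
```

For small steps the recovered fraction is approximately sin(phi)^2 / (sin(phi)^2 + lam), so it tends to zero as phi approaches the singular state.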
In addition to singularity problems, tracking with 3-D kinematic models also requires the 3-D geometry of the object to be known in advance, particularly the lengths of the links. In order to track a particular person, the figure model must first be tuned so that the arms, legs, and torso have the correct dimensions. This can be non-trivial in practice, due to the difficulty of measuring the exact locations of the joint centers in the images.
In one prior method, a two-stage tracking technique is used to track hand gestures. See Shimada et al. in "3-D Hand Pose Estimation and Shape Model Refinement from a Monocular Image Sequence," Intl. Conf. on Virtual Systems and Multimedia, pp. 423-428, Gifu, Japan, Sep. 18, 1996, and Shimada et al. in "Hand Gesture Recognition Using Computer Vision Based on Model-Matching Method," Sixth Intl. Conf. on Human-Computer Interaction, Yokohama, Japan, Jul. 9, 1995.
In their first stage, hands are tracked using a crude 3-D estimate of hand motion that is obtained by matching to extracted silhouettes. In their second stage, model parameters are adapted using an Extended Kalman Filter (EKF).
The first stage of their method is based on adaptive sampling of the state space, and requires a full 3-D model. This limits the method to situations where complete 3-D kinematic models are available. Furthermore, the adaptive sampling is dependent on the dimensions of the links, and requires separate models for hands of varying sizes.
The second stage adapts a previously specified 3-D kinematic model to a particular individual. This requires fairly close agreement between the original model and the subject, or else the EKF may fail to converge.
Another method is described by Ju et al. in "Cardboard people: A Parameterized Model of Articulated Image Motion," Intl. Conf. Automatic Face and Gesture Recognition, pp. 38-44, Killington, Vt., 1996. There, each link is tracked with a separate template model, and adjacent templates are joined through point constraints. The method is not explicitly connected to any 3-D kinematic model, and, consequently, does not support 3-D reconstruction. In addition, the method requires a fairly large number of parameters, which may degrade performance because noise is more likely to be introduced.
Therefore, there is a need for a tracking method that can estimate the motion of a 3-D figure without knowing the exact initial configuration of the figure.