Many applications in computer vision involve motion analysis and modeling, such as motion tracking and action recognition. Most conventional methods for motion modeling are limited to simple motions. Building a comprehensive analytical model for complex motions, such as biological motion or human motion, is a challenging problem. One difficulty in motion modeling stems from the high dimensionality of complex motion, which demands great descriptive power from the model itself. Without any constraints, it is very difficult, if not impossible, to model arbitrary motions. Fortunately, in practice, the motions of interest are more or less constrained for physical or biological reasons. Although these constraints can be highly nonlinear, they greatly reduce the intrinsic complexity of the motion. For example, human motions cannot be arbitrary but must be confined to anatomically feasible joint angles; for instance, the upper arm and the lower arm cannot move independently.
Thus, one issue in motion tracking is how to characterize and take advantage of these constraints. Because it is generally difficult to describe motion constraints explicitly, a plausible alternative is to learn them from training data. Human motion, although complex, resides in a space whose dimensionality is significantly lower than that of its joint angle space. Dimensionality reduction is therefore an important step in learning, because it reduces the complexity of the problem and helps build a motion model.
Many conventional techniques are available for dimensionality reduction in human motion tracking. One conventional technique is to reduce the dimensionality using Isomap and learn a Gaussian mixture model in the low-dimensional space, as described in Tenenbaum, J. B., et al., A Global Geometric Framework for Nonlinear Dimensionality Reduction, Science, 2000, vol. 290, pp. 2319-2323, which is incorporated by reference herein in its entirety. Another conventional technique is to use Laplacian eigenmaps for dimensionality reduction and to employ continuity interpolation when modeling dynamics, as described in Sminchisescu, C., and A. Jepson, Generative Modeling for Continuous Non-Linearly Embedded Visual Inference, ICML, 2004, which is incorporated by reference herein in its entirety. In yet another conventional technique, K-means clustering is first used to partition the state space, and Principal Component Analysis (PCA) is then used to reduce the dimensionality.
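The cluster-then-PCA technique can be sketched roughly as follows. This is a minimal NumPy illustration under stated assumptions, not the implementation of any cited work: poses are assumed to be rows of a joint-angle matrix `X`, and the helper names `assign`, `kmeans`, `local_pca`, and `project`, the farthest-point initialization, and the toy data are all hypothetical choices made for illustration.

```python
import numpy as np


def assign(X, centroids):
    """Index of the nearest centroid (squared Euclidean distance) for each row of X."""
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's k-means with farthest-point initialization; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Start from one random pose, then repeatedly add the pose farthest
    # from every centroid chosen so far (hypothetical init choice).
    centroids = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(X[d2.argmax()])
    centroids = np.array(centroids, dtype=float)
    for _ in range(n_iter):
        labels = assign(X, centroids)
        # Move each centroid to the mean of its poses; keep it if the cluster is empty.
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids, assign(X, centroids)


def local_pca(X, labels, k, n_components):
    """Fit a separate PCA basis (mean, top principal directions) to each cluster."""
    models = []
    for j in range(k):
        Xj = X[labels == j]
        mean = Xj.mean(axis=0)
        # Rows of Vt are principal directions ordered by decreasing singular value.
        _, _, Vt = np.linalg.svd(Xj - mean, full_matrices=False)
        models.append((mean, Vt[:n_components]))
    return models


def project(x, centroids, models):
    """Map one pose to (cluster index, low-dimensional coordinates in that cluster)."""
    j = int(assign(x[None, :], centroids)[0])
    mean, basis = models[j]
    return j, basis @ (x - mean)


# Toy data (hypothetical): two "motions" of 20-D joint-angle vectors,
# each lying on its own 2-D linear subspace.
rng = np.random.default_rng(1)
D = 20
W_a, W_b = rng.normal(size=(2, D)), rng.normal(size=(2, D))
X = np.vstack([rng.normal(size=(50, 2)) @ W_a,
               20.0 + rng.normal(size=(50, 2)) @ W_b])
centroids, labels = kmeans(X, k=2)
models = local_pca(X, labels, k=2, n_components=2)
j, z = project(X[0], centroids, models)
```

In this sketch each pose is projected with the basis of its nearest cluster, and that assignment uses proximity alone, with no knowledge of which motion class a pose came from; similar poses belonging to different motions can therefore land in the same cluster and share one low-dimensional model.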
These conventional methods are suitable when the motion is short, uniform, and continuous, but they are inappropriate for recognizing and tracking multiple motion patterns. Because different motion classes become compacted together in the low-dimensional space, these techniques may introduce confusion among the classes and thereby prevent accurate tracking.
There have been several previous attempts to handle training data comprising multiple classes of motion. For example, a transition probability matrix may be learned, as described in Wang, Q., et al., Learning Object Intrinsic Structure for Robust Visual Tracking, Proc. IEEE Conference on Computer Vision and Pattern Recognition, 2003, pp. 227-233, and North, B., et al., Learning and Classification of Complex Dynamics, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2000, pp. 1016-1034, which are both incorporated by reference herein in their entirety. An alternative to the transition matrix is to apply the training algorithm separately to each individual motion. A problem confronting both methods is that different motions to be tracked may share some similar human poses, and these poses may become even closer once the dimensionality of the data is reduced. When the tracker approaches the confusion regions created by these similar poses, it can be distracted, because no discrimination is enforced among the possible motion patterns. When motion segments with different characteristics are intermingled, the accuracy of motion modeling may deteriorate.
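For illustration only, a transition probability matrix over motion classes can be estimated by counting consecutive class labels in training sequences and normalizing each row. This is a generic maximum-likelihood estimate for a first-order Markov chain, not the specific procedure of the cited works; the function name `transition_matrix`, the smoothing parameter `alpha`, and the example class labels are hypothetical.

```python
import numpy as np


def transition_matrix(sequences, n_states, alpha=0.0):
    """Estimate P[i, j] = Pr(state_{t+1} = j | state_t = i) from label sequences.

    alpha > 0 adds Laplace-style pseudo-counts so unseen transitions keep
    a nonzero probability (hypothetical smoothing choice).
    """
    counts = np.full((n_states, n_states), alpha, dtype=float)
    for seq in sequences:
        # Count each consecutive (from, to) pair within one training sequence.
        for a, b in zip(seq[:-1], seq[1:]):
            counts[a, b] += 1.0
    row_sums = counts.sum(axis=1, keepdims=True)
    # Rows with no observations (and alpha == 0) fall back to a uniform distribution.
    return np.divide(counts, row_sums,
                     out=np.full_like(counts, 1.0 / n_states),
                     where=row_sums > 0)


# Hypothetical labels: 0 = walk, 1 = run, 2 = jump.
sequences = [[0, 0, 0, 1, 1, 0], [1, 1, 1, 2, 2]]
P = transition_matrix(sequences, n_states=3)
```

Such a matrix records how often one motion class follows another; it does not encode how close two classes come to each other in pose space, which is the source of the confusion described above.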
Because of these problems, preserving the differences between motion patterns in the training set is an important property when tracking multiple classes of motion. It is therefore also preferable to maintain discrimination between motions in the lower-dimensional space. Traditional discriminative models such as Linear Discriminant Analysis (LDA) are inappropriate for this problem, because the motions to be modeled are generally nonlinear and non-Gaussian.
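To make the linearity concern concrete, the following is a minimal two-class Fisher LDA sketch in NumPy; the helper `fisher_lda_direction` and the ring-shaped toy data are hypothetical and are not motion data. Fisher's criterion summarizes each class by its mean and within-class scatter and, for two classes, yields a single linear direction w proportional to S_W^{-1}(m_1 - m_0). When one class surrounds another, the class means coincide and that direction collapses to zero, even though the classes are perfectly separable by radius.

```python
import numpy as np


def fisher_lda_direction(X0, X1):
    """Two-class Fisher discriminant direction w ~ Sw^{-1} (m1 - m0)."""
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Within-class scatter: each class's scatter about its own mean, summed.
    Sw = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)
    # Solve the linear system rather than forming an explicit inverse.
    return np.linalg.solve(Sw, m1 - m0)


# Toy data (hypothetical): one class lies on a ring of radius 1,
# the other on a surrounding ring of radius 3.
angles = np.linspace(0.0, 2.0 * np.pi, 40, endpoint=False)
circle = np.column_stack([np.cos(angles), np.sin(angles)])
inner, outer = 1.0 * circle, 3.0 * circle
w = fisher_lda_direction(inner, outer)
```

Both rings have (numerically) zero mean, so w is essentially the zero vector and projecting onto it maps every point of both classes to the same value.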
What is needed is a system for discriminative motion modeling that can recognize and track a variety of human motion patterns in a reduced-dimensionality space.