Gesture recognition is receiving increasing attention because of its potential applications in sign language recognition, multimodal human-computer interaction, virtual reality, and robot control. Most gesture recognition methods match an observed sequence of input images against training samples or a model, and classify the input sequence as the gesture class whose samples or model match it best. Dynamic Time Warping (DTW), Continuous Dynamic Programming (CDP), Hidden Markov Models (HMMs), and Conditional Random Fields (CRFs) are examples of such gesture classifiers.
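To illustrate the template-matching idea, the following is a minimal sketch of a DTW nearest-template classifier. It assumes that each gesture is already available as a list of 2-D hand positions and that one template per class is stored. The function names `dtw_distance` and `classify` and the Euclidean local cost are illustrative choices, not part of any particular system described here.

```python
import math
from typing import Dict, Sequence, Tuple

Point = Tuple[float, float]


def dtw_distance(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Classic DTW distance between two 2-D trajectories (Euclidean local cost)."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(a[i - 1], b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def classify(observed: Sequence[Point], templates: Dict[str, Sequence[Point]]) -> str:
    """Return the label whose template best matches the observation under DTW."""
    return min(templates, key=lambda label: dtw_distance(observed, templates[label]))
```

For example, with a horizontal-line template and a diagonal template, a slightly noisy horizontal stroke of a different length, such as `[(0, 0), (0.5, 0), (1, 0), (2, 0.1), (3, 0)]`, is assigned to the line class. DTW's warping lets sequences of different speeds and lengths be compared directly.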
HMM matching is the most widely used technique for gesture recognition. However, this kind of method cannot exploit the geometric information of the hand trajectory, which has proven effective for gesture recognition. Previous methods that do use the hand trajectory treat it as a whole: geometric features reflecting the overall shape of the trajectory, such as the mean hand position along the x and y axes and the skewness of the observed x and y hand positions, are extracted and fed to a Bayesian classifier for recognition. However, such global features cannot describe the hand gesture precisely.
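The whole-trajectory features mentioned above can be sketched as follows. The example assumes population skewness and returns the feature vector `[mean_x, mean_y, skew_x, skew_y]`. The function names are hypothetical, and the Bayesian classifier that would consume these features is omitted.

```python
from statistics import fmean
from typing import List, Sequence, Tuple


def skewness(values: Sequence[float]) -> float:
    """Population skewness E[(v - mean)^3] / std^3; 0.0 for constant input."""
    mu = fmean(values)
    m2 = fmean([(v - mu) ** 2 for v in values])
    m3 = fmean([(v - mu) ** 3 for v in values])
    return 0.0 if m2 == 0 else m3 / m2 ** 1.5


def global_features(traj: Sequence[Tuple[float, float]]) -> List[float]:
    """Whole-trajectory features: mean and skewness of the x and y hand positions."""
    xs = [p[0] for p in traj]
    ys = [p[1] for p in traj]
    return [fmean(xs), fmean(ys), skewness(xs), skewness(ys)]
```

All four features depend only on the set of observed positions, not on their order. A trajectory and the same path traced in reverse therefore produce identical features, which is one concrete way these global statistics fail to describe a gesture precisely.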
For online gesture recognition, gesture spotting, i.e., determining the start and end points of a gesture, is a very important but difficult task. There are two types of approaches to gesture spotting: direct and indirect. In direct approaches, motion parameters such as velocity, acceleration, and trajectory curvature are first computed, and abrupt changes in these parameters are used to identify candidate gesture boundaries. However, these methods are not accurate enough. Indirect approaches combine gesture spotting with gesture recognition: they search the input sequence for intervals that yield high recognition scores when matched against training samples or models, thereby achieving temporal segmentation and recognition of gestures at the same time. However, these methods are usually time-consuming, and false detections of gestures may also occur. One conventional approach uses a pruning strategy to improve both the accuracy and the speed of the system. However, that method prunes based only on the compatibility between a single point of the hand trajectory and a single model state: if the likelihood of the current observation falls below a threshold, the match hypothesis is pruned. A pruning classifier based on such a simple strategy may easily overfit the training data.
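A direct spotting approach can be sketched using speed alone. This minimal example assumes a per-frame 2-D hand position. A frame is marked as a candidate boundary when the hand switches between rest and motion, or when the speed jumps sharply between consecutive frames. The thresholds `rest_speed` and `jump` are hypothetical values chosen for illustration only.

```python
import math
from typing import List, Sequence, Tuple


def speeds(traj: Sequence[Tuple[float, float]]) -> List[float]:
    """v[k] = distance moved between frame k and frame k + 1."""
    return [math.dist(traj[k], traj[k + 1]) for k in range(len(traj) - 1)]


def candidate_boundaries(traj: Sequence[Tuple[float, float]],
                         rest_speed: float = 0.2,
                         jump: float = 1.0) -> List[int]:
    """Frame indices where speed changes abruptly (hypothetical thresholds)."""
    v = speeds(traj)
    marks = []
    for i in range(1, len(v)):
        moving_before = v[i - 1] > rest_speed  # motion arriving at frame i
        moving_after = v[i] > rest_speed       # motion leaving frame i
        if moving_before != moving_after or abs(v[i] - v[i - 1]) > jump:
            marks.append(i)
    return marks
```

For a hand that rests, moves right for three frames, and rests again, `[(0, 0), (0, 0), (0, 0), (1, 0), (2, 0), (3, 0), (3, 0), (3, 0)]`, the candidates are frames 2 and 5. Such fixed thresholds are easily disturbed by pauses or jitter within a gesture, which is why direct approaches are described as not accurate enough.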
Therefore, a need exists for techniques for more accurate and robust gesture spotting and recognition.
Furthermore, gestures made by different users usually differ in speed, starting and ending points, angles at turning points, and so on. It is therefore worthwhile to study how classifiers can be adjusted so that a recognition system adapts to specific users.
Only a few researchers have previously studied adaptive gesture recognition. One prior-art technique adapts a gesture recognition system by retraining the HMMs with new samples. However, this method loses the information in previous samples and is sensitive to noisy data. Another technique uses an online version of the Baum-Welch algorithm to learn and update gesture classifiers online, and builds a system that can learn a simple gesture online. However, this method updates very slowly.
Although there are only a few studies on adaptive gesture recognition, many methods for adaptive speech recognition have been published. One such study updates HMMs through maximum a posteriori (MAP) parameter estimation. By using prior distributions over the parameters, it needs less new data to obtain robust parameter estimates and updates. The drawback of this method is that a new sample can only update the HMM of its own class, which slows down updating. Maximum likelihood linear regression (MLLR) is widely used for adaptive speech recognition. It uses the new samples to estimate a set of linear transformations of the model parameters, so that the transformed model better matches the new samples. All model parameters can share a single global linear transformation, or the parameters can be clustered into groups, with each group sharing its own linear transformation. Because a transformation is shared across models, a new sample can also update the models of other classes. MLLR thus overcomes the drawback of MAP and improves the model updating speed.
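The shared-transformation idea behind MLLR can be illustrated with a deliberately simplified global mean transform, μ' = Aμ + b. This sketch assumes identity covariances and a known hard alignment of each adaptation frame to one Gaussian. Under those assumptions, the maximum-likelihood estimate of W = [b | A] reduces to ordinary least squares. Full MLLR instead weights frames by state occupation probabilities and uses the model covariances. The function names are hypothetical.

```python
import numpy as np


def estimate_global_mllr(means: np.ndarray, frames: np.ndarray,
                         labels: np.ndarray) -> np.ndarray:
    """Estimate W = [b | A] of shape (d, d + 1) from aligned adaptation frames.

    means:  (K, d) Gaussian means of the existing models
    frames: (T, d) adaptation observations from the new user
    labels: (T,)   index of the Gaussian each frame is aligned to (assumed known)
    """
    # Extended mean vectors xi = [1, mu] of the Gaussian each frame is aligned to.
    xi = np.hstack([np.ones((len(labels), 1)), means[labels]])
    # With identity covariances, maximum likelihood == least squares: frames ~ xi @ W.T
    w_t, *_ = np.linalg.lstsq(xi, frames, rcond=None)
    return w_t.T


def adapt_means(means: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Apply mu' = A mu + b to every mean, including Gaussians with no adaptation data."""
    xi = np.hstack([np.ones((len(means), 1)), means])
    return xi @ w.T
```

In the usage case, four 2-D Gaussians have means `[[0, 0], [1, 0], [0, 1], [1, 1]]`, and only the first three appear in the adaptation data, generated with A = 2I and b = (1, -1). The estimated transform still moves the fourth mean to approximately (3, 1). This is the property that lets MLLR update models for which no new samples were observed.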
Therefore, a need exists for techniques to achieve adaptive gesture recognition so that a system employing such techniques can adapt to a specific user.