1. Field of the Invention
The invention generally relates to the area of artificial intelligence, and more particularly relates to machine learning, especially in the context of generating motion recognizers from example motions. In some embodiments, recognizer makers can be incorporated into, or used alongside, end-user applications, where end users can create ad-hoc personalized motion recognizers for use with those end-user applications.
2. Related Art
Our ability to fulfill the promise of freeform human motion control of software applications is strictly limited by our ability to detect and recognize what a given human is trying to do. Broadly speaking, the most interesting motion control possibilities come from interpreting the following human “devices”: fingers, hands, facial expressions, and movements of head, shoulders, torso, and legs. Humans are very good at interpreting the gestures and expressions of other humans, but have so far been unable to create machines or code that perform at a similar level.
Writing program code to recognize whether a supplied motion is an example of an existing set of known motion classes is difficult. In part, this is because there are many sources of specialized motion data to operate on, each with a relatively small body of public knowledge on practical methods for processing such data, each with different semantic interpretations and operational ranges, and none of which reflect the anthropological information any competent human could pick up. The resulting motion data is often complicated and counterintuitive. For example, when presented with a simple graph of 3D accelerometer outputs versus time, people skilled in the art struggle to determine what gesture that time series of data corresponds to. Even the simpler task of selecting which motion graphs belong to the same gesture confounds most experts presented with the problem. The problem is exacerbated by sensor noise, device differences, and the fact that data for the same gesture can appear quite different when performed by different people with different body types and musculatures, or even by the same person at different times. It is a difficult challenge under these conditions for one skilled in the art to build effective motion recognizers.
Along with the challenging source data, the fact that the data varies over time, rather than being static, is a significant hurdle to overcome. Freeform human motion, in the general sense, is characterized by movement over time, and the corresponding motion recognition must therefore be carried out by computation over time series data. The typical pattern recognition or gesture recognition approach of computing a large number of static features based on one step in time, then carrying out discrimination-based recognition, is not relevant to this invention.
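To illustrate what computation over time series data can look like, the following is a minimal sketch of dynamic time warping, the general technique on which reference [1] builds. It is offered only as an illustration and is not asserted to be the method of the invention. The function name `dtw_distance` and the sample accelerometer readings are hypothetical.

```python
import math


def dtw_distance(a, b):
    """Dynamic time warping distance between two sequences of 3-axis samples.

    Illustrative only: a textbook O(len(a) * len(b)) formulation with no
    windowing or normalization.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] holds the minimal accumulated cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # Euclidean distance between samples
            cost[i][j] = d + min(
                cost[i - 1][j],      # a advances, b holds
                cost[i][j - 1],      # b advances, a holds
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# Hypothetical accelerometer traces (x, y, z); the values are made up.
slow = [(0.0, 0.0, 1.0), (0.0, 0.0, 1.0), (1.0, 0.0, 1.0),
        (1.0, 0.0, 1.0), (0.0, 0.0, 1.0), (0.0, 0.0, 1.0)]
fast = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (0.0, 0.0, 1.0)]   # same shape, twice as fast
other = [(0.0, 0.0, 1.0), (0.0, 1.0, 1.0), (0.0, 0.0, 1.0)]  # different axis

same_motion = dtw_distance(slow, fast)        # 0.0: the warp absorbs the speed change
different_motion = dtw_distance(fast, other)
print(same_motion, different_motion)
```

The two traces `slow` and `fast` have different lengths, so a comparison based on features from one step in time has no natural way to pair their samples. The alignment over the whole series is what lets the same motion performed at different speeds be treated as similar.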
A third characteristic of freeform human motion that poses a significant challenge for automated motion recognition systems is the desire to allow every individual user to create and personalize their own “ad-hoc” (i.e., not predefined) motion recognizers. The prior art contains many examples of algorithms that experts in the field can apply to specific predefined sets of gestures for static recognition. The ability to use a predefined set of gestures means a vast number of practical corners can be cut. For example, classifier construction times can be days or weeks. Training data can contain millions of examples. Biases can be built in that work fine for 3-5 different classes but fail outside that range. Characteristics specific to the predefined set of classes can be hard-coded into the algorithm and the corresponding parameters. Broadly speaking, the ability to do classification over a small number of predefined classes has little or no bearing on the ability to do ad-hoc motion recognition. To our knowledge, there is nothing in the prior art that provides teaching related to end-user creation of ad-hoc motion recognizers.
In previous work, such as Kjeldson [3], systems and methods are described for taking a collection of static images of a hand, constructing a large collection of static features describing each image, and building a classifier with tools like neural networks that can recognize subsequent static images. This work is not relevant to building ad-hoc motion recognizers. First, Kjeldson's input data is static image data. There is no time component and there are no mixed-mode inputs. Techniques that work for static classification problems do not apply to freeform human motion control. Additionally, Kjeldson [3] focuses on techniques that could be applied by one skilled in the art to construct a classifier that will differentiate between a preconceived collection of static images. However, it is highly desirable to allow those unskilled in the art to create classifiers that will recognize ad-hoc sets of gestures that are not preconceived.
In previous work such as Kwon [4], systems and methods are described for creating a trainer/trainee session where hidden Markov models are built representing trainer motions, and used to recognize incoming trainee motions. This approach relies on error rates of 40-60% being acceptable for the trainee. Most applications, however, such as computer video games, require success rates upwards of 95%. Furthermore, the methods described in Kwon [4] require three components in the training signals: a start position, a motion, and an end position. This approach does not work in applications that wish to provide freeform motion control, since the starting and ending positions are not predefined, and cannot reasonably be quantized a priori without making the construction of a reasonable training set a virtual impossibility.
The teachings in the present invention take the unprecedented step of giving unskilled end users the ability to create ad-hoc personalized recognizers for use in various applications. The incoming data is a broad mix of motion signals over time with no predefined gestures, no constraints on how to execute them, and no predefined starting poses or stopping poses. There is no coding involved in building the motion recognizers. End users can create any motion recognizer they choose, simply by giving examples. Objects, features, and advantages of the present invention will become apparent upon examining the following detailed description.
The details of the references, hereby incorporated by reference as if fully set forth herein, include:
   [1]. E. Keogh and M. Pazzani, Derivative Dynamic Time Warping, in First SIAM International Conference on Data Mining (Chicago, Ill., 2001);
   [2]. Lawrence R. Rabiner, A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition, Proceedings of the IEEE, 77 (2), p. 257-286, February 1989;
   [3]. R. Kjeldson and J. Kender, Towards the Use of Gesture in Traditional User Interfaces, Proceedings of the 2nd International Conference on Automatic Face and Gesture Recognition, 1996; and
   [4]. D. Kwon and M. Gross, Combining Body Sensors and Visual Sensors for Motion Training, ACM SIGCHI ACE 2005.