The forms of computing and electronic devices are changing rapidly, and so too are the means for interfacing with said devices. In the past, the mouse and keyboard were the industry standard for input; however, advanced technologies allow for novel possibilities. Voice input and gesture input are two newer methods for establishing a human-machine interface (HMI).
It is important to be able to train computing and robotic devices to recognize unique gestures and voice commands and responsively execute desired functionality. For example, a voice command for launching a web browser, such as “open web browser,” could be used instead of or alongside traditional techniques (e.g., a mouse double click of an application icon). Physical gestures may likewise be used to perform various tasks. Furthermore, control of robotic devices such as industrial robots, radio control cars, drones, 3D printers and the like may also be gesture based.
The general approach used by many prior technologies consists of three steps: (a) identification of a ‘feature vector’ for a given (a priori fixed) set of activities; (b) offline training of a model for this feature vector based on sensor data from a single user or multiple users; and (c) online recognition of the activity based on online computation of features from the sensor data and comparison of those features against the trained model. The main problems with this approach are as follows:
1. Feature selection: The performance of these algorithms critically depends upon selection of ‘good’ features. Selecting features from sensor data is not an easy task. The usefulness of any given feature is dependent in a highly nonlinear manner on the activity, on the sensor, on the placement of the sensor on the body and possibly also on the subject [17, 18]. Moreover, the number of candidate features can grow quickly: as many as 600 features are available in MSP [4].
The need to select a small number of ‘good’ features in an a priori manner is the main reason why existing solutions are limited to pre-programmed activities.
2. Large training data set requirement: For a given set of features, training a good model (e.g., a decision tree [25, 5, 2]) requires a large amount of data. In practice, one collects a sufficiently rich data set from multiple users to train these models [26, 9, 20, 8, 12]. This is a costly and time-intensive process. With data from just a single user, the resulting model can be fragile (e.g., sensitive to changes in frequency of the activity). With a large number of users, the model fit to any particular user may be poor.
3. Accuracy: Once the models have been trained, classification is typically the easiest step. Popular methods such as template matching using, e.g., nearest neighbor classifiers (used by MotionX; cf., [13, 20]) and the decision tree classifier (an industry standard; cf., [25, 5, 2, 11, 8, 22, 27]) can be run very efficiently. Other popular algorithmic approaches include: instance-based learning (IBL) approaches [25, 5, 28]; Bayesian networks (BN) [29, 9, 30] and Naive Bayes (NB) classifiers [8, 3, 14, 31]; support vector machines (SVM) [32, 15, 21, 33]; fuzzy If-Then rules [10, 24, 6]; and artificial neural networks (ANN) [29, 12, 34].
However, these approaches suffer from the issue of robustness: the conditions for online data and the training data must be closely matched (in terms of number of sensors, window length of data, features, even subjects) for the results to be accurate; cf., [35, 5, 36, 25, 24, 37]. Many classifier approaches such as IBL and ANN can be non-robust and do not provide information on activity counts or duration, etc. [27, 33, 34].
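For illustration only, the three-step approach described above (feature vector, offline training, online recognition) can be sketched as a minimal template-matching classifier. The feature choice (mean, standard deviation, peak-to-peak range), the function names, and the synthetic sensor windows are all assumptions made for this sketch; they are not taken from any of the cited systems.

```python
import math
from statistics import mean, pstdev


def extract_features(window):
    """Step (a): map a window of raw samples to a fixed feature vector.

    The three features used here are an illustrative assumption.
    """
    return (mean(window), pstdev(window), max(window) - min(window))


def train_templates(labeled_windows):
    """Step (b): offline training; average the feature vectors per label."""
    grouped = {}
    for label, window in labeled_windows:
        grouped.setdefault(label, []).append(extract_features(window))
    return {
        label: tuple(mean(column) for column in zip(*vectors))
        for label, vectors in grouped.items()
    }


def recognize(window, templates):
    """Step (c): online recognition; return the label of the nearest template."""
    features = extract_features(window)
    return min(templates, key=lambda label: math.dist(features, templates[label]))


# Hypothetical single-axis accelerometer windows (synthetic values).
training_data = [
    ("still", [0.0, 0.1, -0.1, 0.0]),
    ("still", [0.05, -0.05, 0.0, 0.1]),
    ("shake", [1.0, -1.0, 1.2, -0.9]),
    ("shake", [0.8, -1.1, 1.0, -1.0]),
]
templates = train_templates(training_data)

shake_result = recognize([0.9, -0.8, 1.1, -1.0], templates)
still_result = recognize([0.02, 0.0, -0.03, 0.01], templates)
```

Even in this toy form, the set of recognizable activities is fixed by the labels present at training time, and the online window must be processed with the same features as the training windows, which reflects the limitations discussed above.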
In the open literature, probabilistic approaches such as regression methods [7, 9], hidden Markov models (HMM) [38, 14] and multi-graph based techniques [39, 16] have also been considered, but their practical application suffers from the curse of dimensionality. For example, the results reported in [39, 16] lead to a graph size of 16,875 nodes, which makes the approach computationally prohibitive. A simple four-state HMM with five independent sensor measurements requires a total of 50 parameters to be learned (a specific example of this appears in [40]). Learning (identifying) these parameters requires a large amount of training data, and the problem is known to be non-convex, with multiple local minima for the parameters; cf., [41].
Accordingly, there is a need for methods and systems for learning sensor data patterns for input and control of electronic systems.