Modern classifiers use techniques which are highly complex when high accuracy classification is needed. For example, a traditional neural network structure needing high accuracy also needs a complex structure to perform classification because of difficulty in grouping different classes within the neural network structure.
Additionally, in pattern recognition systems such as speech recognition, when a spoken command is identified, the spoken command is identified as one of a group of commands represented by a collection of models. Existing speech recognition systems require large amounts of processing and storage resources to identify a spoken command from a collection of models because the systems fail to use a combination of observation cost and state information to train low complexity models for identifying spoken commands.
Another problem with existing systems is that polynomial classifiers fail to use a combination of observation cost and state information when performing identification of classes (e.g., spoken commands, phoneme identification, digital images, radio signatures, communication channels, etc.). Additionally, a problem with training systems for polynomial classifiers is that existing systems do not train models using a method which exploits state information within training data.
Another problem with speech recognition systems is that such systems require accurate and low complexity methods for identifying an acoustic event. Typically, this is accomplished by separating speech into isolated phonetic units; for example, the word "happy" is represented as a sequence of four phonemes "H", "AE", "P", "IY". A popular technique for determining phonemes from a spoken word is to use the Hidden Markov Model (HMM). HMM's classify by incorporating a finite state machine in a stochastic framework. HMM's represent the order of the phonetic sounds by states. In an HMM, the probability that a certain sound has been emitted is encapsulated in observation probabilities. These probabilities are typically modeled by a Gaussian Mixture Model (GMM). A problem with GMM's is that GMM's only provide limited accuracy for text independent speaker verification. Another problem is that GMM's only provide a local optimum.
Thus, what is needed are a system and method for identifying classes from a collection of predetermined classes using limited processing and storage resources. What is also needed are a system and method which can train a set of predetermined classes using limited processing and storage resources. What is also needed are a system and method which combine observation cost and state information when identifying classes from a set of predetermined classes and training models which represent the set of predetermined classes. What is also needed are a system and method for identifying an acoustic event. Also needed are a system and method for accurately modeling the probability that a certain sound emitted for text independent speaker verification is encapsulated in observation probabilities. What is also needed are a system and method for modeling a global optimum that a certain sound which has been emitted is encapsulated in observation probabilities.