Pattern recognition concerns the operation and design of systems that recognize patterns in data. It encompasses subdisciplines such as discriminant analysis, feature extraction, error estimation, and cluster analysis (together sometimes called statistical pattern recognition), as well as grammatical inference and parsing (sometimes called syntactical pattern recognition). Applications of pattern recognition include image analysis, character recognition, human and machine diagnostics, person identification, industrial inspection, and speech recognition and analysis.
One application of pattern recognition is speech recognition. Current speech recognition is not as efficient as it could be. Many speech recognition techniques are too slow and require too many of a computer's resources to be practical in some computing devices, such as personal digital assistants (PDAs). Such techniques include neural networks, dynamic time warping (DTW), and Hidden Markov Models (HMMs). Neural networks for speech recognition require large amounts of training data and long training times. DTW matches input speech against stored templates that must be fairly exact, allowing for little variability. HMMs, which are commonly used in speech recognition, are slow and inefficient, and it is difficult to mathematically characterize the equivalence of two HMMs.
FIG. 1 is a block diagram that shows a conceptual view of a Hidden Markov Model (HMM) 100, which is prior art. In FIG. 1, the HMM 100 has five hidden states 102-110, transitions 112-118 between hidden states 102-110, and outputs 120-170 generated by the hidden states 102-110. In FIG. 1, the transitions 112-118 are shown as solid lines, while output generation from the hidden states 102-110 is shown in dotted lines. An HMM 100 is defined by (1) a set of hidden states (Q=q1q2 . . . qn), (2) a set of transition probabilities (A=a01a11 . . . an1 . . . ann), and (3) a set of observation likelihoods (B=bi(ot)).
Each hidden state 102-110 (qi) accepts input (I=i1i2 . . . it). The inputs are sometimes called observables and represent one or more parts of speech, phones, phonemes, or processed speech signals. A phoneme is an abstract class that generalizes over different phonetic realizations of a sound. For example, the phonemes for the spoken words “one five” are “wah n fah i v.” Suppose input i1 is the phoneme “wah” that is recognized by hidden state one 102 and the next input i2 is the phoneme “n” that is recognized by hidden state two 104.
Each transition 112-118 has a transition probability (aij) representing a probability of transitioning from one hidden state 102-110 to another hidden state 102-110. For example, there might be a 0.5 probability of transitioning from hidden state one 102 to hidden state two 104 upon receiving a certain input, such as the phoneme “wah.”
Each observation likelihood (bi(ot)) expresses the probability of an output (ot) being generated from a hidden state 102-110. For example, in hidden state one 102, there might be a 0.6 probability of generating output “wah”, a 0.1 probability of generating output “n,” a 0.1 probability of generating output “fah,” a 0.1 probability of generating output “i,” and a 0.1 probability of generating output “v.”
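The two parameter sets described above can be sketched as a small data structure. This is an illustrative sketch, not part of the prior-art FIG. 1; the state names "q1" and "q2" and the single transition entry are assumptions, while the probability values are taken from the examples above.

```python
# Minimal sketch of HMM parameters using the example values above.
# "q1"/"q2" are hypothetical state names standing in for hidden states
# one 102 and two 104 in FIG. 1.
hmm = {
    # A: transition probabilities a_ij between hidden states;
    # e.g., a 0.5 probability of moving from state one to state two
    "transitions": {
        ("q1", "q2"): 0.5,
    },
    # B: observation likelihoods b_i(o_t) for each hidden state;
    # e.g., state one generates "wah" with probability 0.6
    "emissions": {
        "q1": {"wah": 0.6, "n": 0.1, "fah": 0.1, "i": 0.1, "v": 0.1},
    },
}

# The observation likelihoods of a state form a distribution summing to 1
assert abs(sum(hmm["emissions"]["q1"].values()) - 1.0) < 1e-9
```

This structure is only the model's parameterization; decoding (choosing the most likely state sequence for a given input) is a separate computation.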
As input speech is recognized, the HMM 100 moves from one hidden state 102-110 to another based on the probability of the transitions 112-118, generating outputs 120-170. The outputs 120-170 are the recognized speech. Speech recognition using HMMs has an algorithmic complexity of O(n³).
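The state-to-state movement described above is conventionally computed with the standard Viterbi decoder, sketched below. This is an assumption-laden illustration, not the patent's technique: the state names and probability tables are hypothetical. For n hidden states and T observations the nested loops cost O(n²·T), which matches the cubic cost noted above when the input length grows with the number of states.

```python
# Standard Viterbi decoding sketch for an HMM (illustrative only).
def viterbi(states, start_p, trans_p, emit_p, observations):
    # v[s] = probability of the best state path ending in state s
    v = {s: start_p[s] * emit_p[s][observations[0]] for s in states}
    back = [{s: None for s in states}]
    for obs in observations[1:]:
        new_v, ptr = {}, {}
        for s in states:  # O(n) states ...
            # ... times O(n) predecessors, per observation
            prev, p = max(((r, v[r] * trans_p[r][s]) for r in states),
                          key=lambda x: x[1])
            new_v[s] = p * emit_p[s][obs]
            ptr[s] = prev
        v = new_v
        back.append(ptr)
    # Trace back the most probable hidden-state sequence
    last = max(v, key=v.get)
    path = [last]
    for ptr in reversed(back[1:]):
        path.append(ptr[path[-1]])
    return list(reversed(path)), v[last]

# Hypothetical two-state example: state q1 favors "wah", q2 favors "n"
states = ["q1", "q2"]
start_p = {"q1": 1.0, "q2": 0.0}
trans_p = {"q1": {"q1": 0.5, "q2": 0.5}, "q2": {"q1": 0.5, "q2": 0.5}}
emit_p = {"q1": {"wah": 0.6, "n": 0.4}, "q2": {"wah": 0.1, "n": 0.9}}
path, prob = viterbi(states, start_p, trans_p, emit_p, ["wah", "n"])
# path == ["q1", "q2"]
```

Each observation forces a scan over all state pairs, which is why HMM decoding becomes costly as models grow.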
For these reasons and more, there is a need for a more efficient speech recognition technique.