Speech recognition is a classification task. In maximum likelihood classifiers, each classifier is trained on examples that belong to its class. For example, the classifier that recognizes the digit “1” is trained on multiple pronunciations of the digit “1”.
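By way of illustration only, the maximum likelihood principle can be sketched as follows. The sketch assumes that each utterance has been reduced to a single numeric feature and that each class is modeled by a one-dimensional Gaussian; both are simplifying assumptions made here and are not taken from the text. The function names and the toy training values are hypothetical.

```python
import math

def fit_gaussian(samples):
    """Maximum likelihood estimate of (mean, variance) from one class's examples."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / n
    return mean, max(var, 1e-6)  # floor the variance so the log-likelihood stays finite

def log_likelihood(x, model):
    """Log of the Gaussian density of x under a (mean, variance) model."""
    mean, var = model
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def classify(x, models):
    """Return the label whose model assigns x the highest likelihood."""
    return max(models, key=lambda label: log_likelihood(x, models[label]))

# Hypothetical toy data: each class model sees only examples of its own class.
training = {
    "one": [1.0, 1.2, 0.9, 1.1],
    "two": [3.0, 2.8, 3.1, 3.2],
}
models = {label: fit_gaussian(xs) for label, xs in training.items()}
```

Calling `classify(x, models)` on a new feature value returns the label of the best-matching class model. A practical recognizer would operate on sequences of multidimensional acoustic feature vectors rather than on a single number.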
A commonly used classifier is the Hidden Markov Model (HMM). Each word is modeled by a different HMM, which serves as an abstract “picture” of the word with all its possible variations. The HMM consists of a sequence of “states”, each of which describes a different part of the word. The use of HMMs in speech recognition consists of two phases: the training phase and the recognition phase. In the training phase, repetitions of each word from the training data are used to construct the corresponding HMM. In the recognition phase, the word models are used to identify unknown speech by scoring the unknown speech against each of the existing models.
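The recognition phase described above can be sketched, under stated assumptions, as scoring an observation sequence against each word model and selecting the best-scoring word. The sketch assumes discrete observation symbols, two-state left-to-right models, and the forward algorithm as the scoring method; the text does not specify any of these. The model parameters below are hypothetical, hand-written values standing in for the output of a training phase, which is not shown.

```python
def forward_likelihood(obs, start, trans, emit):
    """P(obs | model) for a discrete-observation HMM, via the forward algorithm.

    start[s]    : probability of starting in state s
    trans[p][s] : probability of moving from state p to state s
    emit[s][k]  : probability that state s emits symbol k
    """
    n_states = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n_states)]
    for symbol in obs[1:]:
        alpha = [
            sum(alpha[p] * trans[p][s] for p in range(n_states)) * emit[s][symbol]
            for s in range(n_states)
        ]
    return sum(alpha)

def recognize(obs, word_models):
    """Return the word whose HMM assigns the observation sequence the highest likelihood."""
    return max(word_models, key=lambda w: forward_likelihood(obs, *word_models[w]))

# Hypothetical left-to-right topology: stay in a state or advance, never go back.
LEFT_TO_RIGHT = [[0.6, 0.4], [0.0, 1.0]]

# Hypothetical word models over the symbol alphabet {0, 1, 2}.
word_models = {
    "word_a": ([1.0, 0.0], LEFT_TO_RIGHT, [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1]]),
    "word_b": ([1.0, 0.0], LEFT_TO_RIGHT, [[0.1, 0.1, 0.8], [0.1, 0.8, 0.1]]),
}
```

In this sketch the two models differ only in the emissions of their first state, so a sequence beginning with symbol 0 favors `word_a` and one beginning with symbol 2 favors `word_b`. For long observation sequences the products of probabilities underflow, so log-domain or scaled arithmetic would be needed.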
Some words sound similar to each other and can therefore be misrecognized. Taking digits as examples, “go” (5) and “roku” (6) in Japanese, and “seven” and “eleven” in English, sound sufficiently similar to cause incorrect recognition.