Speech recognition systems identify word sequences from an unknown speech utterance. In an exemplary speech recognition system, a feature extractor extracts speech features, such as cepstra and delta cepstra features, from the unknown utterance to characterize it. A search then compares the extracted features to models of speech units (such as phrases, words, syllables, phonemes, and sub-phones) to compute the scores or probabilities of different word sequence hypotheses. Typically, the search space is restricted by pruning out unlikely hypotheses. The word sequence hypothesis with the highest score, likelihood, or probability is output as the recognition result for the unknown utterance. In addition to the acoustic model, a language model that determines the relative likelihood of different word sequences is also used in calculating the overall score of each word sequence hypothesis.
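The scoring and selection steps described above can be illustrated with a minimal sketch. It assumes each complete hypothesis already carries an acoustic log-probability and a language model log-probability. The function names, the `lm_weight` and `beam` parameters, and all numeric values are hypothetical and chosen only for illustration. A practical decoder would typically prune partial hypotheses during the search rather than complete ones afterward.

```python
# Illustrative sketch only: names, weights, and scores are assumptions.

def combined_score(acoustic_logp, lm_logp, lm_weight=1.0):
    """Overall log score of one hypothesis: acoustic plus weighted LM score."""
    return acoustic_logp + lm_weight * lm_logp


def prune(scored_hyps, beam):
    """Keep only hypotheses whose score is within `beam` of the best score."""
    best = max(score for _, score in scored_hyps)
    return [(words, score) for words, score in scored_hyps if score >= best - beam]


def recognize(candidates, lm_weight=1.0, beam=10.0):
    """Return the (word sequence, score) pair with the highest overall score.

    `candidates` is a list of (words, acoustic_logp, lm_logp) tuples.
    """
    scored = [
        (words, combined_score(acoustic_logp, lm_logp, lm_weight))
        for words, acoustic_logp, lm_logp in candidates
    ]
    survivors = prune(scored, beam)
    return max(survivors, key=lambda hyp: hyp[1])


# Made-up hypotheses: (word sequence, acoustic log-prob, LM log-prob).
CANDIDATES = [
    ("recognize speech", -120.0, -4.0),
    ("wreck a nice beach", -118.0, -9.0),
    ("recognize peach", -135.0, -8.0),
]

if __name__ == "__main__":
    words, score = recognize(CANDIDATES)
    print(words, score)
```

With these made-up values, the second hypothesis has the better acoustic score, but the language model score tips the overall result toward the first, and the third falls outside the beam and is pruned.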
The parameters of the speech recognition models are determined through a training operation. The speech recognition models may be used to model speech as a sequence of acoustic features, or observations, produced by an unobservable “true” state sequence of sub-phones, phonemes, syllables, words, phrases, and the like. Model parameters output from the training operation are often estimated to maximize the likelihood of the training observations; that is, the optimum set of parameters is taken to be the one that maximizes the likelihood of the training data. To recognize the unknown speech utterance, the speech recognition system determines the word sequence with the maximum posterior probability given the observed speech signal. The best word sequence hypothesis is determined through the search process, which considers the scores of all possible hypotheses within the search space.
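The training and recognition criteria described above are commonly written as follows. The notation is an assumption introduced here for illustration: O denotes the observed acoustic feature sequence, W a candidate word sequence, λ the acoustic model parameters, and (O_r, W_r), r = 1, ..., R, the training utterances with their transcriptions.

```latex
% Maximum likelihood training: choose parameters that maximize the
% likelihood of the training observations given their transcriptions.
\hat{\lambda}_{\mathrm{ML}}
  = \arg\max_{\lambda} \sum_{r=1}^{R} \log p_{\lambda}(O_r \mid W_r)

% Recognition: choose the word sequence with maximum posterior probability.
% By Bayes' rule, p(O) does not depend on W and can be dropped.
\hat{W}
  = \arg\max_{W} P(W \mid O)
  = \arg\max_{W} \frac{p_{\lambda}(O \mid W)\, P(W)}{p(O)}
  = \arg\max_{W} p_{\lambda}(O \mid W)\, P(W)
```

Here p_λ(O | W) corresponds to the acoustic model score and P(W) to the language model score, so the recognition rule matches the combination of the two scores described earlier.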