1. Field of the Invention
This invention relates to a system for numeric language recognition in natural spoken dialogue.
2. Description of the Related Art
Speech recognition is a process by which an unknown speech utterance (usually in the form of a digital pulse-code modulated (PCM) signal) is identified. Generally, speech recognition is performed by comparing the features of an unknown utterance to the features of known words or word strings. Hidden Markov models (HMMs) for automatic speech recognition (ASR) rely on high-dimensional feature vectors to summarize the short-time acoustic properties of speech. Though front-ends vary from one speech recognizer to another, the spectral information in each frame of speech is typically codified in a feature vector with thirty or more dimensions. In most systems, these vectors are conditionally modeled by mixtures of Gaussian probability density functions (PDFs).
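As an illustration of the mixture modeling described above, the following minimal sketch scores one feature vector against a mixture of Gaussian PDFs. It assumes diagonal covariance matrices, a common simplification that the text does not specify, and the function names log_gaussian_diag and log_gmm are hypothetical, chosen only for this example.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at x.

    Assumption: each dimension is independent, so the log density is
    a sum of one-dimensional Gaussian log densities.
    """
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def log_gmm(x, weights, means, variances):
    """Log density of a Gaussian mixture at x.

    weights   : mixture weights, positive and summing to 1
    means     : one mean vector per mixture component
    variances : one per-dimension variance vector per component
    """
    terms = [
        math.log(w) + log_gaussian_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    # Log-sum-exp: subtract the largest term before exponentiating
    # to avoid underflow with high-dimensional feature vectors.
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))
```

In an HMM-based recognizer, a score of this kind would be computed for each frame's feature vector against the mixture attached to each candidate state; working in the log domain keeps the computation numerically stable when the vectors have thirty or more dimensions.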
Recognizing connected digits in a natural spoken dialogue plays a vital role in many applications of speech recognition over the telephone. Digits are the basis for credit card and account number validation, phone dialing, menu navigation, and similar tasks.
Progress in connected digit recognition has been remarkable over the past decade. For databases recorded under carefully monitored laboratory conditions, speech recognizers have achieved word error rates below 0.3%. Dealing with telephone speech has added a new dimension to this problem. Variations in spectral characteristics due to different channel conditions, speaker populations, background noise, and transducer equipment cause a significant degradation in recognition performance. Previous practice has focused strictly on recognizing digit sequences from constrained input speech.
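The word error rate quoted above is conventionally computed as the number of word substitutions, deletions, and insertions needed to turn the recognizer output into the reference transcript, divided by the number of reference words. The sketch below shows one way to compute it using a standard edit-distance recurrence; the function name word_error_rate and the whitespace tokenization are assumptions made for this example, not details taken from the text.

```python
def word_error_rate(reference, hypothesis):
    """Word error rate of a hypothesis against a reference transcript.

    Both arguments are strings of whitespace-separated words
    (an assumption of this sketch). The result is
    (substitutions + deletions + insertions) / reference word count.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)
```

For example, scoring the hypothesis "one two tree four" against the reference "one two three four" counts one substitution over four reference words, giving a word error rate of 0.25. Because insertions are counted, the rate can exceed 1.0 when the hypothesis is much longer than the reference.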