Automatic recognition of spoken connected digit strings has become an important issue in the telephone industry. "Connected digits" are spoken strings of digits that are unbroken by other words. Telephone numbers and credit card numbers are examples of connected digits. Automatic recognition of connected digits has been the focus of many research efforts. For example, the papers by J. G. Wilpon, C.-H. Lee, and L. R. Rabiner, "Improvements in connected digit recognition using higher order spectral and energy features," Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp. 349-352, May 1991, and R. Cardin, Y. Normandin, and R. De Mori, "High performance connected digit recognition using codebook exponents," Proc. ICASSP, Vol. I, pp. 505-508, March 1992, both address this issue. Through these papers and the efforts of others, significant advances have been made, resulting in a recognition accuracy that is acceptable for many applications. However, to be deployable in an actual telephony application, such as recognition of spoken telephone numbers or credit card numbers, especially with users unfamiliar with the technology, a connected digit recognizer must be robust over a wide range of user behavior. For instance, the ability to detect cases where the input speech does not contain a connected digit string is an important feature of the recognizer.
The Hidden Markov Model (HMM) speech recognizer is the preferred recognizer for enabling machines to recognize human speech. An HMM recognizer develops a candidate word by determining a best match between the spectral content of the input speech and the predetermined word models of its vocabulary set. HMM recognizers also determine segmentation information (i.e., the beginning and end of the candidate word) and a likelihood score that indicates how probable the candidate word is. For many applications, this likelihood score can be compared to a threshold to determine whether to accept the candidate word as present in the input speech or to reject it.
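The threshold comparison described above can be sketched as follows. This is a minimal illustration only: the `Candidate` record, its field names, and the use of a log-likelihood score are assumptions made for the example and are not taken from any particular recognizer.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical container for what an HMM recognizer reports."""
    word: str              # best-matching vocabulary word
    start_frame: int       # segmentation: first frame of the word
    end_frame: int         # segmentation: last frame of the word
    log_likelihood: float  # score of the best match (assumed log domain)


def accept(candidate: Candidate, threshold: float) -> bool:
    """Accept the candidate word if its score reaches the threshold."""
    return candidate.log_likelihood >= threshold


# Two hypothetical candidates scored against the same threshold.
good = Candidate("five", 10, 42, log_likelihood=-35.0)
poor = Candidate("five", 10, 42, log_likelihood=-80.0)
decisions = [accept(good, threshold=-50.0), accept(poor, threshold=-50.0)]
```

In this sketch a single fixed threshold decides acceptance, which is the simple scheme the text refers to; the threshold value itself would have to be tuned on held-out data.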
This simple rejection method based on the HMM likelihood comparison, however, is not sufficiently reliable for many applications. This rejection method cannot reliably detect utterances that contain a connected digit string, and reject utterances that do not contain a connected digit string, which are two important features of a reliable connected digit recognizer. Furthermore, in most applications, it is desirable to reject a connected digit string that has been misinterpreted by the recognizer (e.g., substitution of one number for another), since rejection in such cases is a "softer" error than causing misconnection or misbilling due to the incorrect recognition. In this case, it is more desirable to have rejection simply followed by reprompting.
Many elaborate rejection methods have been proposed, some in the context of word spotting for conversational speech monitoring, and others in the context of word spotting for telecommunications applications. For example, U.S. patent application Ser. No. 07/989,299, filed Dec. 11, 1992, by the present applicant and assigned to the assignee hereof, describes a keyword/non-keyword classification (rejection) system in the context of word spotting for isolated word recognition in telecommunications applications. In that patent application, the output of a Hidden Markov Model (HMM) detector is post-processed by a two-pass classification system that derives a value for a keyword model that may be compared to a threshold on which a keyword versus non-keyword determination may be based. A first pass comprises Generalized Probabilistic Descent (GPD) analysis, which uses feature vectors of the spoken words and HMM segmentation information (developed by the HMM detector during processing) as inputs to develop confidence scores. The GPD confidence scores are obtained through a linear combination (a weighted sum) of a processed version of the feature vectors of the speech. The confidence scores are then delivered to a second pass, which comprises a linear discrimination method using both the HMM scores and the confidence scores from the GPD stage as inputs. The linear discrimination method combines the two sets of input values using a second weighted sum. The output of the second stage may then be compared to a predetermined threshold, by which a determination of whether the utterance was a keyword or not may be made.
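The data flow of such a two-pass classifier can be sketched as two successive weighted sums followed by a threshold test. The sketch below is illustrative only: the function names, the weight values, and the form of the "processed" feature vector are assumptions, and the training of the weights (by GPD in the first pass and by linear discriminant analysis in the second) is not shown.

```python
from typing import List, Sequence, Tuple


def weighted_sum(weights: Sequence[float], values: Sequence[float]) -> float:
    """Linear combination of values with the given weights."""
    if len(weights) != len(values):
        raise ValueError("weights and values must have the same length")
    return sum(w * v for w, v in zip(weights, values))


def two_pass_decision(
    features: Sequence[float],             # processed feature vector (assumed given)
    hmm_scores: Sequence[float],           # scores produced by the HMM detector
    gpd_weights: List[Sequence[float]],    # one weight vector per confidence score
    second_pass_weights: Sequence[float],  # weights over HMM scores + confidences
    threshold: float,
) -> Tuple[bool, float]:
    # First pass: each confidence score is a weighted sum of the features.
    confidences = [weighted_sum(w, features) for w in gpd_weights]
    # Second pass: combine HMM scores and confidence scores in a second weighted sum.
    combined = weighted_sum(second_pass_weights, list(hmm_scores) + confidences)
    # Final step: compare the combined value with the predetermined threshold.
    return combined >= threshold, combined


# Hypothetical numbers chosen only to make the arithmetic easy to follow.
is_keyword, value = two_pass_decision(
    features=[1.0, 2.0],
    hmm_scores=[-2.0],
    gpd_weights=[[0.5, 0.25]],        # confidence = 0.5*1.0 + 0.25*2.0 = 1.0
    second_pass_weights=[1.0, 3.0],   # combined = 1.0*(-2.0) + 3.0*1.0 = 1.0
    threshold=0.5,
)
```

With these made-up values the combined score is 1.0, which exceeds the threshold of 0.5, so the utterance would be classified as a keyword.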
Using such an HMM keyword rejection method for connected digits, however, requires that the HMM recognizer compare the utterance to all possible strings (every digit combination). For seven-digit telephone numbers, that is 11^7 (roughly 19.5 million) possible combinations (seven digit positions, each with 11 possible words: 1-9, "oh" and "zero"). For credit card numbers, which have many more digits, this computational effort is often too great to be practical.
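The growth in the number of candidate strings can be checked with a short calculation. The sketch below assumes a vocabulary of 11 digit words per position, as stated above; the 16-digit length used for a credit card number is an assumption for illustration only.

```python
def num_digit_strings(length: int, vocabulary_size: int = 11) -> int:
    """Number of distinct digit-word strings of the given length."""
    return vocabulary_size ** length


# Seven positions with 11 possible words each.
telephone = num_digit_strings(7)      # 19487171

# Assumed 16-digit credit card number: the count exceeds 10**16.
credit_card = num_digit_strings(16)
```

Each additional digit multiplies the number of strings by 11, which is why exhaustive matching against whole strings becomes impractical as the string length grows.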
Therefore, a problem is that prior art rejection techniques for connected digits are neither reliable enough nor efficient enough for connected digit applications that require high reliability and reasonable computational complexity, such as telephone systems and credit card applications.