Recent advances in speech recognition technology have spurred its increasing commercialization in different market segments of various industries. One industry that has experienced increased use of speech recognition systems is the telecommunications industry, which strives to apply the technology in automated attendant systems for services or applications such as order taking, directory assistance, and data entry, to name a few. Proponents of speech recognition technology believe that it is well suited for telecommunications applications. In most applications of speech recognition technology in the telecommunications field, a user is prompted to speak into the mouthpiece of a telephone handset. The speech signals provided by the speaker are first converted into digital values through a sampling process, and the digital values are in turn converted into a sequence of patterns to allow the words uttered by the speaker to be recognized from a list or group of pre-stored words. Predetermined words within the list are typically stored as templates, wherein each template is made of sequences of patterns of speech sounds better known as "phonemes". This type of recognition technique is commonly referred to as "whole word template matching". Over the last few years, the word-template-matching technique has been advantageously combined with dynamic programming to cope with nonlinear time scale variations between spoken words and pre-stored templates.
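By way of illustration only, the combination of template matching with dynamic programming can be sketched as a dynamic time warping comparison. The sketch below is a minimal example under stated assumptions, not the method of any particular system: each pattern is reduced to a single number, and the names dtw_distance and recognize, as well as the sample templates, are hypothetical.

```python
def dtw_distance(utterance, template):
    """Return the dynamic-time-warping cost of aligning two pattern sequences.

    Each sequence is a list of numbers standing in for per-frame speech
    patterns (an assumption; real systems use multi-dimensional features).
    """
    inf = float("inf")
    n, m = len(utterance), len(template)
    # cost[i][j] holds the best cost of aligning the first i utterance
    # patterns with the first j template patterns.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(utterance[i - 1] - template[j - 1])
            # Allow a stretch of the utterance, a stretch of the template,
            # or a one-to-one step; this absorbs nonlinear timing changes.
            cost[i][j] = local + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    return cost[n][m]


def recognize(utterance, templates):
    """Pick the pre-stored word whose template aligns most cheaply."""
    return min(templates, key=lambda word: dtw_distance(utterance, templates[word]))


# Hypothetical templates; the values are invented for the example.
TEMPLATES = {"yes": [1, 3, 4, 3, 1], "no": [5, 5, 2, 2]}

# A slower rendition of "yes": some patterns are held for two frames.
SLOW_YES = [1, 1, 3, 3, 4, 4, 3, 1]
best_word = recognize(SLOW_YES, TEMPLATES)
```

In this sketch the slower utterance still aligns with the "yes" template at zero cost, because repeated patterns are absorbed by the warping path rather than counted as mismatches.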
In spite of the recent advances in speech recognition technology, a number of factors continue to impede the commercialization of speech recognition systems. Prominent among these factors is the inability of speech recognition systems to easily distinguish homonyms, such as "to", "too" and "two". Equally problematic is the difficulty of recognizing words that may be uttered or pronounced differently because of speakers' regional accents. It is also well known that speech recognition systems have some difficulty in distinguishing words that rhyme or otherwise sound alike, such as "bear" and "pear", or "but" and "pot".
In response to these problems, three solutions have been proposed. One solution, described in U.S. Pat. No. 5,212,730, is to use text-derived recognition models in concert with decision rules to differentiate various pronunciations of a word. Another solution proposes the use of context-related data and decision rules, in addition to stored templates, to facilitate more accurate recognition of spoken words. A third solution opts out of speech recognition altogether in favor of receiving information from a user in the form of Dual Tone Multi-Frequency (DTMF) signals entered from the touch-tone keypad of a telephone set. Although DTMF entries accurately represent numeric strings provided by a user, they are ill suited for applications in which the numeric strings include more than fifteen digits. The digits in such a long string need to be re-entered, one at a time, if an error occurs at any time during the keying process. Of particular significance is the inability of DTMF entries to accurately represent alphabetic or alphanumeric strings of characters, since each lettered key on a telephone keypad is shared by at least three letters.
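The keypad ambiguity noted above can be illustrated with a short sketch. It assumes the conventional letter layout on keys 2 through 9, and the names KEYPAD, encode, and candidates are hypothetical helpers introduced only for this example.

```python
from itertools import product

# Conventional letter assignment on a telephone keypad (assumed layout).
KEYPAD = {
    "2": "ABC", "3": "DEF", "4": "GHI", "5": "JKL",
    "6": "MNO", "7": "PQRS", "8": "TUV", "9": "WXYZ",
}

# Reverse lookup: letter -> the single key that carries it.
LETTER_TO_KEY = {
    letter: key for key, letters in KEYPAD.items() for letter in letters
}


def encode(text):
    """Map each letter to its key; digits and other characters pass through."""
    return "".join(LETTER_TO_KEY.get(ch, ch) for ch in text.upper())


def candidates(keys):
    """List every character string that the same key presses could mean."""
    pools = [KEYPAD.get(key, key) for key in keys]
    return ["".join(combo) for combo in product(*pools)]


# "BAD" and "ACE" are keyed identically, so the receiver cannot tell them apart.
same_keys = encode("BAD") == encode("ACE")
possible = candidates(encode("BAD"))
```

Under this layout a three-letter entry keyed on keys 2, 2 and 3 already has 27 possible readings, and the count multiplies with every additional key press.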