1. Technical Field
This invention relates to a method and apparatus for voice operated directory dialer name recognition. In particular, the invention relates to an improvement for enabling the directory dialer to recognize a spoken name which is not part of the system directory.
2. Description of the Related Art
IBM® Directory Dialer is a speech enabled application running on an interactive voice response system (IVR) with name recognition functionality (for instance IBM ViaVoice®). Name recognition differs from speech recognition primarily in that recognition is focused only on names rather than on general vocabulary. Hence, the phoneme set and the grammar sets of allowable phonemes relate only to names. In principle, this should result in much simpler technology than would be needed for full speech recognition. The IVR connects to a telephony network and prompts a telephone user for the name of the person that they wish to call. The directory dialer recognizes the name, matches the name to the respective number, and transfers the call to that number for the user.
To work, the directory dialer needs to extract information from a database of names and associated telephone numbers. LDAP (Lightweight Directory Access Protocol) is a widely used Internet directory protocol, used for example by email clients to look up contact information. In a directory dialer, an overnight provisioning process accesses the LDAP database to extract names and produce the baseforms and grammars needed by the name recognition process. A baseform is a basic phonetic element such as a phoneme; all possible baseforms form the acoustic model of the directory dialer. A grammar defines sequences of baseforms, each sequence associated with a name.
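The relationship between directory entries, baseforms, and the grammar can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the records are hypothetical stand-ins for what a provisioning job might extract from LDAP, and `to_baseforms` is a toy rule that treats each letter as one pseudo-phoneme, not a real letter-to-sound model.

```python
# Hypothetical records standing in for names extracted overnight from LDAP.
DIRECTORY = [
    {"name": "Keith Sloan", "phone": "555-0101"},
    {"name": "Kevin Smith", "phone": "555-0102"},
]


def to_baseforms(name):
    """Toy assumption: one pseudo-phoneme per letter of the name."""
    return tuple(ch.upper() for ch in name if ch.isalpha())


def build_grammar(directory):
    """Map each baseform sequence to the directory entry it identifies."""
    grammar = {}
    for entry in directory:
        grammar[to_baseforms(entry["name"])] = entry
    return grammar


grammar = build_grammar(DIRECTORY)

# The set of all baseforms that occur in the grammar; in this toy model it
# plays the role of the baseform inventory the recognizer must cover.
baseform_inventory = sorted({b for sequence in grammar for b in sequence})
```

Because the grammar holds exactly one baseform sequence per directory name, a name that is absent from the directory has no sequence in the grammar at all.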
The operation of this directory dialer is shown in FIG. 2. In the Figures, a left pointing box is an action performed by the directory dialer and a right pointing box is an action performed by a user. The directory dialer waits, step 201, for a user to call the IVR system using a phone number that identifies the directory dialer application to be used. The application greets, step 203, the user with a welcoming message and prompts, step 205, for the name of the person being called. Some variations require name and location or name and department. Once the user has spoken the name, step 207, the application attempts to recognize, at step 209, the spoken name.
The name recognition process of the prior art and the process of the present embodiment involve breaking the speech down into short chunks (typically 10 msec each). These chunks are then processed to produce a number of spectral Fourier values, say 64 values. The number of values is further reduced by normalizing and fitting polynomial coefficients to the Fourier values. Delta coefficients derived from adjacent chunks are then included, giving typically 39 coefficients per chunk. The name recognition system then performs pattern recognition on a group of coefficients to identify a specific phoneme. Since the accuracy is far from perfect, a best fit is made first of the most likely phonemes and then of the most likely strings of phonemes. The number of possible strings is restricted to the phoneme sets in the grammars. The system then finds the most likely name in the directory, together with an overall confidence score indicating how well the phonemes match.
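The front end described above can be sketched in a few lines. This is an illustration under stated assumptions, not the recognizer's actual signal processing: an 8 kHz telephony sampling rate is assumed, a discrete cosine transform stands in for the polynomial coefficient fit, and the 39 coefficients are assumed to be 13 static values plus their first and second differences across adjacent chunks.

```python
import cmath
import math

SAMPLE_RATE = 8000   # assumed telephony sampling rate (Hz)
FRAME_MS = 10        # chunk length from the text
N_SPECTRAL = 64      # spectral values per chunk, as in the text
N_STATIC = 13        # assumed: 13 static + 13 delta + 13 delta-delta = 39


def frames(samples):
    """Split the signal into consecutive 10 msec chunks."""
    size = SAMPLE_RATE * FRAME_MS // 1000
    return [samples[i:i + size] for i in range(0, len(samples) - size + 1, size)]


def log_spectrum(frame):
    """Log power at 64 evenly spaced frequencies between 0 and Nyquist."""
    values = []
    for k in range(N_SPECTRAL):
        w = math.pi * k / N_SPECTRAL
        acc = sum(x * cmath.exp(-1j * w * t) for t, x in enumerate(frame))
        values.append(math.log(abs(acc) ** 2 + 1e-10))
    return values


def compress(spectrum):
    """Reduce 64 values to 13 (DCT-II, standing in for the polynomial fit)."""
    m = len(spectrum)
    return [
        sum(s * math.cos(math.pi * c * (i + 0.5) / m) for i, s in enumerate(spectrum))
        for c in range(N_STATIC)
    ]


def deltas(rows):
    """Central difference between the neighbouring chunks of each chunk."""
    last = len(rows) - 1
    return [
        [(b - a) / 2 for a, b in zip(rows[max(t - 1, 0)], rows[min(t + 1, last)])]
        for t in range(len(rows))
    ]


def features(samples):
    """One 39-value feature vector per 10 msec chunk."""
    static = [compress(log_spectrum(f)) for f in frames(samples)]
    d1 = deltas(static)
    d2 = deltas(d1)
    return [s + a + b for s, a, b in zip(static, d1, d2)]


# Usage: 0.1 s of a 440 Hz tone yields ten chunks of 39 coefficients each.
tone = [math.sin(2 * math.pi * 440 * t / SAMPLE_RATE) for t in range(800)]
vectors = features(tone)
```

The pattern recognition stage would then classify groups of these vectors into phonemes; that stage is outside the scope of this sketch.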
The application compares the confidence score with an upper threshold value (x), step 211. If the confidence score is above the upper threshold value (x), it is assumed that the user's speech has been correctly recognized and the call is immediately transferred, at step 213, to the recognized destination name. Otherwise the directory dialer compares the confidence score with a lower threshold value (y), step 215. If the confidence score is below the lower threshold value (y), the process transfers to step 216, where the directory dialer apologizes for not understanding and starts over at step 205. Otherwise the score lies between the two thresholds and the process moves to step 217, where the user is asked to confirm the recognized name with a ‘yes’ or ‘no’. The user speaks a reply, step 219, and the call is then either transferred, step 221, to the appropriate number or the system prompts the user to try again and the process repeats from step 205.
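The two-threshold decision can be written as a small function. This sketch takes the reading in which a score above x leads to an immediate transfer, a score below y leads to the apology of step 216, and a score between the two leads to the yes/no confirmation of step 217. The function name, the returned labels, and the example threshold values are hypothetical.

```python
def next_step(confidence, upper_x, lower_y):
    """Choose the dialer's next action from the recognition confidence."""
    if confidence > upper_x:
        return "transfer"                  # step 213: connect immediately
    if confidence < lower_y:
        return "apologize_and_reprompt"    # step 216, then back to step 205
    return "confirm"                       # step 217: ask for 'yes' or 'no'


# Usage with hypothetical thresholds x = 0.8 and y = 0.4.
actions = [next_step(score, upper_x=0.8, lower_y=0.4) for score in (0.9, 0.6, 0.2)]
```

Note that the "transfer" branch connects the call without any check, which is the behavior the following paragraphs identify as a source of frustration when the best match is wrong.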
It is frustrating for users when name recognition does not recognize a spoken name and the directory dialer forwards the user to the best recognized name without checking. If the best recognized name is incorrect, the user will not know until the call is put through to the wrong person. In some cases, however, the error lies not with the directory dialer but with the user, who unwittingly speaks an invalid name, either by mistake or because somebody has left the company and is no longer included in the directory. Mistakes occur simply when a user incorrectly remembers a person's first name or second name, e.g., somebody asks for Kevin Sloan when they mean Keith Sloan or Kevin Smith.
Invalid names are not part of the grammar because the grammar is a finite number of sets of baseforms corresponding to the set of names in the directory.
Directory dialers that construct grammars from a text directory are known. In the prior art it is known to construct a grammar as a concatenation of phonemes from all the text of the names as found in the text directory. This will include first and family names and all other names. One problem is that the speaker cannot tell whether the spoken name exists in the directory, since the directory dialer will always select the nearest match in the grammar. The directory dialer selects the nearest match regardless of whether the recognition is correct or whether the spoken name exists in the directory. A solution to this problem is to have a very large number of allowable names in the grammar. However, this would demand excessive memory and processing.
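The nearest-match problem can be demonstrated with a toy matcher. As an assumption made purely for illustration, text similarity from Python's standard `difflib` module stands in for acoustic matching of phoneme strings, and the directory contents are hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical directory; "Kevin Sloan" is deliberately absent.
DIRECTORY = ["Keith Sloan", "Kevin Smith", "Mary Jones"]


def nearest_name(spoken, directory):
    """Return the closest directory name and its similarity to the request."""
    def score(name):
        return SequenceMatcher(None, spoken.lower(), name.lower()).ratio()

    best = max(directory, key=score)
    return best, score(best)


# The matcher always returns some directory entry, even for an invalid name.
match, similarity = nearest_name("Kevin Sloan", DIRECTORY)
```

A request for a name outside the directory still produces a directory entry, here with a fairly high similarity, so the caller gets no indication that the requested name does not exist.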
Another solution is found in U.S. Pat. No. 5,912,949. This publication discloses a directory dialer that will always prompt the user with the result before connecting the user to the recognized name. This publication also recognizes a name and an initial from voice data and discloses that the system may ask the user directly for the name and initial of the desired name before any attempt to recognize the name is made. However, this publication describes how each name in the directory includes a phoneme string comprising the name and the initials. Moreover, name recognition is correct more often than not, and it becomes frustrating to be asked every time to confirm a correctly recognized name.