This invention relates generally to the field of speech recognition and, in particular, to an apparatus and method for improving the utility of speech recognition, especially in respect of automated processes for retrieving information from a database.
The desirability of retrieving information from databases using speech recognition as a means for converting spoken words into indicia useful in retrieving information has long been recognized. A principal application for this technology has been the partial automation of telephone directory assistance services. Telephone companies and telephone equipment manufacturers have invested considerable resources in developing systems to reduce the labour costs associated with providing directory assistance services. Much of that investment has been in speech recognition algorithms designed to facilitate directory look-ups.
Although speech recognition algorithms have been consistently improved, they have to date failed to provide complete automation solutions for information retrieval applications such as directory assistance. Due to the nature of spoken language, speech recognition is inherently limited in its ability to discriminate between words which are pronounced alike but spelled differently. The utility of speech recognition is further challenged by the current mobility of the world population which contributes to a diverse ethnic mix and consequently a variety of accents and inflections in most urban centres. Consequently, most speech recognition algorithms, although finely tuned and inventively designed, are incapable of enabling complete automation of an information retrieval system.
The inherent limitations in speech recognition are readily understood. For example, humans as well as machines have difficulty in distinguishing between the sounds uttered for the letters B, C, D, E, G, P, T, V and Z. In addition, it is substantially impossible to determine the spelling of certain words, including names, based on their pronunciation. For example, the names John, Jon and Jean may all be pronounced similarly enough that discrimination of the true spelling is impossible. Likewise, the names Mary Ann, Maryanne and Marianne are simply impossible to differentiate as spoken words. Other examples too numerous to mention may be readily recited by speech scientists.
A need therefore exists for a method of improving the utility of speech recognition in order to permit the automation of functions which are usefully implemented using speech recognition technology.
It is an object of the invention to provide a method and apparatus for improving the utility of speech recognition to permit the automation of information retrieval systems which use speech recognition as the primary engine for information retrieval.
It is a further object of the invention to provide a method of implementing speech recognition that enables the automation of transactions that may be accomplished over the switched telephone network.
It is another object of the invention to provide a method and apparatus for improving the utility of speech recognition which is relatively easy to design and inexpensive to implement.
It is yet a further object of the invention to provide a method and apparatus of improving the utility of speech recognition that is designed to enable the complete automation of telephone directory assistance services.
In accordance with a first aspect of the invention there is provided a method of improving the utility of speech recognition of words spoken by a speaker, comprising:
a) capturing in electronic form a word spoken by the speaker;
b) passing the word to a speech recognition algorithm;
c) receiving from the speech recognition algorithm at least one representation of the word;
d) displaying for the speaker as text the at least one representation of the word to permit the speaker to select a correct representation of the word from among the at least one representation; and
e) repeating steps a)-d) in the event that none of the representations of the word is selected as correct, or enabling the speaker to communicate the word in another way.
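The capture, recognize, display and select cycle of steps a) to e) can be sketched as a simple confirmation loop. This is a minimal sketch, assuming the telephony, recognition and display components are supplied by the caller as plain functions; the names `recognize_with_confirmation`, `capture`, `recognize`, `select` and the `max_attempts` retry limit are hypothetical and are not part of the method as described.

```python
from typing import Callable, List, Optional

# Hypothetical stand-ins for the components named in steps a) to d).
CaptureFn = Callable[[], bytes]                  # a) capture the spoken word as audio
RecognizeFn = Callable[[bytes], List[str]]       # b)-c) return candidate text representations
SelectFn = Callable[[List[str]], Optional[int]]  # d) display candidates, return chosen index or None


def recognize_with_confirmation(
    capture: CaptureFn,
    recognize: RecognizeFn,
    select: SelectFn,
    max_attempts: int = 3,  # assumed retry limit; the method itself sets none
) -> Optional[str]:
    """Return the representation the speaker selected, or None if no attempt succeeded.

    A None result is where step e)'s alternative (communicating the word
    in another way) would take over.
    """
    for _ in range(max_attempts):
        audio = capture()
        candidates = recognize(audio)
        if not candidates:
            continue  # nothing to display; capture the word again
        choice = select(candidates)
        if choice is not None and 0 <= choice < len(candidates):
            return candidates[choice]
        # No representation was selected as correct: repeat the cycle.
    return None
```

Because the recognizer returns a list rather than a single guess, homophones such as John, Jon and Jean can all be shown, leaving the final choice of spelling to the speaker.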
In accordance with a second aspect of the invention, there is provided apparatus for improving the utility of speech recognition of words spoken by a speaker, comprising a computer enabled to receive voice and data signals over a communications link, the computer being programmed to prompt a user for spoken words, which are received from the communications link as voice signals, and to pass the spoken words to a speech recognition algorithm which returns representations of the spoken words to the computer; the computer being further enabled to pass the representations of the spoken words to a voice terminal having a display surface which displays the representations to permit the user to select a correct representation of the spoken words, thereby improving the utility of the speech recognition of the words.
In accordance with yet a further aspect of the invention, there is provided a method of automating telephone directory services for telephone users having display telephones, comprising the steps of:
a) prompting a user accessing the directory services for names used as indicia to locate an entity in the directory;
b) accepting from the user a spoken name for each index;
c) passing each spoken name to a speech recognition algorithm and accepting from the speech recognition algorithm at least one representation of the spoken name;
d) displaying as text on the display telephone the at least one representation of the spoken name to permit the user to select a correct representation of the spoken name; and
e) assembling a query to the directory after a correct representation of each index has been selected in order to retrieve a record for the entity from the directory.
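Steps a) to e) of the directory method can be sketched as a loop over the index names followed by a single query. This is a minimal sketch, assuming an SQL directory table; the table name `directory`, its columns, the `INDEX_FIELDS` list and the functions `assemble_query` and `directory_lookup` are hypothetical. The `confirm_name` callable stands for steps a) to d) for one index (for example, a wrapper around a confirmation loop), and returning None when a name cannot be confirmed is an assumption, not part of the described method.

```python
import sqlite3
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical index names requested from the user, in order.
INDEX_FIELDS = ["surname", "given_name", "locality"]


def assemble_query(confirmed: Dict[str, str]) -> Tuple[str, List[str]]:
    """Step e): build one parameterized query from the confirmed names.

    Only column names from the fixed INDEX_FIELDS list are placed in the
    SQL text; the spoken values are passed as bound parameters.
    """
    clauses = " AND ".join(f"{name} = ?" for name in INDEX_FIELDS)
    params = [confirmed[name] for name in INDEX_FIELDS]
    return f"SELECT number FROM directory WHERE {clauses}", params


def directory_lookup(
    conn: sqlite3.Connection,
    confirm_name: Callable[[str], Optional[str]],
) -> Optional[List[str]]:
    """Collect a confirmed name for each index, then query the directory."""
    confirmed: Dict[str, str] = {}
    for name in INDEX_FIELDS:
        value = confirm_name(name)  # steps a)-d) for this index
        if value is None:
            return None  # assumed hand-off point, e.g. to a human operator
        confirmed[name] = value
    sql, params = assemble_query(confirmed)
    return [row[0] for row in conn.execute(sql, params)]
```

Deferring the query until every index has been confirmed means the directory is searched only with spellings the user has already verified on the display.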
The invention therefore provides a method and an apparatus for improving the utility of speech recognition and enables a much broader application of speech recognition technology, especially in the implementation of services which entail the retrieval of information from databases. In accordance with the method, a database query is assembled by prompting a user to verbally indicate each of several names which may be used as indexes for retrieving a record of interest from a database. Each name may consist of one or more spoken words. The names are preferably requested in sequence and each name is preferably verified by passing the spoken words to a speech recognition algorithm which returns at least one text representation of the spoken name. The representations of the spoken name are then presented to the speaker who is permitted to select the correct representation of the spoken name. After all of the names required for a query have been correctly identified, a query is assembled and submitted to the database. This permits an accuracy of information retrieval which was heretofore unattainable using speech recognition alone.
The apparatus in accordance with the invention consists of voice terminals having display surfaces for displaying characters, and a computer which may be accessed by the voice terminals. The computer in turn has access to a speech recognition algorithm and a database which stores the information of interest. Software enables the computer to prompt the user to utter the names required as indicia for locating a record of interest in the database. Software also enables the computer to submit captured voice signals to the speech recognition algorithm, which returns one or more textual representations of the spoken name. These representations are displayed as text on the display surface of the voice terminal to permit the user to select the correct representation. The invention may therefore be inexpensively implemented to enable a wide variety of applications.
The method and apparatus in accordance with the invention are particularly adapted to providing completely automated directory services to individuals having display telephones. The display telephones are preferably adapted to conform to the Analog Display Services Interface (ADSI) standard FR-12 developed by Bellcore. The computer is preferably a server which may be accessed over a dial-up voice-grade connection. The speech recognition algorithm may reside on the same server or on another server in a local or wide-area network. Preferably, at least one speech recognition algorithm is provided in every region of a telephone network in order to permit regional training for the recognition of locality names as spoken by local speakers. The directory database, on the other hand, is preferably centralized and accessed through a wide-area network to eliminate duplicated maintenance and maximize accuracy.
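The preferred topology, with regionally trained recognizers and one central directory, can be sketched as a small routing table. This is a minimal sketch under stated assumptions: the class `DirectoryServiceTopology`, the idea of choosing a region from the caller's area code, and all host names are hypothetical illustrations, not details given in the description.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class DirectoryServiceTopology:
    """Hypothetical deployment map: one recognizer per region, one central directory."""

    directory_host: str  # single centralized directory database
    recognizers_by_region: Dict[str, str] = field(default_factory=dict)
    regions_by_area_code: Dict[str, str] = field(default_factory=dict)

    def region_for_caller(self, calling_number: str) -> str:
        # Assumed rule: derive the region from a 10-digit NANP area code,
        # dropping a leading country code "1" if present.
        digits = "".join(ch for ch in calling_number if ch.isdigit())
        if len(digits) == 11 and digits.startswith("1"):
            digits = digits[1:]
        return self.regions_by_area_code.get(digits[:3], "unknown")

    def recognizer_for_caller(self, calling_number: str) -> str:
        # Route the caller to the recognizer trained on local speakers.
        region = self.region_for_caller(calling_number)
        try:
            return self.recognizers_by_region[region]
        except KeyError:
            raise LookupError(f"no recognizer trained for region {region!r}") from None
```

Recognition is routed per region so that locality names are matched by a recognizer trained on local pronunciations, while every lookup goes to the same `directory_host`, so the records are maintained in one place.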
It will be well understood by those skilled in the art that this technology may be used in many other applications where information is usefully retrieved or transactions are conducted using spoken language. The method and apparatus in accordance with the invention may therefore also be used, for example, to implement a voice order system for telephone retail sales operations, an automated voice reservation system for hotel accommodations, and many other applications too numerous to mention.