Recent developments in the fields of speech-recognition, natural language systems and high-speed networking have greatly contributed to the advancement of VAC systems. It is now possible for a person to pick up a telephone handset and initiate a call by uttering simple voice commands such as "call mom" or "call home," rather than using the old-fashioned telephone keypad to dial a telephone number. One such system is described in U.S. Pat. No. 5,719,921. In fact, users with speakerphones no longer need to even use a telephone handset to initiate calls. Specifically, a speakerphone user can simply power on the speakerphone, utter commands in the user's language (e.g., English or Spanish) to request a call connection to another person, and sit back while the call is completed by the system.
The aforementioned functionality provided by VAC systems stems from a combination of several different technologies. Prominent among such technologies are speech-recognition systems and natural language systems, both of which are well-known in the art. A short description of each such technology, however, may prove beneficial to a better understanding of the problems that are solved by the present invention.
Modern speech recognition systems are predicated on the premise that any speech can be broken down into a sequence of sounds selected from a set of approximately forty such sounds called "phonemes". Different sounds, or phonemes, are produced by varying the shape of the vocal tract through muscular control of the speech articulators (e.g., lips, tongue, jaw). Speech recognition systems essentially transcribe a spoken utterance into a string of phonemes. A particular sequence of phonemes collectively represents a word or a phrase. In essence, speech recognition systems operate by identifying and analyzing the elemental phonemes contained in a speech signal in order to recognize the word or phrase associated with such phonemes. Hence, for speech recognition systems to be useful, they need to be endowed with a concept of language (e.g., syntax and semantics). Natural language systems provide that capability.
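By way of illustration only, the mapping from a phoneme string to words can be sketched as a lookup in a small pronunciation lexicon. The lexicon contents, the ARPAbet-style phoneme symbols, and the function name phonemes_to_words are assumptions made for this sketch; a practical recognizer would score many competing hypotheses rather than perform an exact greedy match.

```python
# Toy lexicon (assumed for illustration): phoneme sequences -> words.
# The phoneme symbols follow an ARPAbet-style notation.
LEXICON = {
    ("K", "AO", "L"): "call",
    ("M", "AA", "M"): "mom",
    ("HH", "OW", "M"): "home",
}
MAX_LEN = max(len(key) for key in LEXICON)


def phonemes_to_words(phonemes):
    """Greedy longest-match segmentation of a phoneme stream into words."""
    words = []
    i = 0
    while i < len(phonemes):
        # Try the longest candidate sequence first, then shorter ones.
        for n in range(min(MAX_LEN, len(phonemes) - i), 0, -1):
            key = tuple(phonemes[i:i + n])
            if key in LEXICON:
                words.append(LEXICON[key])
                i += n
                break
        else:
            # No lexicon entry starts at this position.
            raise ValueError(f"no word matches phonemes starting at position {i}")
    return words
```

For example, the phoneme stream K AO L M AA M would be segmented into the two words "call" and "mom".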
While speech-recognition systems afford VAC systems the ability to recognize certain words and phrases, natural language systems aim at providing rules that constitute words and phrases (language grammar rules) and attaching a meaning to the words and phrases that are recognized (language understanding). Natural language systems operate as a front-end system between a user and the VAC system, providing a conversational interface between the user and a machine. In other words, the natural language interface translates the language spoken by the user (e.g., English) into a language that is understandable by the target computer system. The translation function of natural language systems is accomplished through the use of a grammar that characterizes a set of acceptable input strings (or, more generally, a statistical language model). This enables the speech recognition system to transcribe a spoken utterance into words and phrases as governed by the given language model. The language understanding component of the natural language system receives the output of the speech recognition system as its input. The language understanding component embodies rules and algorithms constituting a language parser (e.g., heuristics or statistical models) that produces a parse tree for the input string. The parse tree is then translated into an expression (or set of expressions) which assigns to the input string a meaning that is interpretable by the target computer system.
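As a minimal sketch of the translation step described above, a very small command grammar can be approximated with a regular expression that accepts only certain input strings and maps each accepted string to a machine-interpretable expression. The grammar, the field names (action, callee, location), and the function name interpret are hypothetical. The regular expression stands in for both the grammar and the parser, so no explicit parse tree is built here.

```python
import re

# Hypothetical command grammar: COMMAND -> "call" NAME [ "at" LOCATION ]
COMMAND_PATTERN = re.compile(
    r"^call (?P<name>[a-z]+)(?: at (?P<location>home|work))?$"
)


def interpret(utterance):
    """Return an expression for an accepted utterance, or None if rejected."""
    match = COMMAND_PATTERN.match(utterance.strip().lower())
    if match is None:
        # The string is not in the set of acceptable input strings.
        return None
    expression = {"action": "DIAL", "callee": match.group("name")}
    if match.group("location"):
        expression["location"] = match.group("location")
    return expression
```

Under these assumptions, the utterance "call mom at work" yields an expression naming the action, the callee, and the location, while an utterance outside the grammar, such as "play music", is rejected.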
Both speech recognition systems and natural language systems share a common element in that they both perform better with a larger number of reference patterns (e.g., phonological information or speech templates for speech recognition systems, word and phrase patterns for natural language systems). For natural language systems, the larger the vocabulary of the system, the more flexible and user-friendly the system becomes. For example, it is easier for a person to communicate a message to another person with a vocabulary of one thousand words than with a vocabulary of three hundred words. Similarly, it is easier to communicate with a computer that has a larger vocabulary of commonly understood terms. Furthermore, a larger vocabulary is more likely to accommodate the varied speaking styles and idiosyncrasies of different people. A larger vocabulary also helps speech-recognition systems, since it tends to allow a speaker to construct longer phrases, and the performance of many conventional speech-recognition systems tends to improve when multiple words are spoken, as compared to short or single words.
Because VAC systems use speech-recognition systems and natural language systems, the advantages of a large vocabulary enjoyed by those systems extend to VAC systems as well. Moreover, a large vocabulary provides additional benefits for users of VAC systems. For example, VAC systems enable users to perform a variety of telephony operations, such as calling, messaging and call forwarding, by speaking names from a personally configured list of names or from a generic list provided by a service provider. It is often desirable to enhance the flexibility of the VAC lists by associating additional attributes with each name in the list, such as a place of work or residence, or the type of service (e.g., cellular or pager). More generally, it is desirable to automatically include other entries (telephone numbers, fax numbers, email addresses) related to a particular entry. For example, if only the work number of a particular person is originally specified in the user-configured list, it is desirable to automatically retrieve and add to the list all other available numbers, such as pager, cellular, fax and home numbers, and other pertinent information, such as email and URL addresses. The additional attributes and entries in a user's list allow the VAC systems to provide enhanced services and functionality to a user. For example, when a user utters a command, such as "call mom," to a VAC system, a telephone call is automatically initiated and directed to the called party associated with that command. However, the called party may have more than one telephone number. In such a case, it would be desirable for the VAC system to query the user for the type of phone number to be dialed, such as the home number, the office number or the telephone number of the called party's wireless telephone set.
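The desired behavior described above, in which the system dials directly when a name has a single number and otherwise asks the user which type of number to dial, can be sketched as follows. The Entry record, its field names, the resolve function, and the sample telephone numbers are all hypothetical and serve only to illustrate the idea; they do not describe any particular VAC system.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    name: str    # spoken tag, e.g. "mom"
    kind: str    # attribute, e.g. "home", "office", "wireless"
    number: str  # telephone number to dial


def resolve(call_list, name, kind=None):
    """Decide what to do for a spoken name and an optional number type.

    Returns ("dial", number), ("ask", [kinds]) or ("not_found", None).
    """
    matches = [
        e for e in call_list
        if e.name == name and (kind is None or e.kind == kind)
    ]
    if not matches:
        return ("not_found", None)
    if len(matches) == 1:
        return ("dial", matches[0].number)
    # Several numbers match: query the user for the type of number.
    return ("ask", sorted(e.kind for e in matches))


# Sample list with fictitious numbers (assumed for illustration).
CALL_LIST = [
    Entry("mom", "home", "555-0100"),
    Entry("mom", "office", "555-0101"),
    Entry("bob", "wireless", "555-0102"),
]
```

With this sample list, the command "call mom" leads to a query offering the home and office numbers, whereas "call bob" can be dialed immediately because only one number is on file.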
In current VAC systems, however, such additional attributes are required to be explicitly specified by the user, either by voice or text entries. While such an approach has the advantage of allowing users to have better control over their VAC lists, adding such additional attributes becomes tedious and time-consuming for the user. For example, a user would have to, at a minimum, enter a ten-digit telephone number and a word or phrase for use as an identification tag for the number, not to mention any additional attributes a user might want to associate with the number (e.g., an address). Combine this with the need to input multiple entries or called parties to the calling list, and the burden upon the user becomes even greater. Another problem is keeping the information contained in these lists current, a task that is tedious to accomplish manually.
In addition to the above disadvantages, the user may not have ready access to additional attributes for the called party. This means the user must first find the information through another source prior to entry. In some instances, the user may not have access to the resources providing such information, such as information stored on a private network or even on public networks such as the Internet or World Wide Web (WWW).
In view of the foregoing, it can be appreciated that a substantial need exists for an interface for a VAC system which solves the above-discussed problems.