Spoken language is the most natural and convenient communication tool for people. With data storage capacities increasing rapidly, people tend to store greater amounts of information in databases. Accessing this data with spoken language interfaces offers people convenience and efficiency, but only if the spoken language interface is reliable. This is especially important for applications in eye-busy and hand-busy situations, such as driving a car. Man-machine interfaces that utilize spoken commands and voice recognition are generally based on dialog systems. A dialog system is a computer system that is designed to converse with a human using a coherent structure and text, speech, graphics, or other modes of communication on both the input and output channels. Dialog systems that employ speech are referred to as spoken dialog systems and generally represent the most natural type of man-machine interface. With the ever-greater reliance on electronic devices, spoken dialog systems are increasingly being implemented in many different machines.
In many spoken language interface applications, proper names, such as the names of people, locations, companies, and places, are widely used. The number of proper names in these applications is often very large, and the names may include foreign names, such as street names in a navigation domain or restaurant names in a restaurant selection domain. In high-stress environments, such as driving a car, flying a helicopter, or operating machinery, people tend to use shorthand, such as partial proper names and slight variations of them. Current problems with proper name recognition in conventional spoken language interface applications include inadequate accuracy of the speech recognizer component for these names and inadequate accuracy in recognizing these names against the names present in the system database.
Present name recognition methods for large name lists generally focus strictly on the static aspect of the names. Such systems do not utilize certain contextual elements that can significantly aid in the recognition of proper names. These contextual elements can include temporal, recency, and context effects in how names are used.
Present recognition systems may also be configured to confirm proper names by means of direct confirmation. In this method, the system responds by rephrasing the user's utterance and directly mentioning the name or names as they were understood by the system. One type of direct confirmation system explicitly asks the user whether he or she mentioned a specific name or names. For example, if the user is making an airplane reservation, the user might say “I want to fly from Boston to New York.” The system may then respond: “You said Boston to New York, is that correct?” The user must then confirm or reject this restatement and provide any necessary correction. To make the system seem more conversational, the confirmation may be restated less directly. For example, if the user says “I want to fly from Boston to New York,” the system may respond: “OK, when would you like to fly from Boston to New York?” This type of confirmation, called implicit confirmation, relies on the assumption that if the system misunderstood and misstated one or more of the names, the user would provide a correction, whereas if the system repeated the names correctly, the user would say nothing further about them. Because the response includes the proper names, the system has still directly confirmed the names as it understood them. Direct confirmation systems are generally cumbersome because they restate the proper names uttered by the user; they are thus overly repetitive, adding time and possibly frustration to the user experience. These systems may also repeat or propagate errors made during the speech recognition process.
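The difference between explicit and implicit confirmation can be sketched in Python using the airplane reservation example above. The `FlightRequest` class, the prompt functions, and the first-word check for a user correction are hypothetical illustrations, not part of any described system; a real dialog system would detect corrections with its own language understanding component.

```python
from dataclasses import dataclass


@dataclass
class FlightRequest:
    """Hypothetical slots filled by the recognizer for a flight request."""
    origin: str
    destination: str


def explicit_confirmation(req: FlightRequest) -> str:
    # Explicit confirmation: restate the recognized names and ask a yes/no
    # question; the dialog waits until the user accepts or corrects them.
    return f"You said {req.origin} to {req.destination}, is that correct?"


def implicit_confirmation(req: FlightRequest) -> str:
    # Implicit confirmation: embed the recognized names in the next question;
    # the user is expected to object only if a name is wrong.
    return f"OK, when would you like to fly from {req.origin} to {req.destination}?"


def names_confirmed_implicitly(reply: str) -> bool:
    # Hypothetical heuristic: treat the names as accepted unless the reply's
    # first word signals a correction. Silence (an empty reply) counts as
    # acceptance, matching the assumption behind implicit confirmation.
    words = reply.lower().split()
    first = words[0].strip(",.!?") if words else ""
    return first not in {"no", "nope", "wrong"}
```

For `FlightRequest("Boston", "New York")`, the explicit prompt is "You said Boston to New York, is that correct?" and the implicit prompt is "OK, when would you like to fly from Boston to New York?". A reply such as "No, to Newark." is treated as a correction, while "Tomorrow morning." is treated as tacit acceptance of both names. Either way, both strategies repeat the recognized names back to the user, which is the repetitiveness described above.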
What is needed, therefore, is a dialog system that utilizes contextual information to address the problems of proper name recognition in spoken language interface applications, namely by improving both the speech recognition accuracy for these names and the accuracy with which they are recognized against the names in the system database.