This invention relates generally to speech recognition in a data processing system. More particularly, it relates to a natural language sensitive kiosk that accepts verbal input from a human or a machine in any of a plurality of languages and then responds to each request in the natural language in which it was made.
Computers have become a presence in many facets of today's society. At one time, when computing was the sole domain of highly skilled computer programmers, the user interface was text based. The computer provided text-based output, and the programmer entered often arcane commands at a command line interface. Given the skill of those users, this state of affairs was acceptable, if hardly desirable. As less skilled segments of society began to have daily contact with computers, efforts were gradually made to make the computer user interface more "user friendly." These efforts have led both to graphical user interfaces that mimic everyday objects and to more user friendly input devices such as touch screens.
Another "user friendly" interface is a speech interface, which recognizes a user's speech and synthesizes speech instead of, or in addition to, presenting a visual interface. Both speech recognition and speech synthesis are well-known technologies, yet both have failed to fulfill the expectations of technologists and science fiction writers. One problem confronting speech technology is that recognizing speech, even in a single language, requires a great deal of raw processing power. Further, the speech dictionaries required for speech recognition are enormous.
It would be desirable to provide a multilingual speech interface that could understand and respond in several different spoken languages. Such an interface would be useful for providing computer-based services in venues where people who have limited computer skills and who speak different languages congregate. Ideally, a user could approach a kiosk, ask a question in his native language, and have the kiosk respond in that language, either by speech output or through the displayed interface. Such a natural language kiosk would be useful in a number of situations, including, but not limited to: the Olympics, for directions to events and buildings and for the scores, medals, and standings of competitors; airports, for information and directions to baggage pickup, taxi stands, casinos, limousine services, car rental desks, ticket counters, and arrival and departure gates; train and bus stations, for services similar to those at airports; ports of entry, for information and directions; international attractions such as the Eiffel Tower, for information, ticket counters, and directions; EPCOT Center, for restaurant reservations; and, generally, any place or event at which many people with different native languages will gather. Input from a telephone (or a computer) is considered machine input, although it may resemble a spoken utterance.
The problems of speech recognition and speech synthesis are compounded when there are several possible languages to understand and in which to respond. Essentially, the massive speech dictionaries and speech recognition engines must be replicated for each language. Since a typical speech recognition machine requires at least 32 MB of RAM and a high-powered processor, supporting a large number of spoken natural languages becomes difficult, if not impossible, and certainly expensive.
The present invention provides a solution to this problem.