1. Field of the Invention
The present invention relates, in general, to data processing systems and, in particular, to the areas of natural language processing, computational linguistics, and speech recognition.
2. Description of Prior Art
One of the challenges every natural language system faces is mapping the syntax of a complex natural language request to its meaning and, ultimately, mapping that meaning to existing computer applications that execute transactions and queries in a company's information system.
There exist systems capable of parsing complex natural language syntax and converting it into a semantic framework such as predicate calculus, Lisp cons structures, semantic nets, or frames. However, the question remains of how to translate the semantic structure into computer queries or commands that can reuse existing commercial applications and databases proprietary to a specific business. The software developer therefore usually abandons such natural language interfaces and develops a dialog flow from scratch using Java, C++, or VoiceXML. The dialog is then designed for the specific application, but it tends to limit the user to simple commands.
For a number of years, researchers in the field of artificial intelligence have developed systems that appear to maintain text-based conversations with humans. In 1966, M.I.T. professor Joseph Weizenbaum created Eliza as a "virtual therapist." It would take a user's statement and turn it around as a question, emulating psychiatrists' therapy techniques. In recent years, similar systems have been developed, such as Alice, by Richard Wallace at New York University [Thomp02]. These systems are strictly specific to the domain or application they are designed for. They do not scale to other applications and are not ready for the commercial arena. Another problem with these systems is that they take many years to develop. A business that wants to provide customer support to its users may not be able to afford a team of M.I.T. researchers to develop a virtual customer support agent over the course of several years. A further problem with these traditional conversational systems is that they are not designed to interact with existing databases or applications. They are typically self-contained and, for the most part, suited only for research.
Prior art is available for developers to program dialog flows for constructing voice-enabled systems in telephony applications, but the resulting dialog capabilities tend to be limited to simple voice commands. For example, the VoiceXML language was invented for this purpose. However, although it may be possible to develop a conversational dialog in VoiceXML that also accepts complex natural language requests, doing so would be a daunting task, and the cost would be much higher than that of the average software application. The developer has to account for every possible outcome of the user's utterances and build a reply for every possible situation, which is virtually impossible because the sentences a user may produce are unpredictable. This dialog must also be paired with complex business logic at the application level in order to support all such possible outcomes. The result is a rigid and proprietary application. Voice-enabled applications today are like the old transaction systems written in COBOL before relational databases existed. In many cases, the return on investment does not justify the cost of developing the dialog flow.
In addition, current VoiceXML-based systems lack the capability to handle spontaneous requests and interruptions in the dialog. Again, they are guided by a rigid dialog flow.
Some prior art systems are capable of retrieving answers to user questions from documents, and sometimes these questions can be phrased in complex natural language, but such systems can only answer questions from digital documents and/or databases of documents. Such capability can be found in recent U.S. Pat. Nos. 5,873,080, 5,953,718, 5,963,940, 5,995,921, 5,836,771, and 5,933,822. Although these answer retrieval systems can retrieve information from documents, they cannot be programmed to execute existing reusable applications and/or databases. In addition, these systems do not allow users to request transactions in natural language, such as a hotel reservation or a transfer of funds. They are incapable of performing a transaction because they can only answer questions from information existing in digital documents.
In the late 1980s and early 1990s, attempts were made to map natural language to SQL (Structured Query Language) for the retrieval of information from relational databases, but these attempts proved futile. Too often, the resulting SQL statements were wrong. At other times, the natural language phrase could not be converted to SQL at all. The number of natural language queries that the system could convert to the same SQL statement was limited, and even finding all the variations of such natural language statements was an exhausting process. For the most part, users were highly limited in what they could ask and how they could phrase it [Ruwan00] (see also U.S. Pat. Nos. 5,197,005, 5,265,065, 5,386,556, 5,442,780, 5,471,611, and 5,836,771). In any case, even where some of these projects partially succeeded, they are strictly limited to relational database access, which is often the least preferred way to execute transactions from the user interface. In today's systems, it is preferable for the user interface to have the flexibility to execute transactions and/or queries through an application layer.
Some prior art systems employ state machines to guide the dialog, such as U.S. Pat. No. 6,356,869. While such state machines are able to prompt the user as they discover new information in the database based on user queries, they cannot identify the information missing from the user's request that is needed to complete a query or a transaction, and therefore cannot guide the user accordingly.
Prior art also fails to adapt to the level of precision with which humans phrase natural language requests. Prior art can perform only one of the following functions:
Guide the dialog under presumed logical precision, as seen in U.S. Pat. No. 6,356,869 and U.S. Pat. Nos. 5,197,005, 5,265,065, 5,386,556, 5,442,780, 5,471,611, and 5,836,771. Such prior art requires precision in the user's request and either delivers with precision or cannot deliver at all.
Search for information stochastically and/or heuristically, and deliver information approximating the user's request, as in U.S. Pat. Nos. 5,933,822, 5,995,921, and 5,953,718.
Prior art is not designed to distinguish between precision and imprecision, and therefore cannot process each request accordingly. It can do one or the other, but not both. We view these precision and imprecision categories as different types of communicative acts. In addition, other communicative acts appear in natural language discourse. In general, prior art has difficulty distinguishing between different kinds of communicative acts and determining how they are linked together, or whether they are linked at all.
Additionally, the prior art systems referenced herein are not designed to be integrated with each other. Each merely tries to solve a single problem in the field of natural language processing. Yet an ordinary natural language dialog between humans can confront any one of these systems with a variety of problems. That is, no single prior art system solves enough natural language problems to maintain true natural conversations with humans while, at the same time, interacting with third-party software application systems.
In the area of voice dialogs using speech recognition and text-to-speech interfaces, we have already pointed out the problem of building a complex dialog with the tools that are currently available. Another problem associated with building applications that include voice is the accuracy of the speech recognition interface. To transcribe utterances to text, the speech recognition interface can rely only on the phonetic algorithms provided by the speech recognition engine and the grammar provided by the developer. However, the speech recognition algorithms and the grammar together are not enough for the speech recognition to be accurate. Employing semantics and context to validate the transcribed text would greatly improve the accuracy of the speech recognition process.
Another problem associated with building applications that include voice is that they can only be used with a voice interface. That is, such applications are hard-wired to the telephony application and to the company's existing IVR (Interactive Voice Response) infrastructure. They are not designed to be used with other media, such as a web browser. Text-based interaction is still important because it allows users to interact via a web browser or an instant messenger. An up-and-coming technique currently used for providing customer support is service through an "instant messenger," which allows customers to interact with a customer service agent by text chat. This invention provides automated chatting, which would take workload off existing agents and translate into great savings for businesses, because it would allow companies to expand customer support without increasing the workforce. However, this automated chatting would be most useful if the same system that serves text-based chatting could also serve the voice-enabled telephony application, thereby providing conversational capabilities in either medium.