The prior art includes many computer systems that allow a user to obtain information from databases by entering a natural language query or command. Examples include Intellect[1], Natural Language from NLI[2], and Language Access from IBM[3]. ("Language Access" is a trademark of the IBM Corporation.) These prior art systems generally follow the same method. First, a natural language (NL) query is posed to the system through an interface such as a computer keyboard and screen. The system runs the input through a scanner or tokenizer that breaks the NL query into individual words or tokens and looks up each word/token in a system dictionary. The system then uses an NL parser that parses the query into its elements. The output of the parser is organized as a parse tree that shows the relationships between the elements. The parser may also provide additional information about each parse tree element, called element attributes, which might include the part of speech of the parse tree element, its tense, and/or any of its synonyms, hyponyms, and hypernyms. A matching step is then performed in which one or more parse tree elements and/or attributes are matched to names in the database. For a relational database, the names would include table names and table field names. If the NL query can be completely and unambiguously parsed, if the relevant elements can be matched to the database names, and if the NL query can be transformed into a complete and correct database query, then the desired information is retrieved from the database and displayed in some format on the user interface (e.g., a computer screen). However, if the query cannot be unambiguously parsed, if there is a partial or multiple match between the parse tree element(s) and the database names, or if a correct database query cannot be constructed, then the system is unable to "understand" the user request. In these cases, incorrect database information, or no information at all, will be retrieved.
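The general method described above can be illustrated with a minimal sketch. It is not the implementation of any of the cited systems. The dictionary entries, the schema, and the function names (`tokenize`, `lookup`, `match_names`, `to_sql`) are all hypothetical, and the parsing step is deliberately omitted: the sketch matches dictionary words and their synonyms directly against table and field names.

```python
import re

# Hypothetical system dictionary: word -> part of speech and synonyms.
DICTIONARY = {
    "show": {"pos": "verb", "synonyms": ["list", "display"]},
    "salary": {"pos": "noun", "synonyms": ["pay", "wage"]},
    "employees": {"pos": "noun", "synonyms": ["staff", "workers"]},
    "of": {"pos": "prep", "synonyms": []},
    "the": {"pos": "det", "synonyms": []},
}

# Hypothetical relational schema: table name -> field names.
SCHEMA = {"employees": ["name", "salary", "department"]}


def tokenize(query):
    """Break the NL query into lowercase word tokens."""
    return re.findall(r"[a-z0-9]+", query.lower())


def lookup(tokens):
    """Split tokens into those found in the dictionary and those not found."""
    known = [(t, DICTIONARY[t]) for t in tokens if t in DICTIONARY]
    unknown = [t for t in tokens if t not in DICTIONARY]
    return known, unknown


def match_names(known):
    """Match each word (or one of its synonyms) to table and field names."""
    tables, fields = [], []
    for word, entry in known:
        candidates = [word] + entry["synonyms"]
        for table, columns in SCHEMA.items():
            if table in candidates and table not in tables:
                tables.append(table)
            for column in columns:
                if column in candidates and (table, column) not in fields:
                    fields.append((table, column))
    return tables, fields


def to_sql(query):
    """Return a database query, or None if the request is not 'understood'."""
    known, unknown = lookup(tokenize(query))
    if unknown:                      # a word is missing from the dictionary
        return None
    tables, fields = match_names(known)
    if len(tables) != 1 or not fields:   # no match or ambiguous match
        return None
    table = tables[0]
    columns = [c for t, c in fields if t == table]
    return "SELECT " + ", ".join(columns) + " FROM " + table
```

Under these assumptions, `to_sql("Show the salary of the employees")` produces a query, whereas a request containing a word absent from the dictionary, such as "bonus", returns `None`, which corresponds to the case in which no information is retrieved.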
There are many ways in which the system can fail to "understand" the user request. First, the scanner/tokenizer may not recognize one or more words/tokens of the NL query if, for example, those words/tokens (or their synonyms) do not match the entries in the system dictionary. Second, the parser may fail to correctly parse the natural language input. This can occur if the natural language input has a structure that the parser does not recognize. Alternatively, the parser can fail by yielding multiple parses. This can occur even for relatively simple NL queries.
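The multiple-parse failure can be made concrete with a small sketch. The toy grammar, lexicon, and function name `count_parses` below are assumptions chosen for illustration and are not taken from any cited system. The function counts the parse trees of a phrase with a CKY-style chart over a grammar with two binary rules.

```python
from collections import defaultdict

# Hypothetical lexicon: word -> set of categories.
LEXICON = {
    "employees": {"NP"},
    "departments": {"NP"},
    "bonuses": {"NP"},
    "in": {"P"},
    "with": {"P"},
}

# Hypothetical binary grammar rules: (parent, left child, right child).
RULES = [
    ("NP", "NP", "PP"),   # a noun phrase modified by a prepositional phrase
    ("PP", "P", "NP"),    # a preposition followed by a noun phrase
]


def count_parses(tokens, goal="NP"):
    """Count the distinct parse trees that cover all tokens with category goal."""
    n = len(tokens)
    chart = defaultdict(int)  # (start, end, category) -> number of parses
    for i, token in enumerate(tokens):
        for category in LEXICON.get(token, ()):
            chart[(i, i + 1, category)] += 1
    for width in range(2, n + 1):
        for start in range(0, n - width + 1):
            end = start + width
            for mid in range(start + 1, end):
                for parent, left, right in RULES:
                    chart[(start, end, parent)] += (
                        chart[(start, mid, left)] * chart[(mid, end, right)]
                    )
    return chart[(0, n, goal)]
```

With this grammar, the phrase "employees in departments with bonuses" has two parses, because "with bonuses" can attach either to "departments" or to "employees in departments". A phrase containing a word outside the lexicon has no parse at all, which corresponds to the first failure mode described above.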
The prior art tries to resolve these problems in a number of ways. Often the prior art asks for clarification. Clarification is helpful if the natural language query can be resolved by using a different word or by defining the misunderstood word. If the system does not understand the syntactic structure of the query, the system may ask the user to clarify the query by rephrasing it in an understandable form. In the case of multiple parses, however, the system must decide which of the possible parses it should respond to. Several heuristics are used to make this decision. For example, the parse that best matches the names in the database may be selected. However, these heuristics are often no better than guesswork.
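The heuristic of selecting the parse that best matches the database names might look like the following sketch. The set of schema names, the representation of a parse as a flat list of element words, and the function names `score` and `choose_parse` are all assumptions made for illustration.

```python
# Hypothetical set of table and field names in the database.
SCHEMA_NAMES = {"employees", "departments", "salary", "name"}


def score(parse_elements):
    """Count how many elements of a parse match a database name."""
    return sum(1 for element in parse_elements if element in SCHEMA_NAMES)


def choose_parse(parses):
    """Return (index of the chosen parse, whether the choice was a tie).

    When several parses share the best score, the first one is taken
    arbitrarily, which is the point at which the heuristic becomes a guess.
    """
    scores = [score(p) for p in parses]
    best = max(scores)
    winners = [i for i, s in enumerate(scores) if s == best]
    return winners[0], len(winners) > 1
```

In this sketch, `choose_parse([["employees", "salary"], ["employees", "pay"]])` selects the first parse without a tie, while two parses that each match one database name produce a tie, and the returned index is arbitrary.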
Failures can occur in the prior art even after a single correct parse. In these cases, some or all of the elements in the parse tree cannot be matched to the names in the database. The process then fails, and a database query cannot be developed to retrieve the desired information from the database. The system can still ask for clarification or rephrasing, but in this case it is very difficult to tell the user how to change the query. Repeated, non-specific requests to rephrase the query can quickly discourage the user and cause the system to be rejected. To avoid this, some prior art guesses at the meaning of the natural language query. Guessing sometimes permits information retrieval from the database, but the user has no way of telling whether the information retrieved and presented is the correct system response. Guessing and presenting the wrong information can rapidly cause the user to lose faith in the system and stop using it.