The present invention relates to a database retrieval system, and more particularly to such a system having a natural language interface.
Business managers and staff require information to run their companies. Data processing departments of companies have been attempting to meet this information need since the early 1950's. The record keeping of most organizations is now computerized, and an abundance of data of all kinds, often describing transactions in minute detail, resides on the central computers of these organizations. In theory, all this data is available for review by employees of such companies. In practice, however, users of such information have faced serious obstacles in retrieving the information they need.
A frequent response to a user's request for data from a database is that the data is not stored in a way that enables it to be used to meet the user's need. Additionally, the complexity of current database systems requires a trained specialist to figure out how the data requested by a user can be retrieved from the database. This specialist must interpret the user's request or "query", determine exactly what it is the user is looking for, and figure out how to get that information from the database. Then, once the data is retrieved, it must be formatted into a report that the user can use and understand.
In recent years, a type of database known as a "relational database" has come into widespread use within the business community. An "entity-relationship" model is often used when mapping a real world system to a relational database management system. The entity-relationship model characterizes all elements of a system as either an entity (e.g., a person, place, or thing) or a relationship between entities. Both constructs are represented by the same structure, referred to as a "table".
A table is a collection of data organized into rows and columns, and represents a unit of a relational database. In an order-entry system, for example, entities will include parts and orders. Such information may be represented in two different tables. The relationship of which parts are requested by an order may be represented by a third table.
Thus, in applying the entity-relationship model, the entities of a system are identified and tables are constructed to represent entities. Then, relationships between the entities are identified and the current tables are extended (or new tables created) to represent these relationships. Finally, the attributes of each entity are identified and the tables are extended to include such attributes. Those skilled in the art are well familiar with the application of the entity-relationship model to relational database management systems.
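By way of illustration only, the order-entry example above can be sketched with Python's built-in sqlite3 module. The table names, column names, and sample rows are hypothetical and are not taken from any particular database; the sketch merely shows two entity tables and a third table representing the relationship between them.

```python
import sqlite3

# Hypothetical order-entry schema; every table and column name is illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parts  (part_id INTEGER PRIMARY KEY, description TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT);
    -- Relationship table: which parts are requested by which order.
    CREATE TABLE order_parts (
        order_id INTEGER REFERENCES orders(order_id),
        part_id  INTEGER REFERENCES parts(part_id),
        quantity INTEGER
    );
""")
conn.executemany("INSERT INTO parts VALUES (?, ?)", [(1, "bolt"), (2, "nut")])
conn.execute("INSERT INTO orders VALUES (?, ?)", (100, "Acme"))
conn.executemany(
    "INSERT INTO order_parts VALUES (?, ?, ?)", [(100, 1, 50), (100, 2, 75)]
)


def parts_for_order(order_id):
    """Return (description, quantity) rows for one order via the relationship table."""
    rows = conn.execute(
        """SELECT p.description, op.quantity
           FROM order_parts AS op
           JOIN parts AS p ON p.part_id = op.part_id
           WHERE op.order_id = ?
           ORDER BY p.part_id""",
        (order_id,),
    )
    return rows.fetchall()
```

Attributes of each entity (here, description and customer) appear as additional columns of the entity tables, while an attribute of the relationship (quantity) is held in the third table.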
In recent years, there have been proposals for providing a natural language interface to relational databases. An English language interface, for example, would enable unskilled users of a database to query the database for desired information, and receive such information without the need to rely on a trained specialist to interpret the query, access the database, generate a report, and communicate the report to the end user. Thus, a natural language interface would save enormous time and money for companies using relational databases, and would enable users with little or no computer experience to use a sophisticated database system by merely inputting (e.g., via a keyboard) a natural language (e.g., English) question.
An example of a natural language interface proposed in the past can be found in the article entitled Natural Language Interfaces: Benefits, Requirements, State of the Art and Applications, by John L. Manferdelli, A.I. East, October, 1987. This article describes a system in which an English sentence is converted into a grammatical structure ("parsed"), much like a sentence diagram. The diagrammed sentence is then translated into a "representation language" that is a hybrid of a semantic network and first order predicate logic. The representation represents time dependent facts, quantified statements, tense information and general sets, and is based on concepts contained in the original English sentence.
The representation language provided by the prior art system referenced above is complex, and not easily understandable even to a skilled user of the system. Thus, it is difficult for such a system to be implemented as a general purpose interface for any application database that might be desired. Customization of the interface to specific application databases is difficult and time consuming, and no means are provided for enabling a skilled user to easily comprehend the representation language produced by the natural language interface for a given query. Without such means, the building and testing of an interface for a particular application is extremely difficult and costly.
Various other articles have been published concerning software that is currently available to enable a natural language, such as English, to be translated into a representation language that can be used by a computer system to respond to a natural language query. For example, a program known as "McELI" is available for this purpose and discussed in Inside Computer Understanding, Schank and Riesbeck, Erlbaum Press, 1981. Another program known as "LIFER" is described in the article LIFER: A Natural Language Interface Facility, by Gary G. Hendrix, SIGART Newsletter, Issue 61, 1977, pp. 25-26. Each of these programs will translate a natural language into another formal syntax, such as a representation language. However, to date the representation language syntaxes have been complicated and difficult to understand. Therefore, no means have been available to enable anyone but the most sophisticated computer programmers to utilize such languages in providing a natural language interface capability to desired applications, such as the retrieval of information from a database.
A particular problem in providing a natural language interface for a database resides in enabling the system to locate data responsive to a natural language query regardless of the words used in the original query. A primary objection of end users of most prior art database retrieval systems is that they have to learn the names of the database elements. For example, if the term "salary" is used in the database, the end user would have to use the same term in order to retrieve salary information, and could not use synonyms such as "wage", "earns", "makes", or "pay". This problem is referred to as the "synonym problem".
Some products have attempted to solve this problem by having the system programmers define all of the synonyms that can be thought of for each database element, and to program these synonyms into the system. Such a requirement makes the setup procedure of a natural language interface extremely cumbersome, and often impractical.
Another problem with providing a natural language interface for database retrieval stems from the fact that the end user does not know where desired information resides in the database. For example, some information would have to be retrieved from detail-level columns in the database, whereas other data would have to come from summary-level columns. The choice of which column(s) to use must be made by the system, since the end user is unable to specify the data location. This problem is referred to as the "data location problem".
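One conceivable way for a system to make the column choice described above is sketched below in Python. This is an assumption made for illustration, not a description of any prior art product: the catalog, the column names, the notion of a "grain", and the selection rule (use the coarsest column that is still at least as fine as the request) are all hypothetical.

```python
# Hypothetical grain ranks: a lower number means finer detail.
GRAIN_RANK = {"transaction": 0, "day": 1, "month": 2}

# Hypothetical catalog: each column holding a concept, tagged with its grain.
CATALOG = {
    "sales": [
        ("order_detail.amount", "transaction"),       # detail-level column
        ("monthly_summary.total_sales", "month"),     # summary-level column
    ],
}


def locate(concept, requested_grain):
    """Pick the coarsest column that is still at least as fine as the request."""
    need = GRAIN_RANK[requested_grain]
    usable = [
        (GRAIN_RANK[grain], column)
        for column, grain in CATALOG[concept]
        if GRAIN_RANK[grain] <= need
    ]
    if not usable:
        raise LookupError(f"no column for {concept!r} at grain {requested_grain!r}")
    # max() compares the grain rank first, so the coarsest usable column wins.
    return max(usable)[1]
```

Under these assumptions, a request for monthly sales is served from the summary-level column, while a request for daily sales falls back to the detail-level column, without the end user naming either one.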
The assignee of the present application has marketed a product in the past which attempted to resolve the data location and synonym problems. That product included a built-in database expert system containing rules to resolve a many-to-one relationship between words/phrases and concepts, and also to resolve one-to-many relationships between concepts represented in a natural language query and database columns. For example, words such as sales, sell, bought, purchases, and revenues contained in a query would be mapped to a concept known as "sales". Then, the concept "sales" would be mapped to the various columns of a specific database containing sales information. The specific product involved was a turnkey wholesale distribution application that provided a natural language interface to a specific database. The natural language interface was custom designed for the specific database, and was not database independent. The system did not provide means to enable a skilled user thereof to tailor the interface for any other database. The representation language provided by the natural language interface was not easily understandable to a skilled user. Thus, it will be appreciated that the prior system was not a general purpose database retrieval system.
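The two mappings described above (many words to one concept, then one concept to many columns) can be pictured with the following Python sketch. It is not the code of the prior product; the dictionaries, the column names, and the simple word splitting are hypothetical and serve only to show the shape of the two steps.

```python
# Hypothetical knowledge base; every word, concept, and column name is illustrative.
WORD_TO_CONCEPT = {
    "sales": "sales",
    "sell": "sales",
    "bought": "sales",
    "purchases": "sales",
    "revenues": "sales",
}
CONCEPT_TO_COLUMNS = {
    "sales": ["order_detail.amount", "monthly_summary.total_sales"],
}


def concepts_in(query):
    """Many-to-one step: map each recognized word in the query to its concept."""
    words = query.lower().replace("?", " ").split()
    found = []
    for word in words:
        concept = WORD_TO_CONCEPT.get(word)
        if concept is not None and concept not in found:
            found.append(concept)
    return found


def candidate_columns(query):
    """One-to-many step: expand each concept to the columns that may hold it."""
    return {c: CONCEPT_TO_COLUMNS.get(c, []) for c in concepts_in(query)}
```

Because both dictionaries here are written for one database, the sketch also illustrates the limitation noted above: retargeting the interface to another database means rewriting the knowledge it embodies.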
It would be advantageous to provide a truly general purpose natural language interface for database retrieval, allowing skilled users (who are not experts in artificial intelligence computer theory and application) to easily custom tailor the interface to a specific application database. Such a system should solve both the data location problem and the synonym problem inherent in prior art natural language interfaces.
It would be further advantageous for such a system to generate a representation language, or "meaning representation", that is easily understandable, database independent, and canonical (i.e., two different queries having the same meaning must have the same final meaning representation, and two queries having different meanings must have different final meaning representations). Such a meaning representation should capture, at a conceptual level, the information requirement expressed in the natural language query.
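The canonical property can be illustrated with a small Python sketch. The Meaning class, the synonym table, and the assumption that a query has already been reduced to a word list and a set of filters are all hypothetical; the point is only that differently worded inputs with the same meaning compare equal, and inputs with different meanings do not.

```python
from dataclasses import dataclass

# Hypothetical synonym table; entries are illustrative only.
CONCEPTS = {
    "salary": "salary", "wage": "salary", "pay": "salary",
    "earns": "salary", "makes": "salary",
    "sales": "sales", "revenues": "sales",
}


@dataclass(frozen=True)
class Meaning:
    concept: str
    filters: frozenset  # set of (attribute, value) pairs, so order is irrelevant


def meaning_of(words, filters):
    """Build a canonical representation from already-parsed words and filters."""
    concepts = sorted({CONCEPTS[w] for w in words if w in CONCEPTS})
    if len(concepts) != 1:
        raise ValueError("this toy sketch expects exactly one concept per query")
    return Meaning(concepts[0], frozenset(filters))
```

Note that the representation names only concepts and attributes, never tables or columns, which is what keeps it database independent in this sketch.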
It would be further advantageous to provide such a system in which a skilled user or "developer" builds a knowledge base, pertaining specifically to an application database, that enables the system to efficiently and economically retrieve and report data that is a proper response to a natural language query entered by an unskilled user. Such a system should interpret the query, use the knowledge captured in its database expert system to locate the relevant data tables and columns from a database, and then transparently generate the most efficient code (e.g., structured query language--"SQL") to produce a report instantly. No knowledge of SQL, database field names, or other technical jargon should be required of the end user.
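As a minimal sketch of the final step, the following Python function turns a concept and a set of filters into a parameterized SQL statement using a developer-built knowledge base. The knowledge base contents, the table and column names, and the to_sql function are hypothetical, and no attempt is made here to choose the most efficient query.

```python
# Hypothetical knowledge base entry: concept -> (table, column).
KNOWLEDGE_BASE = {"salary": ("employees", "salary")}

# Hypothetical mapping from conceptual attributes to database column names.
FILTER_COLUMNS = {"employee": "last_name", "department": "dept"}


def to_sql(concept, filters):
    """Return (sql, params) for a concept and a dict of attribute -> value filters."""
    table, column = KNOWLEDGE_BASE[concept]
    clauses, params = [], []
    for attr in sorted(filters):  # sorted so the generated SQL is deterministic
        clauses.append(f"{FILTER_COLUMNS[attr]} = ?")
        params.append(filters[attr])  # values are bound as parameters, not inlined
    sql = f"SELECT {column} FROM {table}"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params
```

The end user supplies only the question; the field names appear solely in the knowledge base and in the generated statement.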
The present invention provides such a database retrieval system and method for retrieving data from a database.