1.1 Field of the Invention
The invention relates to data analysis generally and more specifically to data analysis performed using knowledge base systems.
1.2 Description of the Prior Art
In the computer age, information is stored primarily in data base management systems. FIG. 1 is a schematic block diagram of a data base management system (DBMS) 101. System 101 is implemented using storage devices such as disk drives to store the information and processors coupled to the disk drives to access the data. In system 101, a query 103, which describes the information to be located, is presented to DBMS 101, which processes the query in query manager 107, locates the information in data base 117, and returns it as data 105. Query 103 describes the information to be located by using names. For example, a query in the SQL query language has the following general form:
    select <field names>
    from <table names>
    where <constraints that rows must satisfy>

Forms which the user fills out interactively. The queries are generated from the forms.
Redefinition of the names used in schema 113 in terms of concepts familiar to the user of the system.
Natural language interfaces to data base management system 101.

one or more data base management systems for receiving first queries and returning data in response thereto;
a knowledge base management system for organizing the data in a knowledge base according to a set of concepts and operating on the data in response to expressions stated in a description language which employs the concepts;
means for receiving the expressions, translating the expressions into the first queries, receiving the data, and returning the data together with the expressions to the knowledge base management system for incorporation into the knowledge base; and
means for receiving second queries specifying certain of the data and responding thereto by translating the second queries into expressions specifying the certain data, providing the expressions to the knowledge base management system, receiving the certain data from the knowledge base management system, and providing the certain data.

A knowledge base wherein the body of information is represented by individuals and concepts which organize the individuals;
means coupled to the knowledge base for responding to a query specifying a collection of the individuals by making a collection specification which specifies the same collection of individuals and has a form compatible with the concepts; and
means coupled to the knowledge base for receiving the collection specification and integrating the collection specification into the concepts.

A knowledge base wherein the body of information is represented by individuals and concepts which organize the individuals;
means for making an alteration with regard to one or more of the individuals;
means responsive to the alteration for making a reorganization of the individuals as required by the alteration and the concepts; and
means responsive to the reorganization for indicating an effect of the reorganization with regard to one or more of the individuals.
Of course, the information in data base 117 is not located by names, but rather by means of addresses in whatever storage device data base 117 is implemented on. The relationship between the names used in the queries 103 and the addresses used in data base 117 is established by schema 113, which defines the names used in the queries in terms of the locations in data base 117 which contain the data referred to by the names.
Operation of data base management system 101 is as follows: Query 103 is received by query manager 107, which parses it. Query manager 107 presents the names 109 in query 103 to schema 113, which returns descriptors 111 describing the data represented by the names in data base 117. Query manager 107 then uses the descriptors and the query 103 to produce a stream of operations 112 which cause data base 117 to return the data 105 specified by query 103. Query manager 107 then returns the data 105 to the user who produced the query.
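The flow just described (names presented to the schema, descriptors returned, operations applied to the stored data) can be illustrated with a minimal Python sketch. It is a toy model only, not the implementation of any actual data base management system: the `Descriptor` class, the `STORAGE` and `SCHEMA` dictionaries, the `run_query` function, and the sample `people` data are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptor:
    """Hypothetical descriptor: where a named field is stored."""
    block: int  # storage block holding the table's rows
    slot: int   # position of the field within each stored row


# Toy stand-in for data base 117: data is located by block number, not by name.
STORAGE = {7: [("Ada", 36), ("Grace", 45)]}

# Toy stand-in for schema 113: maps the names used in queries to descriptors.
SCHEMA = {
    ("people", "name"): Descriptor(block=7, slot=0),
    ("people", "age"): Descriptor(block=7, slot=1),
}


def run_query(table, fields, where=lambda row: True):
    """Toy stand-in for query manager 107."""
    # Present the names to the schema and collect the descriptors.
    columns = {f: d for (t, f), d in SCHEMA.items() if t == table}
    unknown = [f for f in fields if f not in columns]
    if not columns or unknown:
        raise KeyError(f"names not defined in schema: {table} {unknown}")

    # Use the descriptors to read storage, apply the constraint, and project.
    block = next(iter(columns.values())).block
    data = []
    for stored in STORAGE[block]:
        row = {f: stored[d.slot] for f, d in columns.items()}
        if where(row):
            data.append(tuple(row[f] for f in fields))
    return data


# Roughly: select name from people where age > 40
result = run_query("people", ["name"], where=lambda row: row["age"] > 40)
```

The point of the sketch is that the caller works only with names, while every access to the stored data goes through the descriptors supplied by the schema.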
Data base management systems 101 are effective for storing and retrieving data; they do, however, have a number of problems. One of the problems is complexity; query languages such as SQL are not simple. Further, schema 113 in a large data base management system 101 is also complex. Effective formulation of queries 103 requires detailed understanding not only of the query language used in system 101 but also of the meanings of the names used in schema 113. For this reason, formulation of queries for system 101 is often left to specialists. The overhead involved here is considerable in any case and grows if different data base management systems 101 with different query languages are involved. Attempts to overcome the complexity of query writing have included techniques such as the following:
A modern example of such techniques is BusinessObjects, in which an SQL expert relates forms employing terms with which the user is familiar to queries in the SQL query language. By filling out the forms, the user can generate SQL queries without knowing the SQL query language. While the above techniques are worthwhile, none of them is able to deal with situations in which the information of interest is contained in more than one kind of data base management system 101.
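The general idea of generating queries from forms can be sketched in Python as follows. This is a minimal illustration under stated assumptions and does not describe how BusinessObjects or any other product works: the form labels, the table name `cust_mstr`, the column names, and the function `query_from_form` are hypothetical. The sketch uses Python's standard `sqlite3` module only to show that the generated query runs.

```python
import sqlite3

# Hypothetical mapping prepared once by an SQL expert:
# labels familiar to the user -> names used in the schema.
FORM_FIELDS = {
    "Customer name": "cust_nm",
    "City": "cty",
}
TABLE = "cust_mstr"  # hypothetical table name


def query_from_form(show, filters):
    """Build a parameterized SQL query from a filled-in form.

    show:    list of form labels the user wants returned
    filters: dict mapping form labels to required values
    """
    # Column names come only from the expert's mapping; an unknown
    # label raises KeyError instead of reaching the query text.
    columns = ", ".join(FORM_FIELDS[label] for label in show)
    sql = f"select {columns} from {TABLE}"
    params = []
    if filters:
        clauses = []
        for label, value in filters.items():
            clauses.append(f"{FORM_FIELDS[label]} = ?")
            params.append(value)
        sql += " where " + " and ".join(clauses)
    return sql, params


# Small in-memory data base so the generated query can be executed.
conn = sqlite3.connect(":memory:")
conn.execute("create table cust_mstr (cust_nm text, cty text)")
conn.executemany(
    "insert into cust_mstr values (?, ?)",
    [("Ada", "London"), ("Grace", "New York")],
)

sql, params = query_from_form(["Customer name"], {"City": "London"})
rows = conn.execute(sql, params).fetchall()
```

The user supplies only familiar labels and values. Note that the mapping is tied to one schema and one query language, which is the limitation identified above when the information of interest is spread over more than one kind of system.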
Another problem with data base management system 101 is the relative inflexibility of its organization. Changes to schema 113 may be made only by specialists intimately familiar with schema 113 and its relationship to data base 117. Indeed, in many systems 101, schema 113 is produced by compilation, and consequently, a change to schema 113 requires recompiling the entire data base management system 101. The inflexibility of the organization causes problems both for the design of data base management system 101 and for its later use. Because of the inflexibility of the organization, it is difficult and expensive to design schema 113 for a data base management system 101. In particular, it is difficult to use the technique of producing a prototype and experimenting with it to determine the best form for the final system. Because of the inflexibility of the organization, it is also difficult to access the data in data base 117 in ways not envisioned in the original design of schema 113. This problem has become more important as the information in large data base management systems 101 has been used not only for its originally intended purposes, but also as a resource for various kinds of research. Since the schema of the data base management system was set up for the original purpose, it is difficult to fashion queries which look at the information in the manner required for the research.
The above and other problems of data base management systems 101 may be solved by employing knowledge base management systems in conjunction with data base management systems. In the present context, the chief distinction between a knowledge base management system and a data base management system is this: in a data base management system, the designer of schema 113 uses his or her conceptual knowledge of the data in data base 117 to design schema 113; however, schema 113 and the query language do not reflect the conceptual knowledge. For example, in systems using SQL, queries specify data by specifying tables and rows and columns in the tables. In a knowledge base management system, on the other hand, both the equivalent of the schema and the language used to describe data reflect the conceptual knowledge. U.S. patent application Ser. No. 07/781,464, Borgida et al., Information Access Apparatus and Methods, filed Oct. 23, 1991, and assigned to the assignees of the present patent application, describes generally how a knowledge base management system may be used in conjunction with a data base system; the present patent application presents more detail concerning the uses and advantages of integrating knowledge base management systems with data base management systems.
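The distinction drawn above, between specifying data by tables, rows, and columns and specifying it in conceptual terms, can be illustrated with a small Python sketch. It is not the description language of any actual knowledge base management system: the `INDIVIDUALS` data, the concept names, and the functions `define`, `is_instance`, and `instances` are assumptions made purely for illustration.

```python
# Hypothetical individuals: records about things in the domain.
INDIVIDUALS = {
    "ada": {"kind": "customer", "orders": 12},
    "grace": {"kind": "customer", "orders": 1},
    "acme": {"kind": "supplier", "orders": 0},
}

# Concepts: named definitions stated in domain terms. A concept may
# specialize a parent concept, so the definitions organize the individuals.
CONCEPTS = {}


def define(name, test, parent=None):
    """Register a concept as a test on an individual, optionally under a parent."""
    CONCEPTS[name] = (parent, test)


def is_instance(individual, concept):
    """An individual belongs to a concept if it passes the concept's own
    test and also belongs to every ancestor concept."""
    parent, test = CONCEPTS[concept]
    if parent is not None and not is_instance(individual, parent):
        return False
    return test(INDIVIDUALS[individual])


def instances(concept):
    """Return the names of all individuals that belong to the concept."""
    return sorted(name for name in INDIVIDUALS if is_instance(name, concept))


define("CUSTOMER", lambda r: r["kind"] == "customer")
define("FREQUENT-CUSTOMER", lambda r: r["orders"] >= 10, parent="CUSTOMER")

frequent = instances("FREQUENT-CUSTOMER")
```

In this sketch the user asks for data by naming a concept such as `FREQUENT-CUSTOMER`. The conceptual knowledge is stored in the definitions themselves and does not have to be restated in every query as a set of constraints on tables and columns.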