1. Field of the Invention
The invention relates to information systems in which data is stored and accessed by means of query language statements. Examples of such information systems include conventional database management systems of various sorts, including relational databases, "object-oriented" databases, computer file systems in which data is stored and retrieved, and artificial intelligence systems with explicitly stored knowledge bases that hold information for use by a human user, expert system, or other artificial intelligence algorithms.
2. Description of the Prior Art
Information systems that store large amounts of patterned data are widespread in virtually all areas of business. These range from simple file systems to complex database management systems that store data as "records," usually in secondary memory, such as magnetic tape, magnetic disks, or optical disks. A general introduction to modern database management systems can be found in Ramez Elmasri and Shamkant B. Navathe, Fundamentals of Database Systems. Redwood City, Calif.: The Benjamin Cummings Publishing Company, Inc., 1989. Such systems usually have--at least implicitly--a fixed pattern for each type of entry. Records, for example, would have fixed sets of fields, and each individual record would have the same set of fields as all others in its category. Each field in a record may contain particular bits representing data. The data is usually either numbers or strings. Standard data management systems require a relatively small set of record formats to be specified in advance; these together are called the "schema." In general, the schema, once chosen, cannot be easily modified.
Another important aspect of information systems of the above sort is that there is usually some way to retrieve data out of the information system. While some are crude or stylized or strictly form-based, and some are very complex formal languages, we refer to instances of the set of querying and retrieval mechanisms as "query languages". Users (or sometimes computer programs) form queries in these query languages, "evaluate" them on the information base, and have returned to them answers to their queries. We refer to the form of these answers as "tables", since that is representative of a large class of data management systems (e.g., relational database management systems), although not every information system returns its answer in the literal form of tables.
A typical interaction with information systems has a human user constructing a query in the query language, evaluating it against the information base, and having a table or set of tables returned in textual form on a screen, on a hard copy, or in a computer file. The user may look at the tables and then construct a new query that may possibly incorporate parts or all of the prior query (or several prior queries). Database management systems have been constructed to optimize certain types of retrievals, and relational databases have generally been designed to respond to queries in a language called "SQL", which has become a de facto standard in the industry. While providing basic query-answering competence, this language has certain important limitations in the way in which it allows the user to conceptualize the information in the database. It forces the user to interact with the data in a very rigid pattern (see below).
Another problem with most conventional information systems is that they do not store queries in a conceptual ("intensional," as opposed to "extensional") form that would allow them to be compared, explored, or reused without complete reevaluation. Once a query on a very large database has been evaluated, it could be very convenient and time-saving to save the results of the query and the query itself in a form that can be reused without computing it again. Even the notion of "views", which are mechanisms that allow a user to conceptualize a database in some form other than that given in the schema, is restrictive. Views must themselves be in the same strict tabular form as standard database tables, and the operations which may be performed on them are limited:
Views cannot be compared to one another;
The only way in which an inference can be done on a view is by doing it on all rows of the underlying tables;
Views cannot be directly updated. Instead, any new tuples for a view must be inserted into the tables upon which the view is based;
Views (and the relational algebra on which they are based) do a poor job of handling partial or incomplete information.
means for making a class description that defines a class to which one or more of the entities potentially belong;
means for translating the class description into one or more statements in the query language which locate entities belonging to the class defined by the class description; and
means for employing these statements to locate in the data base the entities belonging to the class.
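The restriction on updating views can be illustrated with a minimal sketch. It uses Python's standard sqlite3 module and SQLite only as a convenient stand-in for a relational system; the table name employee, the view name rd_staff, and the sample rows are hypothetical and do not come from any system discussed here. Other database engines may behave differently.

```python
import sqlite3


def view_insert_demo():
    """Try to insert through a view, then insert through its base table."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical base table and a view that selects part of it.
    conn.execute("CREATE TABLE employee (name TEXT, dept TEXT, salary INTEGER)")
    conn.execute("INSERT INTO employee VALUES ('Ada', 'R&D', 120)")
    conn.execute(
        "CREATE VIEW rd_staff AS "
        "SELECT name, salary FROM employee WHERE dept = 'R&D'"
    )

    # A direct insert into the view is rejected by SQLite.
    try:
        conn.execute("INSERT INTO rd_staff VALUES ('Grace', 130)")
        direct_ok = True
    except sqlite3.OperationalError:
        direct_ok = False

    # The new tuple has to go into the table on which the view is based.
    conn.execute("INSERT INTO employee VALUES ('Grace', 'R&D', 130)")
    names = [row[0] for row in conn.execute("SELECT name FROM rd_staff ORDER BY name")]
    conn.close()
    return direct_ok, names


if __name__ == "__main__":
    print(view_insert_demo())
```

In this sketch the new row becomes visible through the view only after it has been inserted into the underlying table.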
Further, while adding homogeneous information to a standard database is easy, it is difficult or impossible to add new heterogeneous pieces of information (i.e., descriptions of objects that are exceptional or unorthodox in some way).
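The difficulty with heterogeneous information can be sketched in the same way. The example below again assumes SQLite through Python's sqlite3 module purely for illustration; the table part, the extra attribute coating, and the sample values are hypothetical. It shows that a record carrying an attribute outside the fixed schema is rejected, and that the usual workaround changes the format of every record in the table.

```python
import sqlite3


def heterogeneous_insert_demo():
    """Show what happens when one record has a field the schema lacks."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical fixed record format: every part has a name and a weight.
    conn.execute("CREATE TABLE part (name TEXT, weight_kg REAL)")
    conn.execute("INSERT INTO part (name, weight_kg) VALUES ('bolt', 0.02)")

    # An exceptional part with an extra attribute does not fit the schema.
    try:
        conn.execute(
            "INSERT INTO part (name, weight_kg, coating) "
            "VALUES ('probe', 1.5, 'gold')"
        )
        accepted = True
    except sqlite3.OperationalError:
        accepted = False

    # Workaround: modify the schema, which adds the field to all records.
    conn.execute("ALTER TABLE part ADD COLUMN coating TEXT")
    conn.execute(
        "INSERT INTO part (name, weight_kg, coating) VALUES ('probe', 1.5, 'gold')"
    )
    rows = conn.execute("SELECT name, coating FROM part ORDER BY name").fetchall()
    conn.close()
    return accepted, rows


if __name__ == "__main__":
    print(heterogeneous_insert_demo())
```

After the schema change, the ordinary record also carries the new field, holding an empty (NULL) value, which is the kind of overhead a single unorthodox object imposes on a fixed record format.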
There have been recent attempts to solve the above problems of data bases by integrating the knowledge bases employed by some types of artificial intelligence with data bases. For a general introduction to knowledge bases and their structure, see Ronald J. Brachman, "The Basics of Knowledge Representation and Reasoning", AT&T Technical Journal, Vol. 67, No. 1, pp. 7-24. Prior attempts to use knowledge-base processing systems as interfaces to information management systems include IntelliCorp's KEEConnection system ("Bridging the Information Gap", in A Review of Products, Services, and Research, AAAI-87), which allows a user to bring data from a database into a KEE knowledge base. SDM (M. M. Hammer and D. J. McLeod, "Database Description with SDM: A Semantic Database Model", ACM Transactions on Database Systems 6, No. 3, Sept. 1981) uses a hierarchical "semantic data model" to allow more object-oriented viewing of a relational database. Neither of these systems uses a formal compositional description language, which is central to the success of the present invention, nor do they perform the classification inference that allows the present invention to generate correct queries automatically for composite descriptions. "Natural language access to database" systems attempt to allow the user to express queries in natural languages like English. These systems further do not allow the results of queries to be automatically stored and organized in the knowledge base. The CODE-BASE system, described in Peter G. Selfridge, "Knowledge Representation Support for a Software Information System", Proceedings of the Seventh IEEE Conference on AI Applications, Miami Beach, Fla., February, 1991, pp. 134-140, is based on the same description logic as the present invention's preferred embodiment, but does not take advantage of the classification inference to allow automatic generation of queries from composite descriptions.
In sum, all prior systems that attempt to connect databases and knowledge base processing systems either rely on the user exclusively to form all queries (or mappings to the database for all forms in the knowledge base) by hand, or do not take advantage of a compositional description language to allow the storing, organizing, and automatic generation of database queries. It is thus an object of the apparatus and methods disclosed herein to overcome these problems of the prior systems.