This invention relates generally to information management systems. More particularly, it relates to a frame-based knowledge representation system built using a relational database.
One of the growing problems facing scientific researchers is how to integrate and process the enormous amount of data being produced daily. While a great deal of data is available on the World Wide Web, simply having access to the data is useless without robust methods for searching, organizing, and analyzing the data. Various data models have been developed for storing information that can be categorized using ontologies. An ontology is a system that specifies the classes and relations among classes within a domain of discourse. As ontologies become more complex, existing tools for representing data are no longer able to sufficiently represent the data.
Relational database management systems (RDBMS) are by far the most dependable and widely used architectures for building large databases. They contain a few tables of data in which one or a few dependent values are associated with a set of useful independent features that can be searched quickly. An example of an RDBMS table is shown in FIG. 1. By searching on the names, the address and telephone number for each person can be retrieved quickly. In general, relational databases are most effective to use and easiest to maintain when there are a limited number of tables of information, linked together logically, with a very large number of records in each table. They are also an ideal solution when the structure of the data model is very well understood, not subject to change, and in routine use. Changes to queries and data fields are difficult to implement without completely taking the system out of use and restructuring the model. For example, adding an email address for each person in the database containing the table of FIG. 1 requires redesign of the table structure (either a new column or a new table) and of existing queries. Furthermore, for many data sets, the structure is too complex to be represented effectively by a relational database. Straightforward relational representations can leave out important dependencies of interest, and effectively fit the data to the capabilities of the database structure, instead of fitting the structure to the data. When the data model becomes a large network of interacting tables, queries are also much more difficult to write.
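The email-address example above can be illustrated with a minimal sketch using Python's built-in sqlite3 module. The table name `people`, its three columns, and the sample records are assumptions made for illustration; FIG. 1 is not reproduced here. The sketch shows that an insert written against the original layout stops working once a column is added.

```python
import sqlite3


def make_people_table():
    """Create a hypothetical table loosely modeled on FIG. 1 (names are assumed)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE people (name TEXT, address TEXT, telephone TEXT)")
    return conn


def add_person(conn, name, address, telephone):
    # Positional insert written against the original three-column layout.
    conn.execute("INSERT INTO people VALUES (?, ?, ?)", (name, address, telephone))


def add_email_column(conn):
    # The schema change needed to record an email address for each person.
    conn.execute("ALTER TABLE people ADD COLUMN email TEXT")


def demo():
    conn = make_people_table()
    add_person(conn, "Jane Doe", "1 Main St", "555-0100")
    add_email_column(conn)
    try:
        # The same insert now supplies three values for a four-column table.
        add_person(conn, "John Roe", "2 Oak Ave", "555-0101")
    except sqlite3.Error:
        return "existing insert no longer matches the table"
    return "existing insert still works"
```

In this sketch the schema change itself is a single statement; the cost lies in every query and insert that was written against the old structure.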
More flexible data structures, known as knowledge bases, have been developed to more closely model the entities in the system of interest and the interactions among them. The key distinction between a knowledge base and a relational database is the manner of organization of the data. In a relational database, data is organized into tables that are accessed by specifying rows and columns of the table; the tables do not reflect conceptual knowledge of the data. In contrast, design and organization of knowledge bases requires conceptual knowledge and representation of the data. In a knowledge representation system, all of the concepts in the domain of discourse are organized into a hierarchical tree of classes, with instances of classes located at the leaves of each branch. Further, the attributes associated with instances are stored with the instance, and not distributed throughout all of the tables of a relational data model.
The two primary innovations in knowledge bases have been object oriented databases and frame-based representation systems. Object oriented approaches allow more modular modeling than relational databases. Each piece of data in the system is considered an object, and the properties of an object are stored locally with the object, along with pointers to related objects. Complex data models are easier to implement, and hierarchies of objects can be created to help organize the large amounts of information. These systems typically provide benefits over relational database systems in the richness of available queries over more complex data types. However, object oriented databases have significant drawbacks that have prevented their acquiring a broad base of established users. They require not only that a researcher specify the properties of entities, but also that they map them onto programming language and database structures. Users interested in the stored information often have neither time for nor interest in learning about the underlying database structure. In addition, object oriented databases suffer from the lack of a universally agreed upon query language.
Frame-based representation systems can be considered object oriented architectures that provide built-in support for dynamic and hierarchical data modeling, for distinguishing between general concepts and particular instances of these concepts, for associating particular attributes with each concept, for inheriting attribute values from parent concepts, and for linking concepts with named relationships. They allow modification of the data model without the need to rebuild the structure, and have a common communication protocol for reading from and writing to the knowledge bases. Developers have created several frame-based knowledge representation tools, including Ontolingua (A. Farquhar, R. Fikes, and J. Rice, "The Ontolingua Server: A Tool for Collaborative Ontology Construction," Tech. Report KSL-96-26, Knowledge Systems Laboratory, Stanford University, Stanford, Calif., 1996); Protégé (M. A. Musen et al., "Protégé-II: An Environment for Reusable Problem-Solving Methods and Domain Ontologies," Proc. IJCAI'93 1993 Int'l Joint Conf. Artificial Intelligence, Morgan Kaufmann, San Francisco, 1993); and Theo (T. Mitchell et al., "THEO: A Framework for Self-Improving Systems," Architectures for Intelligence, K. Van Lehn, ed., Lawrence Erlbaum, Hillsdale, N.J., 1989). Such tools have an array of features, default reasoning strategies, and knowledge-representation constraints. However, some require users to install special software, and others lack important features, such as a persistent back-end storage system for scalability, facilities for controlling access based on user permissions, an API for prototype development, or easy compatibility with Web protocols.
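The frame features listed above (concepts versus instances, attributes attached to each concept, inheritance of attribute values from parent concepts) can be sketched in a few lines of Python. This is an in-memory illustration only; the `Frame` class, its fields, and its `get` method are hypothetical names and do not correspond to any of the cited tools.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Frame:
    """A class or instance frame; `parent` is the superclass or the class."""
    name: str
    parent: Optional["Frame"] = None
    slots: dict = field(default_factory=dict)

    def get(self, slot):
        """Return a slot value, inheriting from parent frames when absent."""
        frame = self
        while frame is not None:
            if slot in frame.slots:
                return frame.slots[slot]
            frame = frame.parent
        return None


def build_example():
    # General concepts (classes) and one particular instance.
    animal = Frame("Animal", slots={"alive": True})
    dog = Frame("Dog", parent=animal, slots={"legs": 4})
    rex = Frame("rex", parent=dog, slots={"legs": 3})  # local value overrides
    return animal, dog, rex
```

Adding a slot to a class frame at run time, for example `dog.slots["sound"] = "bark"`, makes the value visible to every instance without rebuilding any structure, which is the kind of flexibility the paragraph above attributes to frame-based systems.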
Several existing frame-based systems map a knowledge model into a relational database, thereby solving the above-described problem of requiring specialized software to implement knowledge representation systems. These systems instead allow developers to use well-known and widely available databases along with their existing tools, providing systems that are easy to use and access through a variety of interfaces. For example, the PERK database back-end to the GKB-Editor (P. D. Karp, K. L. Myers, and T. Gruber, "The Generic Frame Protocol," Proc. IJCAI-95, 1995 Int'l Joint Conf. Artificial Intelligence, Morgan Kaufmann, San Francisco, 1995, pp. 768-774) and the EcoCyc frame-based knowledge-base tool (P. D. Karp et al., "EcoCyc: Electronic Encyclopedia of Escherichia Coli Genes and Metabolism," Nucleic Acids Research, 27(1), pp. 55-58, 1999) both use a relational database for storage. The PERK storage system is discussed in detail in P. D. Karp, V. K. Chaudhri, and S. M. Paley, "A Collaborative Environment for Authoring Large Knowledge Bases," 1997. In the PERK system, individual frames (objects and associated attributes) are stored in an RDBMS as compressed ASCII text and are unpacked into memory on demand. Frames must be loaded from the flat file in order to be queried, leading to a start-up delay and limits on scalability. More importantly, the client machine accessing the information stored on PERK must have special software to unwrap the objects and put them in temporary storage.
A knowledge representation model built on a relational database is disclosed in P. M. Nadkarni, "QAV: querying entity-attribute-value metadata in a biomedical database," Computer Methods and Programs in Biomedicine, 53, pp. 93-103, 1997. However, the database structure cannot fully support a frame-based knowledge representation system, which contains a hierarchy of classes and particular instances of classes, because it does not necessarily include the key relation of "instance of." Furthermore, users querying the database must have explicit knowledge of the underlying structure, and applications that interact with the data must be designed specifically for the particular database system used.
There is still a need for a knowledge base data storage system that can serve large amounts of data to a variety of client interfaces and that uses standard relational database tools and standard knowledge-base protocols.
Accordingly, it is a primary object of the present invention to provide a frame-based knowledge representation system built on a commercial relational database management system that is transparent to a user. Benefits associated with relational databases, including good performance, data coherency, concurrent users, and automatic backup, are therefore also provided.
It is a further object of the invention to provide a knowledge base that requires only conventional relational database management software, and does not require specialized software. The knowledge base is therefore easy to use and develop without requiring specialized skills.
It is another object of the present invention to provide a knowledge base that is compatible with the current standard knowledge base query protocol, Open Knowledge Base Connectivity, thereby making the underlying database structure transparent to users and developers.
It is an additional object of the invention to provide a knowledge base system that is highly scalable to large data sets.
It is a further object of the invention to provide a knowledge base that is both flexible and highly structured, allowing for easy modification of the structure as the data model develops, and also for complicated hierarchies of data.
It is another object of the present invention to provide a frame-based representation system that allows for data ownership and access privileges to be specified for each piece of data.
It is an additional object of the invention to provide a knowledge base that may be accessed through a Web-accessible browser or through Application Programming Interfaces, allowing for universal accessibility of data and facilitating creation of new interfaces, while maintaining transparency of the underlying data structure.
These objects and advantages are attained by a frame-based representation system built on a relational database that is hidden from the user. While the data model is consistent with other frame-based systems, it uses a simple and novel data structure to organize and store the ontology and instances. Standard frame-based queries for retrieving specific portions of the stored data are implemented in a novel manner that is consistent with the underlying data structure.
The present invention provides a computer-readable medium encoded with a relational database for storing a frame knowledge system that has classes, relations, and instances of the classes. The database includes a frames table that has columns for storing frames, including class frames representing classes and instance frames representing instances, at least one slot associated with each of the frames, and a value associated with each of the slots. The slots represent relations, and include the relation known as instance-of, which describes the relation between an instance of a class and the class. Preferably, the frames table also has columns for storing an access permission, an ownership, and at least one facet associated with each slot.
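A minimal sketch of such a frames table follows, using Python's built-in sqlite3 module. The column names, the sample frames, and the permission values are assumptions chosen for illustration; the text above specifies only what kinds of information the columns hold.

```python
import sqlite3


def create_frames_table(conn):
    # Hypothetical column names: one row per (frame, slot, value) triple.
    conn.execute(
        """
        CREATE TABLE frames (
            frame      TEXT NOT NULL,  -- class frame or instance frame
            slot       TEXT NOT NULL,  -- relation name, e.g. 'instance-of'
            value      TEXT,           -- value associated with the slot
            facet      TEXT,           -- optional facet on the slot
            owner      TEXT,           -- ownership
            permission TEXT            -- access permission
        )
        """
    )


def build_example():
    conn = sqlite3.connect(":memory:")
    create_frames_table(conn)
    rows = [
        # A class frame with a slot describing the class itself.
        ("Person", "documentation", "A human being", None, "admin", "public"),
        # An instance frame; 'instance-of' links it to its class.
        ("jane", "instance-of", "Person", None, "alice", "public"),
        ("jane", "telephone", "555-0100", None, "alice", "public"),
        ("jane", "email", "jane@example.org", None, "alice", "private"),
    ]
    conn.executemany("INSERT INTO frames VALUES (?, ?, ?, ?, ?, ?)", rows)
    return conn


def instances_of(conn, class_name):
    """Find instance frames by searching on the 'instance-of' slot."""
    cur = conn.execute(
        "SELECT frame FROM frames "
        "WHERE slot = 'instance-of' AND value = ? ORDER BY frame",
        (class_name,),
    )
    return [row[0] for row in cur]
```

Because every attribute is a row rather than a column, recording a new kind of attribute (the `email` slot here) requires no change to the table structure.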
Preferably, the database also includes a classes table that has columns for storing the class frames with associated class slots and class values, all of which are also stored in the frames table. However, the classes table also includes a slot type (own or template) associated with each slot, and distinguishes between own slots, which characterize the class, and template slots, which characterize instances of the class. Each template slot associated with a particular class frame in the classes table is also stored in the frames table, where it is associated with a corresponding instance frame of the particular class. The database may also include a class hierarchy table that has columns for storing class frames and at least one superclass associated with each class frame.
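The relationship among the three tables can be sketched as follows, again with Python's sqlite3 module. All table and column names are assumptions, and the way template slots are copied onto a new instance is one possible reading of the paragraph above, not a statement of the actual implementation. Only template slots of the instance's direct class are copied in this sketch.

```python
import sqlite3

# Hypothetical schema; reduced to the columns needed for this illustration.
SCHEMA = """
CREATE TABLE frames (frame TEXT, slot TEXT, value TEXT);
CREATE TABLE classes (
    frame TEXT, slot TEXT, value TEXT,
    slot_type TEXT CHECK (slot_type IN ('own', 'template'))
);
CREATE TABLE class_hierarchy (frame TEXT, superclass TEXT);
"""


def open_kb():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def add_class_slot(conn, cls, slot, value, slot_type):
    # Class slots are recorded with their type in the classes table and
    # mirrored in the frames table (assumed reading of the text).
    conn.execute("INSERT INTO classes VALUES (?, ?, ?, ?)",
                 (cls, slot, value, slot_type))
    conn.execute("INSERT INTO frames VALUES (?, ?, ?)", (cls, slot, value))


def add_superclass(conn, cls, superclass):
    conn.execute("INSERT INTO class_hierarchy VALUES (?, ?)", (cls, superclass))


def add_instance(conn, instance, cls):
    conn.execute("INSERT INTO frames VALUES (?, 'instance-of', ?)",
                 (instance, cls))
    # Template slots characterize instances, so copy them to the new frame.
    conn.execute(
        "INSERT INTO frames (frame, slot, value) "
        "SELECT ?, slot, value FROM classes "
        "WHERE frame = ? AND slot_type = 'template'",
        (instance, cls),
    )


def slots_of(conn, frame):
    cur = conn.execute("SELECT slot, value FROM frames WHERE frame = ?", (frame,))
    return dict(cur.fetchall())


def superclasses(conn, cls):
    """Walk the class hierarchy table upward with a recursive query."""
    cur = conn.execute(
        """
        WITH RECURSIVE up(name) AS (
            SELECT superclass FROM class_hierarchy WHERE frame = ?
            UNION
            SELECT h.superclass FROM class_hierarchy h
            JOIN up ON h.frame = up.name
        )
        SELECT name FROM up
        """,
        (cls,),
    )
    return {row[0] for row in cur}
```

In this sketch an own slot such as a class description stays with the class frame, while a template slot such as a default attribute value appears on each instance created afterward.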
The present invention also provides a method for querying a frame-based representation system that contains a set of relational database tables for storing frames and associated attributes. The method occurs in a server in a distributed computer system, and includes the following steps: receiving a query in a first format for a subset of frames from a client computer; translating the query into a second format for querying the tables; applying the query in the second format to the set of tables to select the subset; and transmitting output including the subset to the client computer. The first format is preferably a knowledge base format, most preferably OKBC, and the second format is preferably a relational database format, most preferably SQL. The query in the second format includes a predetermined attribute related to the query in the first format, and each frame in the retrieved subset is associated with the predetermined attribute. The query in the second format may be applied to one, some, or all of the tables. Preferably, the set of relational database tables includes the tables described above. In that case, the predetermined attribute includes a predetermined slot and predetermined value. The method preferably includes a step of processing the subset to generate formatted output. The processing step may include comparing an access permission of each retrieved frame, stored in the frames table, with a client identifier for the client computer. Based on the comparison, it is determined whether the client may access the frames in the subset. The database may be stored on the server or on a database computer that is distinct from the server.
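The query steps above (receive, translate, apply, filter by permission, return) can be sketched as follows. The request is a plain Python dict standing in for a query in the first format; it is not the OKBC wire format, and the operation names are used only as labels. The table layout, the `public`/`private` permission values, and the rule that an owner may always read its own frames are all assumptions for illustration.

```python
import sqlite3


def make_kb():
    conn = sqlite3.connect(":memory:")
    # Hypothetical, reduced frames table.
    conn.execute(
        "CREATE TABLE frames "
        "(frame TEXT, slot TEXT, value TEXT, owner TEXT, permission TEXT)"
    )
    conn.executemany(
        "INSERT INTO frames VALUES (?, ?, ?, ?, ?)",
        [
            ("geneA", "instance-of", "Gene", "alice", "public"),
            ("geneB", "instance-of", "Gene", "bob", "private"),
            ("drugX", "instance-of", "Drug", "alice", "public"),
        ],
    )
    return conn


def translate(request):
    """Translate a frame-style request into SQL plus its parameters."""
    if request["op"] == "get-class-instances":
        # The predetermined slot is 'instance-of'; the value is the class.
        slot, value = "instance-of", request["class"]
    elif request["op"] == "get-frames-with-slot-value":
        slot, value = request["slot"], request["value"]
    else:
        raise ValueError("unsupported operation: " + str(request["op"]))
    sql = (
        "SELECT frame, owner, permission FROM frames "
        "WHERE slot = ? AND value = ? ORDER BY frame"
    )
    return sql, (slot, value)


def may_access(owner, permission, client_id):
    # Assumed rule: public frames are readable by all, others by their owner.
    return permission == "public" or owner == client_id


def handle_query(conn, request, client_id):
    """Server-side handling: translate, apply, then filter by permission."""
    sql, params = translate(request)
    rows = conn.execute(sql, params).fetchall()
    return [frame for frame, owner, permission in rows
            if may_access(owner, permission, client_id)]
```

The client in this sketch never sees table or column names; it names only a class, or a slot and value, which is the transparency the method is meant to provide.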
The present invention also provides a method for adding data to a frame-based representation system that contains a set of relational database tables for storing frames and associated attributes. The method occurs in a server in a distributed computer system, and includes the following steps: receiving a query in a first format to create a new frame from a client computer; translating the query into a second format for querying the tables; and applying the query in the second format to the set of tables to create a new record. The first format is preferably a knowledge base format, most preferably OKBC, and the second format is preferably a relational database format, most preferably SQL. The second format includes parameters related to the query in the first format, and the new record represents the new frame and contains the parameters. Preferably, the tables are the tables described above, and the parameters include a predetermined slot and value. Preferably, the new record also contains a client identifier associated with the client computer.
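The method for adding data can be sketched in the same style. As before, the dict request stands in for a query in the first format, and the table layout and field names are assumptions rather than the actual structure. The new record carries the slot and value from the request together with the identifier of the requesting client.

```python
import sqlite3


def make_kb():
    conn = sqlite3.connect(":memory:")
    # Hypothetical, reduced frames table with an ownership column.
    conn.execute(
        "CREATE TABLE frames (frame TEXT, slot TEXT, value TEXT, owner TEXT)"
    )
    return conn


def translate_create(request, client_id):
    """Translate a create-frame request into an SQL INSERT and parameters."""
    if request["op"] != "create-frame":
        raise ValueError("unsupported operation: " + str(request["op"]))
    sql = "INSERT INTO frames (frame, slot, value, owner) VALUES (?, ?, ?, ?)"
    params = (request["name"], request["slot"], request["value"], client_id)
    return sql, params


def handle_create(conn, request, client_id):
    """Apply the translated statement; the new record represents the frame."""
    sql, params = translate_create(request, client_id)
    with conn:  # commits on success, rolls back on error
        conn.execute(sql, params)
```

A call such as `handle_create(conn, {"op": "create-frame", "name": "geneC", "slot": "instance-of", "value": "Gene"}, "carol")` would add one row recording that `geneC` is an instance of `Gene` and is owned by `carol`.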