1. Field of the Invention
The invention concerns the representation of semantic knowledge by the Resource Description Framework, or RDF, and more specifically concerns the integration of data represented by RDF into a relational database system.
2. Description of Related Art: FIGS. 1-3
RDF is a language that was originally developed for representing information (metadata) about resources in the World Wide Web. It may, however, be used for representing information about absolutely anything. When information has been specified using the generic RDF format, it may be automatically consumed by a diverse set of applications.
FIGS. 1-3 provide an overview of RDF. Facts in RDF are represented by RDF triples. Each RDF triple represents a fact and is made up of three parts: a subject, a predicate (sometimes termed a property), and an object. For example, the fact represented by the English sentence “John is 24 years old” is represented in RDF by the subject, predicate, object triple <‘John’, ‘age’, ‘24’>, with ‘John’ being the subject, ‘age’ being the predicate, and ‘24’ being the object. In current RDF, the values of subjects and predicates must ultimately resolve to uniform resource identifiers (URIs). The values of objects may be literal values such as numbers or character strings. The interpretations given to the members of the triple are determined by the application that is consuming it.
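As a minimal illustration of the triple structure described above, a triple can be modeled as a three-field record. The sketch below is Python; the type name Triple and the field name obj are choices made for this illustration only and are not part of RDF or of the systems discussed here. Names stand in for URIs, as in the text.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF fact: a subject, a predicate, and an object."""
    subject: str
    predicate: str
    obj: str  # named "obj" only to avoid shadowing Python's built-in "object"


# The fact "John is 24 years old" from the text, with names standing in for URIs.
john_age = Triple(subject="John", predicate="age", obj="24")

# A triple unpacks into its three parts in subject, predicate, object order.
subject, predicate, obj = john_age
```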
RDF triples may be represented as a graph as shown at 109 in FIG. 1. The subject is represented by a node 103, the object by another node 107, and the predicate by arrow 104 connecting the subject node to the object node. A subject may of course be related to more than one object, as shown with regard to “Person” 103. Each entity in an RDF triple is represented by a World Wide Web Uniform Resource Identifier (URI) or a literal value. For example, the subject “John” is identified by the URI for his contact information. In RDF triple 117, the value of John's age is the literal value 24. In the following general discussion of RDF, the URIs will be replaced by the names of the entities they represent. For a complete description of RDF, see Frank Manola and Eric Miller, RDF Primer, published by W3C and available in September, 2004 at www.w3.org/TR/rdf-primer/. The RDF Primer is hereby incorporated by reference into the present patent application.
An RDF representation of a set of facts is termed in the following an RDF model. A simple RDF model Reviewers is shown at 101 in FIG. 1. The model has two parts: RDF data 113 and RDF schema 111. RDF schema 111 is made up of RDF triples that provide the definitions needed to interpret the triples of RDF data 113. Schema triples define classes of entities and predicates which relate classes of entities. A property definition for the predicate age is shown at 112. As shown there, a predicate definition consists of two RDF triples for which the predicate is the subject. One of the triples, which has the built-in domain predicate, indicates what kind of entities must be subjects for the predicate. Here, it is entities belonging to the class person. The other triple indicates what kinds of entities must be objects of the predicate; here, it is values of an integer type called xsd:int. Schema 111 uses the SubclassOf predicate 110 to define a number of subclasses of entities belonging to the class person. Also defined are conference and university classes of entities, together with predicates that relate these entities to each other. Thus, an entity of class person may be a chairperson of a conference and an entity of class reviewer may be a reviewer for a conference. Also belonging to Schema 111 but not shown there is the built-in RDF predicate rdf:type. This predicate defines the subject of a triple that includes the rdf:type predicate as an instance of the class indicated by the object. As will be explained in more detail, RDF rules determine logical relationships between classes. For example, a built-in RDF rule states that the SubclassOf relationship is transitive: if A is a subclass of B and B a subclass of C, then A is a subclass of C. Thus, the class faculty is a subclass of person.
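The transitivity rule for SubclassOf stated above can be sketched as a small fixed-point computation. The Python below is an illustrative sketch, not the way any particular RDF system implements the rule; the function name subclass_closure is hypothetical, and the class names A, B, and C are the placeholders used in the text.

```python
def subclass_closure(direct_pairs):
    """Return the transitive closure of (subclass, superclass) pairs."""
    closure = set(direct_pairs)
    changed = True
    while changed:
        changed = False
        # Snapshot the set so it can be extended while we scan it.
        for sub, mid in list(closure):
            for mid2, sup in list(closure):
                if mid == mid2 and (sub, sup) not in closure:
                    closure.add((sub, sup))
                    changed = True
    return closure


# A is a subclass of B, and B is a subclass of C.
direct = {("A", "B"), ("B", "C")}
closure = subclass_closure(direct)
```

After the call, closure contains the two stated pairs plus the derived pair ("A", "C").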
The data triples to which schema 111 applies are shown at 113; they have the general pattern <individual entity>, <predicate>, <object characterizing the individual entity>. Thus, triple 115 indicates that ICDE 2005 is an entity characterized as belonging to the class CONFERENCE and triple 117 indicates that JOHN is characterized by having the age 24. Thus, RDF data 113 contains the following triples about John:
        John has an Age of 24;
        John belongs to the subclass Ph.D. Student;
        John is a ReviewerOf ICDE 2005.
None of these triples states that John is a Person; however, the fact that he is a Person and a Reviewer is inferred from the fact that he is stated to be a Ph.D. Student, which is defined in schema 111 as a subclass of both Person and Reviewer. Because the SubclassOf predicate is transitive, the fact that John is a Ph.D. Student means that he is a potential subject of the Age and ReviewerOf properties.
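The inference described above, from a stated class to its superclasses, can be sketched as follows. This Python is illustrative only: the identifiers PhDStudent and ICDE2005 are shortened stand-ins for the names in the text, the function names are hypothetical, and the subclass table contains only the two relationships the text states for Ph.D. Student.

```python
# Direct superclasses, as stated in the text for Ph.D. Student only.
SUBCLASS_OF = {"PhDStudent": {"Person", "Reviewer"}}

# The three triples about John listed in the text.
DATA = [
    ("John", "Age", "24"),
    ("John", "rdf:type", "PhDStudent"),
    ("John", "ReviewerOf", "ICDE2005"),
]


def superclasses(cls, subclass_of):
    """Collect every class reachable from cls through SubclassOf links."""
    seen = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for parent in subclass_of.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def inferred_types(entity, data, subclass_of):
    """Return the classes stated for entity plus all of their superclasses."""
    stated = {o for (s, p, o) in data if s == entity and p == "rdf:type"}
    result = set(stated)
    for cls in stated:
        result |= superclasses(cls, subclass_of)
    return result


john_types = inferred_types("John", DATA, SUBCLASS_OF)
```

Here john_types holds PhDStudent, Person, and Reviewer, even though only the first is stated in the data.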
For purposes of the present discussion, RDF models are best represented as lists of RDF triples instead of graphs. FIG. 2 shows a table of triples 201 which lists triples making up schema 111 and a table of triples 203 which lists triples making up RDF data 113. At the bottom of FIG. 2 is an RDF pattern 205. An RDF pattern is a construct which is used to query RDF triples. There are many different ways of expressing RDF patterns; what follows is a typical example. When RDF pattern 205 is applied to RDF model 101, it will return a subgraph of RDF model 101 which includes all of the reviewers of conference papers who are Ph.D. students. The pattern is made up of one or more patterns 207 for RDF triples followed by an optional filter which further restricts the RDF triples identified by the pattern. The identifiers beginning with ? are variables that represent values in the triples belonging to the subgraph specified by the RDF pattern. Thus, the first pattern 207(1) specifies every Reviewer for every Conference indicated in the RDF data 203; the second pattern 207(2) specifies every Reviewer who belongs to the subclass Ph.D. Student, and the third pattern 207(3) specifies every Person for which an Age is specified. The result of the application of these three patterns to RDF data 203 is the intersection of the sets of persons specified by each of the patterns, that is, the intersection of the set of reviewers and the set of Ph.D. Students of any age. The intersection is John, Tom, Gary, and Bob, who are indicated by the triples in data 203 as being both Ph.D. students and reviewers.
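The way the three triple patterns and the optional filter combine can be sketched with a naive matcher. The Python below is an illustration under stated assumptions, not the syntax of pattern 205 or any particular query language: the function names are hypothetical, and SAMPLE_DATA is a small made-up sample rather than the contents of table 203 (John's triples come from the text; Mary's class and conference are assumed).

```python
def match_triple(pattern, triple, bindings):
    """Extend bindings so that pattern matches triple, or return None."""
    new_bindings = dict(bindings)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must keep the value it was first bound to.
            if new_bindings.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new_bindings


def query(patterns, triples, keep=None):
    """Return every variable binding that satisfies all patterns and the filter."""
    solutions = [{}]
    for pattern in patterns:
        extended = []
        for bindings in solutions:
            for triple in triples:
                result = match_triple(pattern, triple, bindings)
                if result is not None:
                    extended.append(result)
        solutions = extended
    if keep is not None:
        solutions = [s for s in solutions if keep(s)]
    return solutions


# Hypothetical sample; not the contents of table 203.
SAMPLE_DATA = [
    ("John", "ReviewerOf", "ICDE2005"),
    ("John", "rdf:type", "PhDStudent"),
    ("John", "Age", "24"),
    ("Mary", "ChairpersonOf", "ICDE2005"),  # assumed conference
    ("Mary", "rdf:type", "Faculty"),        # assumed class
]

PATTERNS = [
    ("?r", "ReviewerOf", "?c"),       # every reviewer of every conference
    ("?r", "rdf:type", "PhDStudent"),  # restricted to Ph.D. students
    ("?r", "Age", "?a"),               # for whom an age is specified
]

solutions = query(PATTERNS, SAMPLE_DATA)
older_than_30 = query(PATTERNS, SAMPLE_DATA, keep=lambda s: int(s["?a"]) > 30)
```

Because ?r appears in all three patterns, only subjects that satisfy every pattern survive, which is the intersection described above. On this sample the only solution binds ?r to John, and the age filter then removes it.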
The manner in which entities in an RDF model relate to each other can be modified by applying RDF rules. An example RDF rule is shown at 301 in FIG. 3. Rule 301 is contained in a rulebase which, as shown at 303, has the name rb. The rule has a name, chairpersonRule, which is shown at 305. As will be explained in detail later, the rule specifies how the class of Persons who are conference chairpersons relates to the class of Reviewers for the conference. Rule body 310 has a left-hand side 307 specifying the rule's antecedent and a right-hand side 311 specifying the rule's consequent. The rule states that if an entity satisfies the conditions established for the left-hand side 307 (the antecedent), it also satisfies the conditions established for the right-hand side 311 (the consequent). The antecedent and the consequent are specified by RDF patterns. The RDF pattern for left-hand side 307 specifies any Person (?r) in the model who is a chairperson of any Conference (?c) in the model; the RDF pattern for right-hand side 311 specifies that any such person is also a reviewer for that conference.
RDF pattern 312 shows the effect of rule 301. The pattern's triple specifies RDF triples which have the ReviewerOf predicate. Without rule 301, the pattern returns the subjects of those triples for ?r, or John, Tom, Gary, and Bob. The problem with this is that Mary is also a reviewer by virtue of rule 301; consequently, when the rule is taken into account, the triples include not only those with the ReviewerOf predicate, but those that have the ChairpersonOf predicate, and that adds Mary to the list of subjects for ?r. An RDF model 101 and the rules and other information required to interpret the model are termed together in the following an RDF data set. Components of an RDF data set are shown at 313 in FIG. 3. The components include RDF model 101, with its schema 111 and RDF data 113, one or more optional rulebases containing rules relevant to the model, and a list of optional aliases 323, which relate names used in the model to longer designations.
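The effect of rule 301 on a query for the ReviewerOf predicate can be sketched as one step of forward inference. The Python below is a simplified illustration, not the rule syntax of FIG. 3: it handles only rules whose antecedent and consequent are single triple patterns sharing the same subject and object, the function names are hypothetical, and the sample triples are assumed (in particular, the conference Mary chairs is not given in the text).

```python
def apply_rule(antecedent_predicate, consequent_predicate, triples):
    """Add (s, consequent, o) for every (s, antecedent, o) in triples."""
    derived = {
        (s, consequent_predicate, o)
        for (s, p, o) in triples
        if p == antecedent_predicate
    }
    return set(triples) | derived


def reviewers(triples):
    """Return the sorted subjects of all triples with the ReviewerOf predicate."""
    return sorted({s for (s, p, o) in triples if p == "ReviewerOf"})


# Hypothetical sample; Mary's conference is an assumption.
SAMPLE = {
    ("John", "ReviewerOf", "ICDE2005"),
    ("Mary", "ChairpersonOf", "ICDE2005"),
}

without_rule = reviewers(SAMPLE)
with_rule = reviewers(apply_rule("ChairpersonOf", "ReviewerOf", SAMPLE))
```

Without the rule, only John is returned; once the chairperson rule has been applied, Mary appears as well.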
The rulebases include an RDFS rulebase 319, which is a set of rules that apply to all RDF models. An example of the rules in this rulebase is the rule that states that an entity which belongs to a subclass of a class also belongs to the class, for example, that as a member of the class Ph.D. Student, John is also a member of the class Person. In addition, rules may be defined for a particular RDF model. Rule 301 is an example of such a rule. These rules are contained in one or more other rulebases 321. Aliases 323 relate short names used in a model to the URIs that completely identify the short names. For example, John, Mary, Tom, Gary, and Bob are all subjects and must therefore be identified by URIs. Aliases 323 will include a table that relates each name to its corresponding URI.
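The role of the alias table can be sketched as a lookup that replaces each short name with its URI and leaves everything else, such as literal values, unchanged. In the Python below, the example.org URIs, the table contents, and the function names are all invented for illustration; the text does not give the actual URIs.

```python
# Hypothetical alias table; the URIs are placeholders, not those of FIG. 3.
ALIASES = {
    "John": "http://example.org/people/John",
    "ReviewerOf": "http://example.org/terms/ReviewerOf",
    "Age": "http://example.org/terms/Age",
    "ICDE2005": "http://example.org/conferences/ICDE2005",
}


def expand(term, aliases):
    """Return the URI for a short name, or the term itself if it has no alias."""
    return aliases.get(term, term)


def expand_triple(triple, aliases):
    """Expand the subject, predicate, and object of one triple."""
    subject, predicate, obj = triple
    return (expand(subject, aliases), expand(predicate, aliases), expand(obj, aliases))


expanded_review = expand_triple(("John", "ReviewerOf", "ICDE2005"), ALIASES)
expanded_age = expand_triple(("John", "Age", "24"), ALIASES)
```

The literal 24 has no entry in the table and so passes through unchanged. A fuller treatment would need to distinguish literals from names explicitly, so that a literal which happens to equal a short name is not expanded.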
Systems for Querying RDF Models
A number of query languages have been developed for querying RDF models. Among them are:
        RDQL, see RDQL—A Query Language for RDF, W3C Member Submission Jan. 9, 2004, http://www.w3.org/Submission/2004/SUBM-RDQL-20040109;
        RDFQL, see RDFQL Database Command Reference, http://www.intellidimension.com/default.rsp?topic=/pages/rdfgateway/reference/db/default.rsp;
        RQL, see G. Karvounarakis, S. Alexaki, V. Christophides, D. Plexousakis, M. Scholl. RQL: A Declarative Query Language for RDF. WWW2002, May 7-11, 2002, Honolulu, HI, USA;
        SPARQL, see SPARQL Query Language for RDF, W3C Working Draft, Oct. 12, 2004, http://www.w3.org/TR/2004/WD-rdf-sparql-query-20041012/;
        SquishQL, see RDF Primer. W3C Recommendation, Feb. 10, 2004, http://www.w3.org/TR/rdf-primer.
The query languages described in the above references are declarative query languages with quite a few similarities to SQL, which is the query language used in standard relational database management systems. Indeed, systems using these query languages are typically implemented on top of relational database systems. However, because these systems are not standard relational database systems, they cannot take advantage of the decades of engineering that have been invested in the standard relational database systems. Examples of the fruits of this engineering that are available in standard relational database systems are automatic optimization, facilities for the creation and automatic maintenance of materialized views and of indexes, and the automatic use of available materialized views and indexes by the optimizer. What is needed if RDF triples are to reach their full potential is a technique for using RDF patterns to query sets of RDF triples that may be employed in a standard relational database management system, together with techniques for using the facilities of the relational database management system to reduce the cost in processing time of queries on sets of RDF triples. Providing such techniques is an object of the present invention.