Semantic data models allow relationships between resources to be modeled as facts. The facts are often represented as triples that have a subject, a predicate, and an object. For example, one triple may have the subject of “John Smith,” the predicate of “is-a,” and the object of “physician,” which may be represented as
    <John Smith, ISA, physician>.
This triple represents the fact that John Smith is a physician. Other triples may be
    <John Smith, graduate of, University of Washington>,
representing the fact that John Smith graduated from the University of Washington, and
    <John Smith, degree, MD>,
representing the fact that John Smith has an MD degree. Semantic data models can be used to model the relationships between any type of resources such as web pages, people, companies, products, meetings, and so on. One semantic data model, referred to as the Resource Description Framework (“RDF”), has been developed by the World Wide Web Consortium (“W3C”) to model web resources, but it can be used to model any type of resource. The triples of a semantic data model may be stored in a semantic database that may include a fact table containing the triples representing the facts.
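As an illustration of how such a fact table might be held in memory, the following minimal Python sketch stores each fact as a (subject, predicate, object) tuple. The names `Triple`, `fact_table`, and `facts_about` are hypothetical and chosen only for this example; they are not part of RDF or of any particular semantic database.

```python
from typing import List, NamedTuple


class Triple(NamedTuple):
    """One fact: a subject, a predicate, and an object (hypothetical type)."""
    subject: str
    predicate: str
    obj: str


# A tiny fact table holding the three example facts about John Smith.
fact_table: List[Triple] = [
    Triple("John Smith", "ISA", "physician"),
    Triple("John Smith", "graduate of", "University of Washington"),
    Triple("John Smith", "degree", "MD"),
]


def facts_about(subject: str, table: List[Triple]) -> List[Triple]:
    """Return every triple in the table whose subject equals `subject`."""
    return [t for t in table if t.subject == subject]


for fact in facts_about("John Smith", fact_table):
    print(f"<{fact.subject}, {fact.predicate}, {fact.obj}>")
```

A flat list is the simplest possible layout; an actual semantic database may organize or index its fact table differently.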
To search for facts of interest, a user may submit a query to a search engine and receive as results the facts that match the query. A query may be specified using SPARQL, which is a query language that has been developed for semantic databases that comply with the RDF format. The acronym “SPARQL” is recursive and stands for “SPARQL Protocol and RDF Query Language.” A SPARQL query may include a “select” clause and a “where” clause as shown in the following example:
    select ?profession
    where { ?x degree ?profession }
The select clause includes the variable “?profession,” and the where clause includes the query triple with the variable “?x” as the subject, the non-variable “degree” as the predicate, and the variable “?profession” as the object. When a search engine executes this query, it identifies all triples of the database that match the non-variable(s) of the query triple. In this example, the search engine identifies all triples with a predicate of “degree” and returns the objects of those identified triples based on the variable “?profession” being in the select clause and in the object of the query triple of the where clause. For example, the search engine will return “MD” and “JD” when the database contains the following facts:
    <John Smith, degree, MD>
    <Bill Greene, degree, JD>.
If the select clause had also included the variable “?x,” then the search engine would have returned “John Smith, MD” and “Bill Greene, JD.”
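The matching of a single query triple against the facts can be sketched in Python as follows. This is a simplified illustration and not how any particular SPARQL engine is implemented; the helper names `is_variable`, `match`, and `select` are hypothetical, and variables are assumed to be strings that begin with “?”.

```python
def is_variable(term):
    """A term is treated as a variable when it starts with '?' (assumption)."""
    return term.startswith("?")


def match(query_triple, fact):
    """Return the variable bindings if `fact` matches, otherwise None."""
    bindings = {}
    for term, value in zip(query_triple, fact):
        if is_variable(term):
            # A variable that appears twice must bind to the same value.
            if bindings.get(term, value) != value:
                return None
            bindings[term] = value
        elif term != value:
            # A non-variable element must equal the fact's element.
            return None
    return bindings


def select(variables, query_triple, facts):
    """Return the values bound to `variables` for every matching fact."""
    results = []
    for fact in facts:
        bindings = match(query_triple, fact)
        if bindings is not None:
            results.append(tuple(bindings[v] for v in variables))
    return results


facts = [
    ("John Smith", "degree", "MD"),
    ("Bill Greene", "degree", "JD"),
    ("John Smith", "ISA", "physician"),
]
query = ("?x", "degree", "?profession")

print(select(["?profession"], query, facts))
# [('MD',), ('JD',)]
print(select(["?x", "?profession"], query, facts))
# [('John Smith', 'MD'), ('Bill Greene', 'JD')]
```

The third fact is skipped because its predicate, “ISA,” does not equal the non-variable predicate “degree” of the query triple.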
SPARQL allows multiple query triples to be included in the where clause to create more complex queries, such as the following example query:
    select ?profession
    where
        ?x degree ?profession          (Example 1)
        ?x livesin USA
        ?x citizenof USA
        ?x is-a professor
        ?profession is-a law degree.
This example query will return the various law degrees of professors who are U.S. citizens and who live in the United States, such as a B.S. in legal studies, a J.D., and an LL.M.
To identify the results for a query, a search engine identifies the triples that match each query triple. A triple matches a query triple when the triple matches each defined or non-variable element of the query triple. When a triple matches, its values are bound to the variables of the query triple. A search engine generates the results by taking intersections of the values bound to the variables of the query triples. In Example 1 above, because the where clause has five query triples, the search engine may identify five sets of triples. The first set will contain triples with the predicate of “degree,” the second set will contain triples with the predicate of “livesin” and the object of “USA,” the third set will contain triples with the predicate of “citizenof” and the object of “USA,” the fourth set will contain triples with the predicate of “is-a” and the object of “professor,” and the fifth set will contain triples with the predicate of “is-a” and the object of “law degree.” After generating the sets, the search engine identifies the triples of the first set whose subject is also the subject of a triple in each of the second, third, and fourth sets and then returns those identified triples whose object is also the subject of a triple in the fifth set.
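The set-based evaluation of Example 1 described above can be sketched in Python as follows. The sample facts and the helper names `matching` and `example_1` are invented for illustration only, and the approach shown is the straightforward one described above rather than an optimized one.

```python
# Hypothetical sample facts, made up for this illustration.
facts = [
    ("Ann Lee", "degree", "JD"),
    ("Ann Lee", "livesin", "USA"),
    ("Ann Lee", "citizenof", "USA"),
    ("Ann Lee", "is-a", "professor"),
    ("Bob Ray", "degree", "LLM"),
    ("Bob Ray", "livesin", "USA"),
    ("Bob Ray", "citizenof", "Canada"),
    ("Bob Ray", "is-a", "professor"),
    ("Cy Day", "degree", "MD"),
    ("Cy Day", "livesin", "USA"),
    ("Cy Day", "citizenof", "USA"),
    ("Cy Day", "is-a", "professor"),
    ("JD", "is-a", "law degree"),
    ("LLM", "is-a", "law degree"),
]


def matching(facts, predicate, obj=None):
    """One full pass over the facts for a single query triple."""
    return [f for f in facts
            if f[1] == predicate and (obj is None or f[2] == obj)]


def example_1(facts):
    # Five passes over the collection, one per query triple.
    degree = matching(facts, "degree")
    lives_in_usa = {s for s, _, _ in matching(facts, "livesin", "USA")}
    citizen_of_usa = {s for s, _, _ in matching(facts, "citizenof", "USA")}
    professors = {s for s, _, _ in matching(facts, "is-a", "professor")}
    law_degrees = {s for s, _, _ in matching(facts, "is-a", "law degree")}

    # Subjects that appear in the second, third, and fourth sets.
    people = lives_in_usa & citizen_of_usa & professors

    # Keep first-set triples whose subject qualifies and whose object
    # is the subject of a triple in the fifth set.
    return sorted({o for s, _, o in degree
                   if s in people and o in law_degrees})


print(example_1(facts))  # ['JD']
```

In this sample data, Bob Ray is excluded because he is not a U.S. citizen, and Cy Day is excluded because an MD is not recorded as a law degree.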
Current collections of facts can contain billions of triples. As a result, the process of identifying a set of triples that match a query triple can be computationally expensive and very time-consuming. When a query has multiple query triples, a search engine may need to make multiple passes through the entire collection, one for each query triple (e.g., with each pass accessing each triple). Even after the sets are identified, the search engine still needs to identify the subset of triples that match all the query triples.