Semantic data models allow relationships between resources to be modeled as facts. The facts are often represented as triples that have a subject, a predicate, and an object. For example, one triple may have the subject of “John Smith,” the predicate of “is-a,” and the object of “physician,” which may be represented as
<John Smith, is-a, physician>.
This triple represents the fact that John Smith is a physician. Other triples may be
<John Smith, graduate of, University of Washington>
representing the fact that John Smith graduated from the University of Washington and
<John Smith, degree, MD>
representing the fact that John Smith has an MD degree. Semantic data models can be used to model the relationships between various types of resources such as web pages, people, companies, products, meetings, and so on. One semantic data model, referred to as the Resource Description Framework (“RDF”), has been developed by the World Wide Web Consortium (“W3C”) to model web resources, but it can be used to model any type of resource. The triples of a semantic data model may be stored in a semantic database.
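The triples above can be sketched in code as follows. This is only an illustrative in-memory representation, not an RDF library; the names `Triple`, `facts`, and `facts_about` are hypothetical and chosen for this example.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One fact: <subject, predicate, object>."""
    subject: str
    predicate: str
    obj: str


# A tiny stand-in for a semantic database holding the three example facts.
facts = {
    Triple("John Smith", "is-a", "physician"),
    Triple("John Smith", "graduate of", "University of Washington"),
    Triple("John Smith", "degree", "MD"),
}


def facts_about(subject, store):
    """Return the (predicate, object) pairs recorded for one subject, sorted."""
    return sorted((t.predicate, t.obj) for t in store if t.subject == subject)


print(facts_about("John Smith", facts))
```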
To search for facts of interest, a user may submit a query to a search engine and receive as results the facts that match the query. A query may be specified using SPARQL, which is a query language that has been developed for semantic databases that comply with the RDF format. The acronym “SPARQL” is recursive and stands for “SPARQL Protocol and RDF Query Language.” A SPARQL query may include a “select” clause and a “where” clause as shown in the following example:
select ?profession
where {?x degree ?profession}.
The select clause includes the variable “?profession,” and the where clause includes the query triple with the variable “?x” as the subject, the non-variable “degree” as the predicate, and the variable “?profession” as the object. When a search engine executes this query, it identifies all triples of the database that match the non-variable(s) of the query triple. In this example, the search engine identifies all triples with a predicate of “degree” and returns the objects of those identified triples based on the variable “?profession” being in the select clause and in the object of the query triple of the where clause. For example, the search engine will return “MD” and “JD” when the database contains the following facts:
<John Smith, degree, MD>
<Bill Greene, degree, JD>.
If the select clause had also included the variable “?x,” then the search engine would have returned “John Smith, MD” and “Bill Greene, JD.”
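The matching of a single query triple described above can be sketched as follows. This is a minimal illustration, not a SPARQL engine: the functions `is_var`, `match`, and `run_query` are hypothetical, triples are plain tuples, and a term counts as a variable simply when it begins with “?”.

```python
def is_var(term):
    """A term is treated as a variable when it starts with '?'."""
    return term.startswith("?")


def match(pattern, triple):
    """Return the variable bindings if the triple matches the pattern, else None."""
    bindings = {}
    for p, value in zip(pattern, triple):
        if is_var(p):
            # The same variable used twice must bind to the same value.
            if p in bindings and bindings[p] != value:
                return None
            bindings[p] = value
        elif p != value:
            return None  # a non-variable element did not match
    return bindings


def run_query(select_vars, pattern, store):
    """Return one tuple per matching triple, projected onto select_vars."""
    rows = []
    for triple in store:
        bindings = match(pattern, triple)
        if bindings is not None:
            rows.append(tuple(bindings[v] for v in select_vars))
    return rows


store = [
    ("John Smith", "degree", "MD"),
    ("Bill Greene", "degree", "JD"),
    ("John Smith", "is-a", "physician"),
]

pattern = ("?x", "degree", "?profession")
print(run_query(["?profession"], pattern, store))        # [('MD',), ('JD',)]
print(run_query(["?x", "?profession"], pattern, store))
```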
SPARQL allows multiple query triples to be included in the where clause to create more complex queries such as the following example query:
select ?profession
where {?x degree ?profession          (Example 1)
?x livesin USA
?x citizenof USA
?x is-a professor
?profession is-a law degree}
This example query will return the various law degrees of professors who are U.S. citizens and who live in the United States, such as a B.S. in legal studies, a J.D., and an LL.M.
To identify the results for a query, a search engine identifies the triples that match each query triple. A triple matches a query triple when the triple matches each defined or non-variable element of the query triple. When a triple matches, its values are bound to the variables of the query triple. A search engine generates the results by taking intersections of the values bound to the variables of the query triples. In Example 1 above, because the where clause has five query triples, the search engine may identify five sets of triples. The first set will contain triples with the predicate “degree,” the second set will contain triples with the predicate of “livesin” and the object of “USA,” the third set will contain triples with the predicate of “citizenof” and the object of “USA,” the fourth set will contain triples with the predicate of “is-a” and the object of “professor,” and the fifth set will contain triples with the predicate of “is-a” and the object of “law degree.” After generating the sets, the search engine identifies the triples of the first set whose subject is also the subject of a triple in the second, third, and fourth sets and then returns those identified triples whose object is also the subject of a triple in the fifth set.
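The set-based evaluation of Example 1 described above can be sketched as follows. The data and the helper `matching` are hypothetical and exist only for this illustration; a real search engine would typically use indexes rather than scanning a list.

```python
def matching(store, predicate, obj=None):
    """Triples with the given predicate and, if supplied, the given object."""
    return [t for t in store if t[1] == predicate and (obj is None or t[2] == obj)]


# Hypothetical facts: only Ann Lee satisfies every query triple of Example 1.
store = [
    ("Ann Lee", "degree", "JD"),
    ("Ann Lee", "livesin", "USA"),
    ("Ann Lee", "citizenof", "USA"),
    ("Ann Lee", "is-a", "professor"),
    ("Bob Ray", "degree", "LLM"),
    ("Bob Ray", "livesin", "USA"),
    ("Bob Ray", "citizenof", "Canada"),   # not a U.S. citizen
    ("Bob Ray", "is-a", "professor"),
    ("Cy Dow", "degree", "MD"),           # not a law degree
    ("Cy Dow", "livesin", "USA"),
    ("Cy Dow", "citizenof", "USA"),
    ("Cy Dow", "is-a", "professor"),
    ("JD", "is-a", "law degree"),
    ("LLM", "is-a", "law degree"),
]

# One set of matching triples per query triple of the where clause.
set1 = matching(store, "degree")
set2 = matching(store, "livesin", "USA")
set3 = matching(store, "citizenof", "USA")
set4 = matching(store, "is-a", "professor")
set5 = matching(store, "is-a", "law degree")

# Subjects bound to ?x must appear in the second, third, and fourth sets.
subjects = {t[0] for t in set2} & {t[0] for t in set3} & {t[0] for t in set4}
# Values bound to ?profession must be the subject of a triple in the fifth set.
law_degrees = {t[0] for t in set5}

results = sorted({t[2] for t in set1 if t[0] in subjects and t[2] in law_degrees})
print(results)  # ['JD']
```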
Current collections of triples can contain billions of triples. Because of the large size of the collections, indexing may be used to speed up the searching for triples that have certain values for their subject, predicate, or object. FIG. 1 is a block diagram illustrating data structures for indexing into a triple table. The data structures include a subject index 110 and an object index 120 for a triple table 130. In the example of FIG. 1, the triple table includes six triples with numeric identifiers of their subjects, predicates, and objects. For example, the first triple in the triple table is <1, 3, 2>, which may represent the triple <John Smith, is-a, physician>. The subject index maps subjects to the triples that contain those subjects and includes a subject values table 111 and a subject value-triple table 112. The subject values table includes an entry corresponding to each different subject of the triple table. For example, the second entry in the subject values table corresponds to the subject represented by the identifier of 2 (i.e., subject 2). Each entry includes a row and an optional count field. The row points to a row in the subject value-triple table. The count indicates the number of triples in the triple table with that subject and is optional because the count can be derived from the differences in the rows of the subject values table. The subject value-triple table contains a row for each triple of the triple table ordered by the identifiers of the subject. For example, rows 1 and 2 of the subject value-triple table point to triples 1 and 6 of the triple table for subject 1. The object index includes an object values table 121 and an object value-triple table 122. The object index maps objects to triples that contain those objects in a way that is similar to the way the subject index maps subjects to triples. Also, a predicate index may map predicates to triples that contain those predicates in a similar way.
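A subject index of the kind described above can be sketched as follows. This is an illustration under stated assumptions: only the first triple <1, 3, 2> is taken from the text, and the remaining five triples are hypothetical values chosen to agree with the rows and subjects the text mentions. The code uses 0-based rows and positions, whereas the text counts from 1, and the function names are invented for this example.

```python
def build_subject_index(triple_table):
    """Build (values, value_triple) for a list of (subject, predicate, object) tuples.

    value_triple[row] is a position in triple_table, with rows ordered by subject.
    values[subject] is [first_row, count] into value_triple.
    """
    order = sorted(range(len(triple_table)), key=lambda i: (triple_table[i][0], i))
    values = {}
    for row, pos in enumerate(order):
        subject = triple_table[pos][0]
        if subject not in values:
            values[subject] = [row, 0]
        values[subject][1] += 1
    return values, order


def triples_for_subject(subject, values, value_triple, triple_table):
    """Use the index to fetch every triple with the given subject."""
    if subject not in values:
        return []
    first_row, count = values[subject]
    return [triple_table[value_triple[r]] for r in range(first_row, first_row + count)]


# Six triples; all but the first are hypothetical.
triple_table = [(1, 3, 2), (2, 3, 4), (3, 1, 2), (5, 3, 4), (3, 2, 1), (1, 2, 5)]

values, value_triple = build_subject_index(triple_table)
print(value_triple)  # [0, 5, 1, 2, 4, 3]
print(triples_for_subject(1, values, value_triple, triple_table))  # [(1, 3, 2), (1, 2, 5)]
```

In this sketch the first two rows of `value_triple` refer to positions 0 and 5, which correspond to triples 1 and 6 in the 1-based numbering of the text.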
Although the indexing can speed up the locating of triples, the updating of the indexes to reflect newly added triples can be very time-consuming and require exclusive access to the data structures during the updating. For example, if a new triple of <1, 2, 3> is added to the triple table as triple 7, then rows 3-6 of the subject value-triple table may need to be shifted down one row each to make room for the row corresponding to the newly added triple. However, during this shifting and prior to updating the offsets of the subject values table, the subject index would be in an inconsistent state. For example, after shifting is complete but before the row values of the subject values table are updated, entry 5 of the subject values table corresponding to subject 5 would point to row 6 of the subject value-triple table, which would point to triple 5 of the triple table. Triple 5, however, contains subject 3, not subject 5. So a program that uses the subject index to retrieve the triples for subject 5 would mistakenly retrieve a triple with subject 3.
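The inconsistent intermediate state described above can be reproduced with a small sketch. The tables below are hypothetical (only the first triple and the new triple <1, 2, 3> come from the text), rows and positions are 0-based, and the names `first_row`, `value_triple`, and `first_triple_for` are invented for this illustration.

```python
# Hypothetical triple table; position 4 (triple 5 in 1-based terms) has subject 3.
triple_table = [(1, 3, 2), (2, 3, 4), (3, 1, 2), (5, 3, 4), (3, 2, 1), (1, 2, 5)]
# Subject values table: subject -> first row in the value-triple table.
first_row = {1: 0, 2: 2, 3: 3, 5: 5}
# Subject value-triple table: row -> position in triple_table, ordered by subject.
value_triple = [0, 5, 1, 2, 4, 3]


def first_triple_for(subject):
    """Follow the index from a subject to the first triple it points to."""
    return triple_table[value_triple[first_row[subject]]]


before = first_triple_for(5)          # (5, 3, 4): correct

# Step 1: add the new triple and shift the later rows down by one.
triple_table.append((1, 2, 3))
value_triple.insert(2, len(triple_table) - 1)

# Between step 1 and step 2 the index is inconsistent.
stale = first_triple_for(5)           # (3, 2, 1): a triple with subject 3

# Step 2: update the row values of every subject that sorts after subject 1.
for subject in first_row:
    if subject > 1:
        first_row[subject] += 1

after = first_triple_for(5)           # (5, 3, 4): correct again
print(before, stale, after)
```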
To avoid this inconsistency, a program updating the index acquires exclusive access to the data structures or at least the subject index to prevent another program from accessing the subject index while it is in an inconsistent state. It is, however, undesirable to prevent such other program from accessing the data structures for the relatively long time it may take to update the subject index.
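The exclusive-access approach described above can be sketched with a single lock that guards both readers and the updater. This is a minimal illustration, assuming an in-memory index and Python's standard `threading.Lock`; the class `LockedSubjectIndex` and its data are hypothetical. Every lookup waits for the whole update to finish, which is the drawback the text points out.

```python
import threading


class LockedSubjectIndex:
    """Subject index whose readers and writers share one exclusive lock."""

    def __init__(self, triple_table, first_row, value_triple):
        self.triple_table = list(triple_table)
        self.first_row = dict(first_row)        # subject -> first row (0-based)
        self.value_triple = list(value_triple)  # row -> position in triple_table
        self._lock = threading.Lock()

    def lookup(self, subject):
        """Return all triples for a subject; blocks while an update is running."""
        with self._lock:
            if subject not in self.first_row:
                return []
            start = self.first_row[subject]
            later = [r for s, r in self.first_row.items() if s > subject]
            end = min(later, default=len(self.value_triple))
            return [self.triple_table[self.value_triple[r]] for r in range(start, end)]

    def add_triple(self, triple):
        """Insert a triple while holding the lock for the whole update."""
        subject = triple[0]
        with self._lock:
            self.triple_table.append(triple)
            larger = [s for s in self.first_row if s > subject]
            # The new row goes just before the rows of the next larger subject.
            row = min((self.first_row[s] for s in larger),
                      default=len(self.value_triple))
            self.value_triple.insert(row, len(self.triple_table) - 1)
            self.first_row.setdefault(subject, row)
            for s in larger:
                self.first_row[s] += 1


index = LockedSubjectIndex(
    [(1, 3, 2), (2, 3, 4), (3, 1, 2), (5, 3, 4), (3, 2, 1), (1, 2, 5)],
    {1: 0, 2: 2, 3: 3, 5: 5},
    [0, 5, 1, 2, 4, 3],
)
index.add_triple((1, 2, 3))
print(index.lookup(5))  # [(5, 3, 4)]
```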