The Resource Description Framework (RDF) is a powerful technology that is used extensively in combination with SPARQL queries. RDF defines a data model that codifies statements about entities in the form of subject-predicate-object expressions, known as triples. The subject denotes the entity itself, and the object denotes a trait or aspect of the entity. The predicate of the triple expresses a relationship between the subject and the object. For example, the statement “exercise promotes health” can be expressed as a triple where “exercise” is the subject, “promotes” is the predicate, and “health” is the object. Any relationship between any subject and any object can be expressed as an RDF triple. The ability of RDF triples to model complex concepts involving arbitrary relationships between subjects and objects has led to increased use of RDF triples in database settings.
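As an illustration only, the subject-predicate-object structure can be sketched in a few lines of Python. The `Triple` type, the `objects_of` helper, and the second triple (“exercise requires effort”) are hypothetical and are introduced solely for this example; they are not part of RDF or of any particular library.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One statement: a subject, a predicate, and an object."""
    subject: str
    predicate: str
    object: str


# The first triple is the example from the text; the second is invented
# here so that the lookup below has something to filter out.
triples = [
    Triple("exercise", "promotes", "health"),
    Triple("exercise", "requires", "effort"),
]


def objects_of(subject, predicate, data):
    """Return every object related to `subject` by `predicate`."""
    return [
        t.object
        for t in data
        if t.subject == subject and t.predicate == predicate
    ]


promoted = objects_of("exercise", "promotes", triples)
```

In this sketch, `promoted` holds the objects that “exercise” is stated to promote.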
The term SPARQL (a recursive acronym for SPARQL Protocol and RDF Query Language) refers to a query language and protocol designed for querying and manipulating RDF data. SPARQL became an official World Wide Web Consortium (W3C) recommendation in 2008. Database engines that process queries written in SPARQL are able to retrieve and manipulate data stored in the RDF format. SPARQL includes constructs for specifying a query composed of “triple” patterns, which can be combined using conjunctions, disjunctions, etc.
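The idea of matching a conjunction of triple patterns against stored triples can be sketched as follows. This is a simplified illustration in Python, not SPARQL syntax and not how any particular SPARQL engine is implemented. The helper names `match_pattern` and `query`, the convention that terms beginning with “?” are variables, and the sample data (including “Ann the tailor”) are assumptions made for this example.

```python
def match_pattern(pattern, triple, bindings):
    """Try to match one pattern against one triple.

    Returns an extended copy of `bindings` on success, or None on failure.
    Terms starting with "?" are variables; all other terms must match exactly.
    """
    extended = dict(bindings)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # Bind the variable, or check it against its earlier binding.
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return extended


def query(patterns, data):
    """Evaluate a conjunction of patterns: every pattern must match."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [
            extended
            for bindings in solutions
            for triple in data
            if (extended := match_pattern(pattern, triple, bindings)) is not None
        ]
    return solutions


# Hypothetical sample data.
data = [
    ("Bob the village baker", "knows", "Sam the shoemaker"),
    ("Sam the shoemaker", "knows", "Ann the tailor"),
]

# Conjunction: whom does ?x know, and whom does that person know in turn?
results = query([("?x", "knows", "?y"), ("?y", "knows", "?z")], data)
```

Under the same assumptions, a disjunction could be sketched as the concatenation of the solutions of two separate `query` calls.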
Often, when RDF data is stored or represented in a database or other type of data store, the RDF data is stored as normalized data tables. Normalizing data in database tables seeks to remove data redundancy, and hence to promote ease of maintenance, data integrity, data consistency, and data storage space savings. As one example, while the semantics of the phrase, “Bob the village baker knows Sam the shoemaker” could be stored as a subject-predicate-object triple (specifically, “subject=‘Bob the village baker’”, “predicate=‘knows’”, and “object=‘Sam the shoemaker’”), it could also be stored in a more compact, normalized form. Continuing this example, the phrase “Bob the village baker” could be stored as a numeric value ‘1’ (or other short identifier) that merely refers to an entry in a dictionary that includes a relationship between the numeric value ‘1’ and the subject phrase, “Bob the village baker”. A similar normalization technique and respective entry into the dictionary can be applied to the predicate as well as to the object, such as assigning the numeric value ‘3’ (or other short identifier) to the object phrase, “Sam the shoemaker”. An RDF triple such as, “Bob the village baker knows Sam the shoemaker” can be stored as “‘1’ knows ‘3’”, where the normalized values of ‘1’ and ‘3’ can be denormalized in order to reconstruct the original RDF triple, “Bob the village baker knows Sam the shoemaker”.
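The dictionary-based normalization described above can be sketched as follows. This is a minimal illustration in Python under stated assumptions: the `TermDictionary` class and its `encode` and `decode` methods are hypothetical names, identifiers are assigned sequentially starting at 1, and the predicate is also assigned an identifier (here ‘2’).

```python
class TermDictionary:
    """Maps each distinct phrase to a short integer identifier and back."""

    def __init__(self):
        self._id_by_term = {}
        self._term_by_id = {}

    def encode(self, term):
        """Return the identifier for `term`, assigning a new one if needed."""
        if term not in self._id_by_term:
            new_id = len(self._id_by_term) + 1  # identifiers start at 1
            self._id_by_term[term] = new_id
            self._term_by_id[new_id] = term
        return self._id_by_term[term]

    def decode(self, term_id):
        """Return the original phrase for a previously assigned identifier."""
        return self._term_by_id[term_id]


dictionary = TermDictionary()
original = ("Bob the village baker", "knows", "Sam the shoemaker")

# Normalize: replace each phrase with its identifier.
normalized = tuple(dictionary.encode(term) for term in original)

# Denormalize: look each identifier up again to rebuild the triple.
restored = tuple(dictionary.decode(term_id) for term_id in normalized)
```

In this sketch a phrase that occurs in many triples is stored once in the dictionary, and each triple holds only the short identifiers.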
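When RDF data comprising many occurrences of subject-predicate-object entries in tables is stored in normalized form, the aggregate data storage requirements are typically much smaller than when the same data is stored in denormalized form, since each repeated phrase is replaced by a short identifier.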
In many cases, RDF data is stored in relational database tables. When applying the foregoing normalization techniques, the relational database tables comprising RDF triples are normalized for their respective subjects, predicates, and objects. The RDF data stored in relational database tables can thus be stored in a normalized form (e.g., where the normalized values are stored in a first database table and the denormalized values are stored in a second database table). Queries can be performed over the data, where a join operation is performed between the first database table and the second database table using a join key. The result of the join operation can be used by a database processing system to generate denormalized query results.
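The two-table arrangement and the join used to produce denormalized results can be sketched with Python's standard `sqlite3` module. The table names (`triples`, `terms`), the column names, and the choice of SQLite are assumptions made for this illustration; the text does not prescribe a particular schema or database engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Second table: identifier-to-phrase dictionary (denormalized values).
    CREATE TABLE terms (
        id    INTEGER PRIMARY KEY,
        value TEXT NOT NULL UNIQUE
    );
    -- First table: triples stored as identifiers (normalized values).
    CREATE TABLE triples (
        subject_id   INTEGER NOT NULL REFERENCES terms(id),
        predicate_id INTEGER NOT NULL REFERENCES terms(id),
        object_id    INTEGER NOT NULL REFERENCES terms(id)
    );
    INSERT INTO terms VALUES
        (1, 'Bob the village baker'),
        (2, 'knows'),
        (3, 'Sam the shoemaker');
    INSERT INTO triples VALUES (1, 2, 3);
""")

# One join against the dictionary per triple position, keyed on the identifier.
rows = conn.execute("""
    SELECT subj.value, pred.value, obj.value
    FROM triples AS t
    JOIN terms AS subj ON subj.id = t.subject_id
    JOIN terms AS pred ON pred.id = t.predicate_id
    JOIN terms AS obj  ON obj.id  = t.object_id
""").fetchall()
conn.close()
```

In this sketch, returning one denormalized triple requires three lookups in the dictionary table, one for each of the subject, predicate, and object.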
Unfortunately, the computing costs involved in performing join operations are often very high, especially if there are a large number of entries involved in the tables to be joined. Software applications involving semantic queries (e.g., involving RDF data and/or SPARQL queries) often need denormalized results, thus incurring the aforementioned high costs involved in performing joins over the multiple tables to obtain denormalized results.
Therefore, what is needed is a technique or techniques to improve over legacy techniques and/or over other considered approaches to reduce the computational expense of returning denormalized results of a query pertaining to RDF data triples that are stored in a normalized form within a relational database system. Some of the approaches described in this background section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.