Resource Description Framework (RDF) is a family of World Wide Web Consortium® (W3C®) specifications originally designed as a metadata model. Although RDF is typically described as a language for representing information about resources on the World Wide Web, it can be used more generally to model information. The RDF metadata model is based on the idea of making statements about resources in the form of subject-predicate-object expressions (i.e., triples, referred to herein as triplets or RDF triplets). Typically, a subject denotes a resource, and a predicate denotes traits or aspects of the resource and expresses a relationship between the subject and an object. For example, one way to represent the notion “The car has the color silver” as an RDF triplet is a subject denoting “the car”, a predicate denoting “has the color”, and an object denoting “silver”.
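As a minimal illustration of the triplet structure described above, the following Python sketch models one RDF statement as a named tuple. The class name `Triple` and the plain-string terms are assumptions made for readability; in practice the subject and predicate would normally be named by URIs.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement in subject-predicate-object form (hypothetical class)."""
    subject: str
    predicate: str
    object: str


# "The car has the color silver" expressed as a single triplet.
# Plain strings are used here instead of URIs purely for readability.
car_color = Triple(subject="the car", predicate="has the color", object="silver")

print(car_color.subject, "|", car_color.predicate, "|", car_color.object)
# prints: the car | has the color | silver
```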
RDF statements made about online resources typically comprise a subject (e.g., a resource typically named by a Uniform Resource Identifier (URI)), a predicate (e.g., a resource representing a relationship), and an object (e.g., a resource or a Unicode string literal). A body of knowledge modeled by a collection of RDF statements can be subjected to reification, in which each RDF triplet is assigned a URI and treated as a resource about which additional statements can be made. For example, the statement “MSN.com® says that Alan is the author of article X” illustrates this concept: the statement “Alan is the author of article X” becomes a resource, and MSN.com® is asserted to have made it. Although reification can be useful for determining the trustworthiness or utility of a statement, one criticism of RDF is its ambiguous handling of reified statements. A further criticism of RDF is that the triplet notation lacks the capacity to model more complex bodies of information.
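The reification example above can be sketched in Python using the standard RDF reification vocabulary (`rdf:Statement`, `rdf:subject`, `rdf:predicate`, `rdf:object`). The `reify` helper and all `http://example.org/...` URIs are hypothetical and chosen only for illustration; this is a sketch of the general idea under those assumptions, not a definitive treatment of reification.

```python
from typing import List, NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    object: str


# Namespace of the W3C RDF vocabulary, which defines the reification terms.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def reify(statement: Triple, statement_uri: str) -> List[Triple]:
    """Describe `statement` as a resource named `statement_uri` (hypothetical helper)."""
    return [
        Triple(statement_uri, RDF + "type", RDF + "Statement"),
        Triple(statement_uri, RDF + "subject", statement.subject),
        Triple(statement_uri, RDF + "predicate", statement.predicate),
        Triple(statement_uri, RDF + "object", statement.object),
    ]


# "Alan is the author of article X" (hypothetical URIs).
claim = Triple("http://example.org/articleX", "http://example.org/author", "http://example.org/Alan")
stmt_uri = "http://example.org/statements/1"

graph = reify(claim, stmt_uri)
# Because the statement now has its own URI, further statements can be made about it.
graph.append(Triple("http://example.org/MSN.com", "http://example.org/says", stmt_uri))

for t in graph:
    print(t)
```

In this form, describing the single original statement takes four additional triplets, after which the statement's URI can appear in further statements, such as the attribution to MSN.com®.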
Typically, once a collection of RDF metadata about resources has been stored (e.g., in one or more RDF graphs), the data is queried. RDF query languages can be used to write expressions that are evaluated against one or more RDF graphs to produce, for example, a narrowed set of statements, resources, or object values, or to perform comparisons and operations on such items. In addition, knowledge management applications can use RDF queries as a basis for inference actions.
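To make the idea of evaluating a query expression against an RDF graph concrete, the following Python sketch matches a conjunction of triple patterns against an in-memory list of triplets and returns the resulting variable bindings. Terms beginning with `?` are treated as variables, loosely following SPARQL's variable syntax. The function names and the sample data (`car1`, `hasColor`, `Acme`, and so on) are hypothetical, and the naive nested-loop strategy is chosen for clarity rather than performance; it is not how any particular RDF engine evaluates queries.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    object: str


Bindings = Dict[str, str]
Pattern = Tuple[str, str, str]


def is_var(term: str) -> bool:
    """Terms starting with '?' are query variables (a SPARQL-like convention)."""
    return term.startswith("?")


def match_pattern(pattern: Pattern, triple: Triple, bindings: Bindings) -> Optional[Bindings]:
    """Match one triple pattern against one triplet, extending the bindings.
    Returns None if a constant differs or a variable is already bound to a
    different value."""
    result = dict(bindings)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            if p_term in result and result[p_term] != t_term:
                return None
            result[p_term] = t_term
        elif p_term != t_term:
            return None
    return result


def query(graph: List[Triple], patterns: List[Pattern]) -> List[Bindings]:
    """Evaluate a conjunction of triple patterns by naive nested-loop matching."""
    solutions: List[Bindings] = [{}]
    for pattern in patterns:
        solutions = [
            extended
            for bindings in solutions
            for triple in graph
            if (extended := match_pattern(pattern, triple, bindings)) is not None
        ]
    return solutions


# Hypothetical sample graph.
graph = [
    Triple("car1", "hasColor", "silver"),
    Triple("car1", "hasMake", "Acme"),
    Triple("car2", "hasColor", "red"),
    Triple("car3", "hasColor", "silver"),
]

# Narrow the graph to silver resources that also have a known make.
for row in query(graph, [("?car", "hasColor", "silver"), ("?car", "hasMake", "?make")]):
    print(row)  # {'?car': 'car1', '?make': 'Acme'}
```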
Although several query languages for RDF graphs have emerged, RDF graphs are typically queried using the emerging de facto standard, SPARQL Protocol and RDF Query Language (SPARQL), which is loosely modeled after Structured Query Language (SQL). While SPARQL can express complex queries across diverse data sources (e.g., data stored natively as RDF or viewed as RDF via middleware), it suffers from relatively narrow deployment and forces users to learn a new query language. Moreover, as a relatively new query language, it does not benefit from the many years of optimization research devoted to other query languages (e.g., SQL). Such disadvantages can hinder the adoption of SPARQL and, in turn, of RDF itself.
As applied to collections of resources on the World Wide Web, the potential volume of information that could be stored in RDF graphs is virtually limitless, bounded only by available storage capacity. At the same time, the stored RDF metadata must be retrieved efficiently to be of any practical use. An RDF store must therefore satisfy two simultaneous objectives: storing large volumes of RDF metadata and retrieving that metadata quickly and efficiently.
Conventional implementations of RDF stores and SPARQL suffer from a basic limitation that results from using a database design in normalized form. Because of this design, SPARQL queries are typically executed against in-memory RDF data structures rather than directly against the backend database. This can lead to out-of-memory errors, which are only exacerbated as the volume of queried RDF data grows. This design also demands expensive, high-performance hardware with relatively large memory capacity to handle large volumes of RDF data in real time (e.g., with response times of milliseconds). Thus, what is desired is a fast storage and retrieval mechanism for RDF metadata that can leverage conventional relational database management systems, techniques, and expertise, rather than the conventional “triple stores,” which suffer from scalability issues. For example, even a simple query has been shown to take 1.5 seconds on a 200-million-triple store. In addition, the lack of a specified standard for converting SPARQL queries to SQL queries prevents such implementations, because SPARQL typically requires an RDF view or endpoint to query the underlying data. As a result, a reliable conversion algorithm is desired that allows efficient access to an RDF store using conventional relational database systems.
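To illustrate what converting a SPARQL-style query into SQL can involve, the following Python sketch translates a conjunction of triple patterns (a basic graph pattern) into a single SQL query over a three-column table and runs it with the standard-library `sqlite3` module. The table name `triples`, its columns `s`, `p`, and `o`, the function `bgp_to_sql`, and the sample data are all hypothetical. This is a simplified mapping under those assumptions, not a standard conversion and not the approach of the embodiments described herein; it ignores features such as `OPTIONAL`, `FILTER`, and literal datatypes.

```python
import sqlite3
from typing import Dict, List, Tuple

# A triple pattern: each term is a constant or a variable starting with "?".
Pattern = Tuple[str, str, str]


def bgp_to_sql(patterns: List[Pattern]) -> Tuple[str, List[str]]:
    """Translate triple patterns into one SQL query over a hypothetical table
    triples(s, p, o). Each pattern becomes one alias of the table; constants
    become parameterized filters, and a repeated variable becomes a join
    condition against its first occurrence."""
    columns = ("s", "p", "o")
    first_seen: Dict[str, str] = {}  # variable -> column reference, e.g. "t0.s"
    conditions: List[str] = []
    params: List[str] = []
    for i, pattern in enumerate(patterns):
        for col, term in zip(columns, pattern):
            ref = f"t{i}.{col}"
            if term.startswith("?"):
                if term in first_seen:
                    conditions.append(f"{ref} = {first_seen[term]}")  # join condition
                else:
                    first_seen[term] = ref
            else:
                conditions.append(f"{ref} = ?")  # constant filter, bound as a parameter
                params.append(term)
    select = ", ".join(f'{ref} AS "{var[1:]}"' for var, ref in first_seen.items()) or "1"
    tables = ", ".join(f"triples AS t{i}" for i in range(len(patterns)))
    where = " AND ".join(conditions) or "1 = 1"
    return f"SELECT {select} FROM {tables} WHERE {where}", params


# Hypothetical sample data loaded into an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("car1", "hasColor", "silver"),
        ("car1", "hasMake", "Acme"),
        ("car2", "hasColor", "red"),
        ("car3", "hasColor", "silver"),
    ],
)

# Roughly analogous to: SELECT ?car ?make WHERE { ?car :hasColor "silver" . ?car :hasMake ?make }
sql, params = bgp_to_sql([("?car", "hasColor", "silver"), ("?car", "hasMake", "?make")])
print(sql)
print(conn.execute(sql, params).fetchall())  # [('car1', 'Acme')]
```

In this simplified mapping, every additional triple pattern adds another self-join on the same `triples` table, and constants are passed as bound parameters rather than spliced into the SQL string.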
The above-described deficiencies are merely intended to provide an overview of some of the problems encountered in RDF store database design and access techniques, and are not intended to be exhaustive. Other problems with the state of the art may become further apparent upon review of the description of the various non-limiting embodiments of the invention that follows.