Technical Field
The present invention generally relates to the field of translating a SPARQL query into a SQL query and, more specifically, to a manner of navigating through one or more generated graphs for the translation.
Description of Related Art
Resource Description Framework (RDF) is a family of World Wide Web Consortium® (W3C)® specifications based on the idea of making statements about resources in a triple format. In RDF, a triple describes data in three parts: subject, predicate and object. The subject is referred to as the resource and can be a person, place or thing. The predicate identifies a property of the subject. The predicate may be any specific aspect, attribute, or relation used to describe the resource (subject). The object gives a “value” to the predicate (property). The objects can be literals, or strings that identify or name the corresponding property (predicate).
Generally, a given subject may have multiple predicates, each predicate indexing an object. Commonly, the object may itself be another subject, with its own objects and predicates. In some cases, a resource can be both a subject and an object, e.g., the object to all upstream resources and the subject to all downstream resources and properties.
A set of RDF triples is also termed an RDF graph. In the RDF graph, the subject and the object may be modelled as nodes, connected or joined with one another through predicates. The predicates may be modelled as arrows or arcs. In an example, the RDF graph may describe that a T-shirt is white in shade. In this example, the subject is the T-shirt, the predicate (property) is the shade, and the object is white.
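The T-shirt example can be sketched in a few lines of Python. This is only an illustration of the subject-predicate-object model: the `Triple` type, the `as_graph` helper, and the second triple (`size`, `medium`) are assumptions introduced here and are not part of any RDF library.

```python
from collections import namedtuple

# Hypothetical in-memory representation of an RDF triple.
Triple = namedtuple("Triple", ["subject", "predicate", "object"])

triples = [
    Triple("T-shirt", "shade", "white"),   # the example from the text
    Triple("T-shirt", "size", "medium"),   # assumed extra triple for illustration
]

def as_graph(triples):
    """Group triples by subject: each subject node maps to its outgoing arcs."""
    graph = {}
    for t in triples:
        # Each arc is labelled with the predicate and points at the object node.
        graph.setdefault(t.subject, []).append((t.predicate, t.object))
    return graph

graph = as_graph(triples)
```

Here `graph["T-shirt"]` holds one arc per predicate, which mirrors the node-and-arc picture of the RDF graph.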
Much like data in a relational database, information on RDF graphs/RDF triples is stored in a database called a triples store. The triples store contains the collection of graphs as a set of “subject-predicate-object” triples. Different triples stores have different storage designs. In one example, the triples store may contain triples arranged in tables. The table may comprise a subject column, a predicate column and an object column. In the table, each row holds the elements (data items) of one triple. In another example, the triples for a given subject could be represented as a row, with the subject being the primary key, each possible predicate being a column, and the object being the value in the cell. In yet another design example, the components of the triples may be indexed and stored in respective tables by following the “hashed with origin” approach.
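The first two storage designs can be contrasted with a small sketch using Python's built-in `sqlite3` module. The table names `triples` and `properties`, the column names, and the `size` data are assumptions chosen for illustration; an actual triples store may lay out its tables differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Design 1: one three-column table, one row per triple.
conn.execute("CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [("T-shirt", "shade", "white"), ("T-shirt", "size", "medium")],
)

# Design 2: one row per subject, one column per predicate,
# with the object stored as the cell value.
conn.execute(
    "CREATE TABLE properties (subject TEXT PRIMARY KEY, shade TEXT, size TEXT)"
)
conn.execute("INSERT INTO properties VALUES (?, ?, ?)", ("T-shirt", "white", "medium"))

# The same facts are read back from either layout.
triple_rows = conn.execute(
    "SELECT predicate, object FROM triples WHERE subject = ? ORDER BY predicate",
    ("T-shirt",),
).fetchall()
property_row = conn.execute(
    "SELECT shade, size FROM properties WHERE subject = ?", ("T-shirt",)
).fetchone()
```

The first layout accepts any predicate without a schema change, while the second returns all properties of a subject in a single row but needs a column for every possible predicate.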
Most of the existing commercial triples stores are either native triples stores or triples stores realized using existing relational databases. The existing relational databases generally have Structured Query Language (SQL) based engines, which execute queries written in SQL. The SQL queries take into account the underlying storage schema of the existing database. However, queries may instead be made in a schema-less query language such as SPARQL Protocol and RDF Query Language (SPARQL). The SPARQL query may run on top of the existing databases supporting SQL. In one implementation, the SPARQL query may be run on a triples store having a native SPARQL query engine. In another implementation, where the triples store is realized using the existing relational databases, the queries in the SPARQL language are translated into the corresponding SQL queries for execution in the relational databases.
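To make the translation step concrete, the following minimal sketch maps a single SPARQL triple pattern onto a parameterized SQL query. It assumes a three-column table named `triples`; the function `translate_pattern` is hypothetical, handles only one pattern, and ignores prefixes, repeated variables, and every other SPARQL feature. It is not the translation method of any particular engine.

```python
def translate_pattern(subject, predicate, obj, table="triples"):
    """Translate one triple pattern into (sql, params).

    Terms starting with '?' are treated as SPARQL variables and become
    selected columns; all other terms become equality filters.
    """
    columns = ("subject", "predicate", "object")
    select, where, params = [], [], []
    for column, term in zip(columns, (subject, predicate, obj)):
        if term.startswith("?"):
            select.append(f"{column} AS {term[1:]}")
        else:
            where.append(f"{column} = ?")
            params.append(term)
    # With no variables, select a constant so the query stays valid SQL.
    sql = f"SELECT {', '.join(select) or '1'} FROM {table}"
    if where:
        sql += " WHERE " + " AND ".join(where)
    return sql, params

# Illustrative SPARQL: SELECT ?item WHERE { ?item :shade "white" }
sql, params = translate_pattern("?item", "shade", "white")
```

The schema-less pattern carries no table or column names; the translator has to supply them from its knowledge of the storage layout, which is why the SQL form depends on the underlying schema.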
Information about the data contained in the RDF graphs is retrieved via the query. To process the query, the databases are navigated to retrieve the data (triples). Since the data is stored in columns in the tables, processing a query requires data from across the tables to be joined. The columns are joined by a union operation across multiple tables. The query engine itself chooses the way to join the tables while executing the query. As a result of these joins between columns and/or between tables, even a simple query consumes a lot of time. Hence, retrieval of data from the triples store remains an obstacle to using the databases efficiently.
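One way the join cost arises can be sketched with `sqlite3`, assuming the three-column `triples` layout: a query with two triple patterns on the same subject needs the table joined with itself once per additional pattern. The table name, the sample data, and the SPARQL text in the comment are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("T-shirt", "shade", "white"),
        ("T-shirt", "size", "medium"),
        ("Cap", "shade", "white"),
        ("Cap", "size", "small"),
    ],
)

# Illustrative SPARQL:
#   SELECT ?item WHERE { ?item :shade "white" . ?item :size "medium" }
# Each triple pattern becomes one alias of the triples table, and the
# shared variable ?item becomes the join condition between the aliases.
sql = """
SELECT t1.subject AS item
FROM triples AS t1
JOIN triples AS t2 ON t1.subject = t2.subject
WHERE t1.predicate = 'shade' AND t1.object = 'white'
  AND t2.predicate = 'size'  AND t2.object = 'medium'
"""
items = [row[0] for row in conn.execute(sql)]
```

A query with n patterns needs n aliases of the table under this layout, so the number of joins grows with the size of the query even when the result is small.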
A reliable manner of processing the query is therefore desired, one that allows an effective translation of the SPARQL query into the SQL query language.