The SPARQL Protocol And Query Language for RDF (SPARQL) is a World Wide Web Consortium (W3C) Recommendation for querying Resource Description Framework (RDF) datasets. Currently, a new revision of the specification, SPARQL 1.1, is being developed. SPARQL 1.1 is designed to add functionality in the form of aggregation functions, subqueries, negation, and property paths, while remaining fully compatible with the previous version, so that any query that is valid under SPARQL 1.0 is also valid under 1.1. The following briefly describes the SPARQL query language; more information can be found in the Recommendation produced by the W3C.
Let I be the set of IRI references, B the set of blank nodes, and L the set of literals.
An RDF graph G is a set of triples t ∈ (I∪B)×I×(I∪B∪L).
The set of RDF terms is RDF-T = (I∪B∪L).
A canonical model of an RDF graph G is a model equivalent to G with blank nodes replaced with some IRI not appearing elsewhere in G nor in the context in which it is used, i.e., in queries. Every subsequent mention of RDF graphs in this application assumes a canonical model.
An RDF dataset D is a set {G0, (u1, G1), . . . , (un, Gn)}, where each Gi is an RDF graph and each ui is an IRI; the graph G0 is called the default graph, and each pair (ui, Gi) is a named graph. One of the graphs in the dataset, called the active graph, is used for basic graph pattern matching.
A triple pattern is a tuple tP ∈ (RDF-T∪V)×(I∪V)×(RDF-T∪V), where V is a set of variables disjoint from RDF-T.
A basic graph pattern is a set of triple patterns.
A value constraint is a Boolean-valued expression using elements from (RDF-T∪V), logical functions, equality and inequality symbols, and unary predicates.
Let P1, P2 be graph patterns, and r a value constraint; the SPARQL standard defines the following expressions:
    a basic graph pattern {P1 . P2},
    a graph pattern with filter {P1 FILTER r},
    a group graph pattern {P1} {P2},
    an alternative graph pattern {P1} UNION {P2}, and
    an optional graph pattern {P1} OPTIONAL {P2}.
These SPARQL expressions can be defined in turn by SPARQL algebra operators as shown in the following table:
SPARQL syntactic expression      SPARQL algebra operator
P1 . P2                          Join(P1,P2)
P1 . FILTER r                    Filter(r,P1)
{P1} {P2}                        Join(P1,P2)
P1 UNION P2                      Union(P1,P2)
P1 OPTIONAL P2 . FILTER r        LeftJoin(P1,P2,r)
(none)                           Diff(P1,P2,r)

SPARQL also defines a GRAPH graph pattern, which does not lend itself to algebraic manipulation and which is therefore ignored in this document.
Note that the Join operator defines both the basic graph pattern and the group graph pattern, since these two graph patterns are functionally equivalent, and group graph patterns are essentially used to define the scoping of variables and other operators. Note also that the Diff operator does not have a counterpart syntactic expression, since this operator is only used internally in the SPARQL algebra, as part of the definition of the LeftJoin operator.
It should also be observed that, strictly speaking, the SPARQL algebraic expressions operate over solution multisets rather than over graph patterns. We denote graph patterns and their solution multisets with the same symbols for simplicity, since in this context it is not necessary to make such a distinction.
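As an informal illustration of the definitions above, the following Python sketch matches triple patterns against a small graph and combines the resulting solutions with a Join. The representation is an assumption made for this example only: triples are tuples of strings, variables are strings beginning with "?", solutions are dictionaries, and the names is_var, match_triple_pattern, compatible, and join are hypothetical helpers, not part of SPARQL or of any library.

```python
from itertools import product

# Assumed representation (for illustration only): an RDF graph is a set of
# (subject, predicate, object) string triples; variables start with "?".
def is_var(term):
    return isinstance(term, str) and term.startswith("?")

def match_triple_pattern(pattern, graph):
    """Return the solutions (dicts: variable -> RDF term) for one triple pattern."""
    solutions = []
    for triple in graph:
        binding = {}
        for p_term, g_term in zip(pattern, triple):
            if is_var(p_term):
                # A repeated variable must bind to the same term every time.
                if binding.setdefault(p_term, g_term) != g_term:
                    break
            elif p_term != g_term:
                break
        else:
            solutions.append(binding)
    return solutions

def compatible(m1, m2):
    """Two solutions are compatible if they agree on every shared variable."""
    return all(m2[v] == t for v, t in m1.items() if v in m2)

def join(omega1, omega2):
    """Join of two solution multisets: merge every compatible pair."""
    return [{**m1, **m2} for m1, m2 in product(omega1, omega2) if compatible(m1, m2)]

graph = {
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:bob", "ex:knows", "ex:carol"),
    ("ex:bob", "ex:name", '"Bob"'),
}
p1 = match_triple_pattern(("?x", "ex:knows", "?y"), graph)
p2 = match_triple_pattern(("?y", "ex:name", "?n"), graph)
result = join(p1, p2)  # solutions for the basic graph pattern {P1 . P2}
```

Here only the solution binding ?y to ex:bob survives the Join, because it is the only pair of solutions that agrees on the shared variable ?y.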
Given two graph patterns P1 and P2, we say that P2 implies P1, and denote it P1←P2, if for every RDF graph G, every solution for P2 is also a solution for P1. P1 and P2 are said to be equivalent, denoted P1 ≡ P2, if they imply each other. The following equivalences can then be derived from the SPARQL algebra:
Commutativity of Join: Join(P1,P2) ≡ Join(P2,P1)  (1)
Commutativity of Or: Or(P1,P2) ≡ Or(P2,P1)  (2)
Associativity of Join: Join(P1,Join(P2,P3)) ≡ Join(Join(P1,P2),P3)  (3)
Associativity of Or: Or(P1,Or(P2,P3)) ≡ Or(Or(P1,P2),P3)  (4)
Distributivity of Join over Or: Join(P1,Or(P2,P3)) ≡ Or(Join(P1,P2),Join(P1,P3))  (5)
Based on their associativity properties, then, we denote the sequences of binary Join and Or, Join( . . . Join(P1, P2), . . . , Pn) and Or( . . . Or(P1, P2), . . . , Pn), using the multiple-operator shorthands Join(P1, P2, . . . , Pn) and Or(P1, P2, . . . , Pn) respectively.
Algebraic manipulation of SPARQL queries is hampered by the non-distributivity of the disjunction Union operator over the LeftJoin operator. To address this issue, the semQA query algebra extension for SPARQL [1] proposes the definition of an idempotent disjunction operator Or, where Or is distributive over LeftJoin:
Definition 1.
                Let P1 and P2 be two graph patterns. The idempotent-disjunction graph pattern Or(P1,P2) results in all solutions for P1 or for P2 such that there does not exist a solution that is a subset of another solution, and such that each solution exists only once in the solution set. In this way, a solution for Or(P1, P2) is a solution for Union (P1, P2) that is not a subset of any other solution for Or(P1, P2). This definition makes Or an idempotent disjunction.        
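Definition 1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the semQA implementation: solutions are dictionaries with hashable values, and the function names union, idempotent_or, and as_set are hypothetical.

```python
def union(omega1, omega2):
    """SPARQL Union: multiset concatenation of the two solution lists."""
    return omega1 + omega2

def idempotent_or(omega1, omega2):
    """Or(P1, P2) per Definition 1: drop duplicates and subsumed solutions."""
    # Deduplicate by representing each solution as a frozenset of
    # (variable, term) pairs.
    candidates = {frozenset(m.items()) for m in union(omega1, omega2)}
    # Keep a solution only if it is not a proper subset of another candidate.
    maximal = [s for s in candidates if not any(s < other for other in candidates)]
    return [dict(s) for s in maximal]

def as_set(omega):
    """Order-insensitive view of a solution list, used for comparisons."""
    return {frozenset(m.items()) for m in omega}

p1 = [{"?x": "a"}, {"?x": "b"}]
p2 = [{"?x": "a", "?y": "c"}, {"?x": "b"}]
merged = idempotent_or(p1, p2)
```

In this example the solution {?x: a} is dropped because it is a subset of {?x: a, ?y: c}, and the duplicate {?x: b} is kept only once. Applying idempotent_or to merged and itself returns the same set of solutions, which is the idempotence property the definition refers to.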
With this Or operator, LeftJoin can be expressed in terms of Or and Join:
LeftJoin conversion: LeftJoin(P1,P2,r) ≡ Or(P1,Filter(r,Join(P1,P2)))  (6)
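Conversion (6) can be checked on a small example. The sketch below is illustrative only: it compares a straightforward reference LeftJoin with the Or-Join expression on sample data, treating results as sets. All function names and the sample data are assumptions introduced for this example, and the left-hand solutions all bind the same variables.

```python
from itertools import product

def compatible(m1, m2):
    """Two solutions are compatible if they agree on every shared variable."""
    return all(m2[v] == t for v, t in m1.items() if v in m2)

def join(o1, o2):
    return [{**a, **b} for a, b in product(o1, o2) if compatible(a, b)]

def filter_(r, omega):
    return [m for m in omega if r(m)]

def idempotent_or(o1, o2):
    """Or per Definition 1: deduplicate, then drop subsumed solutions."""
    candidates = {frozenset(m.items()) for m in o1 + o2}
    return [dict(s) for s in candidates if not any(s < t for t in candidates)]

def left_join(o1, o2, r):
    """Reference LeftJoin: extended solutions plus left solutions with no extension."""
    extended = filter_(r, join(o1, o2))
    unmatched = [a for a in o1
                 if not any(compatible(a, b) and r({**a, **b}) for b in o2)]
    return extended + unmatched

def as_set(omega):
    return {frozenset(m.items()) for m in omega}

def adult(m):
    """Sample value constraint r: ?age >= 18."""
    return m["?age"] >= 18

people = [{"?p": "alice"}, {"?p": "bob"}, {"?p": "carol"}]
ages = [{"?p": "alice", "?age": 30}, {"?p": "bob", "?age": 17}]

lhs = left_join(people, ages, adult)                              # LeftJoin(P1,P2,r)
rhs = idempotent_or(people, filter_(adult, join(people, ages)))   # Or(P1,Filter(r,Join(P1,P2)))
```

Both sides yield alice with her age, and bob and carol without one: bob's age fails the constraint and carol has no age at all, so neither is extended, while the bare alice solution is subsumed by its extended form.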
This conversion of LeftJoin into a graph pattern consisting of Or and Join permits the elimination of any LeftJoin within the graph pattern, simplifying algebraic manipulation of queries. semQA then proposes mechanisms by which any SPARQL graph pattern is converted to an idempotent-disjunction (i-d) graph pattern by replacing every Union operator with an Or operator, and every LeftJoin operator with its equivalent Or-Join expression.
To drive optimization of queries, semQA also proposes filter pushdown methods against both Or and Join:
Filter(r1 && r2,P1) ≡ Filter(r1,Filter(r2,P1))  (7)
Filter(r1 && r2,P1) ≡ Join(Filter(r1,P1),Filter(r2,P1))  (8)
Filter(r1 || r2,P1) ≡ Or(Filter(r1,P1),Filter(r2,P1))  (9)
The symbols && and || denote, respectively, the logical-and and logical-or operations on value constraints. To derive additional algebraic equivalences that allow the pushdown of filters, semQA provides a set of definitions that allow the processing of different types of value constraints.
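The filter pushdown rules for conjunctive and disjunctive value constraints over a single graph pattern can be exercised with a short sketch. The code below is an illustration under assumptions made for this example: solutions are dictionaries, value constraints are Python predicates, results are compared as sets, and all names and sample data are hypothetical.

```python
from itertools import product

def filter_(r, omega):
    return [m for m in omega if r(m)]

def join(o1, o2):
    """Merge every pair of solutions that agree on their shared variables."""
    return [{**a, **b} for a, b in product(o1, o2)
            if all(b[v] == t for v, t in a.items() if v in b)]

def idempotent_or(o1, o2):
    """Or per Definition 1: deduplicate, then drop subsumed solutions."""
    candidates = {frozenset(m.items()) for m in o1 + o2}
    return [dict(s) for s in candidates if not any(s < t for t in candidates)]

def as_set(omega):
    return {frozenset(m.items()) for m in omega}

def r1(m):
    return m["?x"] % 2 == 0   # sample constraint: ?x is even

def r2(m):
    return m["?x"] > 5        # sample constraint: ?x > 5

omega = [{"?x": n} for n in range(10)]   # sample solutions for P1

# Conjunction: nested filters, and a Join of separately filtered copies of P1.
conj = filter_(lambda m: r1(m) and r2(m), omega)
nested = filter_(r1, filter_(r2, omega))
joined = join(filter_(r1, omega), filter_(r2, omega))

# Disjunction: an Or of separately filtered copies of P1.
disj = filter_(lambda m: r1(m) or r2(m), omega)
ored = idempotent_or(filter_(r1, omega), filter_(r2, omega))
```

On this data the three conjunctive forms all select ?x = 6 and ?x = 8, and the two disjunctive forms select the same seven solutions; the Or removes the duplicates that a plain Union of the two filtered copies would contain.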
Definition 2.
(a) An atomic value constraint is a value constraint that does not include logical-and or logical-or operators. (b) A conjunctive value constraint is a value constraint composed of one or more atomic value constraints linked by logical-and operators. (c) A value constraint in disjunctive normal form consists of one or more conjunctive value constraints linked by logical-or operators.
Definition 3.
Let var(r) denote the set of variables in a value constraint r. Given a conjunctive value constraint r and a graph pattern P, a restriction of r on P, denoted R(r,P), is defined as follows:
    If r is atomic, R(r,P) = r if var(r)⊂var(P) and r is not of the form !bound(v), true otherwise.
    If r = (r1 && r2 && . . . && rn), where every ri is a conjunctive value constraint,
        R(r,P) = false if for all ri, var(ri)⊂var(P) and ri is not of the form !bound(v); or
        otherwise, R(r,P) = R(r1,P) && R(r2,P) && . . . && R(rn,P).
Definition 4.
Given a conjunctive value constraint r and graph patterns P1 and P2, the overlap of r on P1 and P2, denoted L(r,P1,P2), is defined as follows:
    If r is atomic, L(r,P1,P2) is
        r if it is of the form !bound(v); or if var(r)⊂(var(P1)∪var(P2)), and var(r)⊂var(P1) and var(r)⊂var(P2).
        false if var(r)⊂(var(P1)∪var(P2))
        true if var(r)⊂var(P1) or var(r)⊂var(P2)
    If r = (r1 && r2 && . . . && rn), where every ri is a conjunctive value constraint, L(r,P1,P2) = L(r1,P1,P2) && L(r2,P1,P2) && . . . && L(rn,P1,P2).
These definitions lead to the following equivalence, supposing that r is a conjunctive value constraint:
Filter(r,Join(P1,P2)) ≡ Filter(L(r,P1,P2),Join(Filter(R(r,P1),P1),Filter(R(r,P2),P2)))  (10)
In addition, we have the following:
Filter(r1,Or(P1,P2)) → Join(Filter(r1,P1),Filter(r1,P2))  (11)
Note that the implication in (11) is unidirectional, since the reverse is not necessarily true. semQA also proposes mechanisms for the resolution of i-d graph patterns to create result sets conformant to the original SPARQL graph pattern. This is done by taking the results obtained from evaluation of an i-d graph pattern and constructing SPARQL results by processing them through the SPARQL query.
While this method is mathematically sound, it suffers from an important drawback. The expansion of LeftJoin into its Or-Join equivalent, and the subsequent conversion of the resulting semQA graph pattern into disjunctive normal form, exponentially increases the size of the query. This patent application concerns a set of mechanisms and an implementation designed to enable the algebraic processing of SPARQL queries into semQA2 graph patterns that do not need to eliminate the LeftJoin operator. To this end, we make use of the algebraic principles already developed under semQA to improve query performance, add capabilities for the algebraic manipulation and pushdown of filters with LeftJoin operators, and define a mechanism to create an optimized SPARQL query plan using semQA2.
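The exponential growth mentioned above can be illustrated by counting the disjuncts produced when a chain of nested LeftJoin operators is expanded with conversion (6) and then distributed with equivalence (5). The sketch below is a purely syntactic illustration under simplifying assumptions: filters are omitted (every r is taken to be true), graph patterns are represented only by their names, and the functions expand_left_join and expand_chain are hypothetical.

```python
def expand_left_join(dnf, optional):
    """Rewrite LeftJoin(A, B) as Or(A, Join(A, B)) with A already in
    disjunctive normal form, distributing Join over Or.

    `dnf` is a list of conjuncts; each conjunct is a tuple of pattern names
    standing for Join(...) of those patterns.
    """
    return dnf + [conjunct + (optional,) for conjunct in dnf]

def expand_chain(base, optionals):
    """Expand LeftJoin(...LeftJoin(LeftJoin(base, o1), o2)..., on) into DNF."""
    dnf = [(base,)]
    for opt in optionals:
        dnf = expand_left_join(dnf, opt)
    return dnf

# Number of disjuncts for chains of 0 to 5 nested LeftJoin operators.
sizes = [len(expand_chain("P0", [f"P{i}" for i in range(1, n + 1)]))
         for n in range(6)]
```

Each additional LeftJoin doubles the number of disjuncts in this expansion, so a chain of n optional patterns yields 2^n conjunctive graph patterns before any simplification is applied.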