Code optimization is a goal of every programmer. Efficient code runs faster, consumes fewer computing resources, and is more compact. These attributes are attractive to consumers of code who want good performance from their computer hardware and software. Optimization is desirable for most applications, including query language applications in which the code may be used to query large databases such as relational databases. The problem becomes more acute when a query execution system takes on the task of querying multiple databases using multiple languages. In this scenario, efficient code generation benefits the query by returning results both more quickly and more consistently than non-optimized code. One example of an optimization need is a system in which queries in XML-related languages, such as XSLT, XQuery, and XPath, or in view definition languages, may be input for execution over a SQL database. In such a system, the role of an optimizer is to improve the efficiency of the execution code.
XML queries pose at least four barriers to normalization and optimization: node identity, ordering, side-effects, and construction. A common technique in database and functional programming optimization is to eliminate variables by performing substitution. However, great care must be exercised when performing a substitution, because even a simple substitution may change the meaning of a query.
One problem is that many XML query languages explicitly or implicitly depend on node identity. Consider, for example, the XQuery:
    let $a := <foo/>
    return $a is $a
This XQuery constructs a single XML element and then tests whether it has the same identity as itself. This query should return true. Contrast this with the query that would result from substitution (i.e., substituting every instance of $a with its value):
    <foo/> is <foo/>
This query constructs two elements, which are distinct from each other, so the query returns false. The problem may become even more pronounced with operators that implicitly sort by document order or eliminate duplicate nodes by identity.
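The substitution hazard described above can be reproduced outside XQuery. The following is a minimal Python sketch, not an XQuery implementation: the `Node` class and the `construct` helper are hypothetical stand-ins for an element constructor, and Python's `is` operator is assumed to play the role of the XQuery `is` node comparison.

```python
class Node:
    """Hypothetical stand-in for a constructed XML element.

    Each construction yields an object with its own identity,
    mirroring how <foo/> creates a new node every time it is evaluated.
    """

    def __init__(self, name):
        self.name = name


def construct(name):
    # Models an element constructor such as <foo/>.
    return Node(name)


def original_query():
    # let $a := <foo/> return $a is $a
    a = construct("foo")
    return a is a


def substituted_query():
    # <foo/> is <foo/>  (every use of $a replaced by its defining expression)
    return construct("foo") is construct("foo")


if __name__ == "__main__":
    print(original_query())     # one node compared with itself
    print(substituted_query())  # two separately constructed nodes
```

In this model the two functions disagree, which is why a substitution of this kind is only safe when the substituted expression does not construct nodes or when no later operation observes identity.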
Another complication in normalization and optimization is that XML is ordered. It is desirable that this ordering be stable across document instances. Consider the following example of a code-motion technique. This example involves pushing expressions inside a loop or pulling them out of a loop when they are independent of the loop.
Given the query:
    for $i in $e1
    where $condition1
    return
        for $j in $e2
        where $condition2
        return $k
A typical rewrite may result in:
    for $i in $e1
    for $j in $e2
    where $condition1
      and $condition2
    return $k
This rewrite adversely affects the ordering and position of the results. Although it seems correct at first glance, the inner condition may refer to the position within that loop, which the rewrite has altered. Any optimization that would cause an expression to have a different order should be applied only when the position need not be preserved. Otherwise, erroneous rewrites occur.
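The effect of the loop rewrite on position can be illustrated with a small Python sketch. It rests on stated assumptions: the inner condition is taken to be "position equals 1", the position is modeled with `enumerate`, and the fused loop is assumed to number items over the combined stream. The function names `nested` and `flattened` are hypothetical and chosen only for this illustration.

```python
def nested(e1, e2, cond1):
    # for $i in $e1 where cond1
    # return (for $j in $e2 where <position is 1> return ($i, $j))
    out = []
    for i in e1:
        if cond1(i):
            # Position restarts at 1 for every outer item.
            for pos, j in enumerate(e2, start=1):
                if pos == 1:
                    out.append((i, j))
    return out


def flattened(e1, e2, cond1):
    # for $i in $e1 for $j in $e2
    # where cond1 and <position is 1> return ($i, $j)
    pairs = [(i, j) for i in e1 for j in e2]
    out = []
    # Position now counts items of the fused stream, not of the inner loop.
    for pos, (i, j) in enumerate(pairs, start=1):
        if cond1(i) and pos == 1:
            out.append((i, j))
    return out


if __name__ == "__main__":
    keep_from_two = lambda i: i >= 2
    print(nested([1, 2, 3], ["a", "b"], keep_from_two))
    print(flattened([1, 2, 3], ["a", "b"], keep_from_two))
```

Under these assumptions the nested form keeps the first inner item for each qualifying outer item, while the fused form keeps nothing, because the only item at position 1 of the fused stream fails the outer condition.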
A common technique in programming language optimization is to eliminate temporary expressions when their results are not needed. However, elimination of even temporary XML query language expressions may not be completely side-effect free. Some expressions may terminate evaluation with an error, such as XQuery's error() function. Others may send a message to output, such as XSLT's <xsl:message/> operator. Some temporary expressions can be eliminated only if the query language semantics allow it. As two examples, consider the XQuery expression:
    error() and false()
and the XSLT path expression:
    document('malformed.xml')//foo[false()]
A strict implementation of these languages might require that both errors be reported, even though an optimizer would like to eliminate both expressions because of the always-false condition. Fortunately, XQuery allows the AND operator to short-circuit even when one of its operands may raise an error. XSLT allows the document() function to return the empty list when it encounters such an error. So both expressions may be optimized at compile time into the empty list. Consider an XQuery expression such as:
    (<x dupe="1" dupe="2">can you get here?</x>)//text()
In this example, an optimizer may want to eliminate the invalid temporary element and return only the text node. Notice also that if these queries are not optimized, they will produce errors at run time, but when optimized they may not.
In commercial implementations, interoperability with other programming languages may be paramount. However, calling a function written in another programming language may cause unknown effects, possibly including side-effects, which can be a barrier to optimization.
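The difference between strict evaluation and an optimizer that folds an always-false conjunction can be sketched in Python. This is an illustrative model only: `QueryError`, `error`, `false`, `strict_and`, and `optimized_and` are hypothetical names, and operands are passed as unevaluated callables so that the sketch can choose whether to evaluate them.

```python
class QueryError(Exception):
    """Hypothetical error type raised when error() is evaluated."""


def error():
    # Models XQuery's error() function: evaluation terminates with an error.
    raise QueryError("error() was evaluated")


def false():
    # Models the constant false().
    return False


def strict_and(left, right):
    # A strict implementation evaluates both operands before combining them.
    left_value = left()
    right_value = right()
    return bool(left_value and right_value)


def optimized_and(left, right):
    # An optimizer that recognizes a constant false() operand may fold the
    # whole conjunction to False without evaluating the other operand.
    if left is false or right is false:
        return False
    return strict_and(left, right)


if __name__ == "__main__":
    try:
        strict_and(error, false)
    except QueryError as exc:
        print("strict evaluation raised:", exc)
    print("optimized evaluation returned:", optimized_and(error, false))
```

The sketch shows the observable consequence noted above: the unoptimized form fails at run time, while the optimized form returns a value, so the rewrite is legitimate only when the language semantics permit the error to go unreported.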
Another complication is that faulty handling of XML construction may have undesirable side-effects. XML construction normally implies copying its contents, and the resulting change in node identity should be preserved through rewrites and execution. Consider the XQuery:
    foo((<x><y/></x>)//y)
This query invokes a function foo() by passing it the result of the expression (<x><y/></x>)//y, which is just the <y/> element. An optimizer would like to eliminate the apparently unnecessary XML construction and navigation and pass only the <y/> element that is selected. However, suppose that foo() attempts to access the parent node, for example because it is defined as:
    foo($y) {$y/..}
If the optimizer has eliminated <x>, then this query would produce the wrong results. It appears desirable that constructed XML passed to external functions be preserved in its entirety to avoid construction problems. Similar problems occur for namespace declarations in scope, such as
    (<x:x xmlns:x="x"><y/></x:x>)//y
and for other meta-data instructions that may appear in a temporary XML expression. In languages like XSLT, construction side-effects may be of less concern because the language is not compositional, but such construction issues do significantly affect the optimization of languages like XQuery and of XML queries over constructed XML views.
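The parent-access problem can be made concrete with a short Python sketch. The `Node` class with an explicit `parent` link is a hypothetical model of a constructed XML tree, and `foo` mirrors the function body `$y/..` from the text; none of this is a real XML library API.

```python
class Node:
    """Hypothetical XML node that records its parent, so $y/.. can be modeled."""

    def __init__(self, name, children=()):
        self.name = name
        self.parent = None
        self.children = list(children)
        for child in self.children:
            child.parent = self


def foo(y):
    # foo($y) { $y/.. }  returns the parent of the node it receives.
    return y.parent


def unoptimized():
    # foo((<x><y/></x>)//y): construct <x>, navigate to <y/>, pass it on.
    x = Node("x", [Node("y")])
    y = x.children[0]
    return foo(y)


def over_optimized():
    # The optimizer dropped the <x> wrapper and passes a bare <y/>.
    return foo(Node("y"))


if __name__ == "__main__":
    parent = unoptimized()
    print(parent.name if parent is not None else None)
    print(over_optimized())
```

In this model the unoptimized query finds the `x` parent, whereas the over-optimized query finds no parent at all, which is the wrong result the text warns about.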
Thus, it would be advantageous for an XML optimizer to avoid incorrect substitutions, to avoid rewrites that alter the order of rewritten expressions when order is important in an optimized XML expression, to avoid side-effects from temporary expression elimination and multiple-language use, and to avoid construction problems. The invention addresses the aforementioned needs and solves them with various systems, methods, and techniques that also offer other advantages for optimizing queries.