Static typing is a feature that may be employed by a query processor during the compilation of a query. Some query languages, such as XQuery 1.0 and XPath 2.0, allow static typing to be performed. The World Wide Web Consortium (“W3C”) has provided formal semantics for these languages, which describe the static typing for XQuery 1.0 and XPath 2.0 expressions (see http://www.w3.org/TR/xquery-semantics). Static typing enables a number of inferences to be made based on both type schema metadata and the static semantics of the query itself. Put more simply, static typing may be used to infer an output expression type based on a set of known input expression types. For example, consider the input expression “$X+1,” which adds the integer “1” to the variable “X”. If it is known that the variable “X” is an integer, then it can be inferred that the type of the output expression is also an integer.
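The “$X+1” example above can be illustrated with a minimal sketch of compile-time type inference. The sketch is written in Python for illustration only. The class names, the `infer_type` function, and the string type labels are hypothetical, and the rule shown covers only integer addition rather than the full rules of the W3C Formal Semantics.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass(frozen=True)
class Var:
    """A variable reference such as $X (hypothetical AST node)."""
    name: str


@dataclass(frozen=True)
class IntLit:
    """An integer literal such as 1 (hypothetical AST node)."""
    value: int


@dataclass(frozen=True)
class Add:
    """An addition expression such as $X + 1 (hypothetical AST node)."""
    left: "Expr"
    right: "Expr"


Expr = Union[Var, IntLit, Add]


def infer_type(expr: Expr, env: Dict[str, str]) -> str:
    """Infer the static type of expr from the known types of its inputs.

    env maps variable names to their known static types. Nothing is
    evaluated: only types are inspected, as during query compilation.
    """
    if isinstance(expr, IntLit):
        return "xs:integer"
    if isinstance(expr, Var):
        return env[expr.name]
    if isinstance(expr, Add):
        left = infer_type(expr.left, env)
        right = infer_type(expr.right, env)
        # Simplified rule (an assumption): integer + integer yields integer.
        if left == "xs:integer" and right == "xs:integer":
            return "xs:integer"
        raise TypeError(f"cannot add {left} and {right}")
    raise TypeError(f"unsupported expression: {expr!r}")


# "$X + 1" with X known to be an integer.
result_type = infer_type(Add(Var("X"), IntLit(1)), {"X": "xs:integer"})
```

In this sketch, an operand of an unsupported type is rejected before any query is run, which corresponds to the early error detection that static typing makes possible.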
Static typing provides a number of advantages with respect to query execution. In particular, static typing enables early error detection and optimizations in query execution. For example, static typing during query compilation may enable type checks to be avoided at runtime, thereby making the execution process more efficient. While static typing is an optional feature for XQuery 1.0 and XPath 2.0, a static type inference can also be used in implementations that do not perform static typing. The static type inference can be used, for example, for the purpose of query optimization. In the XQuery 1.0 and XPath 2.0 Formal Semantics, the W3C describes a technique for performing the static type inference. The W3C technique involves separating the axis and node test stages of the inference and adding a simplification stage referred to as the “prime type and occurrence” simplification. The node test can in turn be either a node kind test or a name test.
While this W3C technique enables the correct static type to be inferred in many scenarios, its implementation also results in a number of drawbacks. One such drawback is that separating the axis and the node test stages of the inference increases the processing time required to perform the inference. This is because the separation of these stages requires a large quantity of schema information for an entire axis to be calculated during the axis stage and then subsequently filtered down to meet the node test criteria during the node test stage. For example, consider an “Employee” schema with a parent “Employee” node and child nodes “Name,” “Age,” “Sex,” “Eye Color,” “Hair Color,” and “Height.” Now suppose that a type inference is made for the expression “Employee/child::Age,” in which the axis is the child axis and the node test is “Age”. In this example, the W3C technique requires, during the axis stage, retrieving type information for every one of the six child nodes on the child axis listed above. Then, during the node test stage, the retrieved type information is filtered down to only the node that matches the node test (here, the “Age” node). The W3C technique also requires constructing a temporary type repository, which can be quite large and costly to build.
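The cost of the two separate stages can be seen in a small sketch of the “Employee” example. The sketch is written in Python for illustration only. The `SCHEMA` dictionary, the function names, and the child types assigned to each node are assumptions, and the code mirrors only the general shape of the two-stage approach described above rather than the W3C rules themselves.

```python
from typing import Dict, List, Tuple

# Hypothetical schema metadata: parent name -> list of (child name, child type).
# The child types are assumed for illustration.
SCHEMA: Dict[str, List[Tuple[str, str]]] = {
    "Employee": [
        ("Name", "xs:string"),
        ("Age", "xs:integer"),
        ("Sex", "xs:string"),
        ("Eye Color", "xs:string"),
        ("Hair Color", "xs:string"),
        ("Height", "xs:decimal"),
    ]
}


def axis_stage(schema: Dict[str, List[Tuple[str, str]]],
               parent: str) -> List[Tuple[str, str]]:
    """Stage 1: retrieve type information for every node on the child axis."""
    # The copy stands in for the temporary type repository.
    return list(schema[parent])


def node_test_stage(candidates: List[Tuple[str, str]],
                    name: str) -> List[Tuple[str, str]]:
    """Stage 2: keep only the candidates that satisfy the name test."""
    return [(child, ctype) for child, ctype in candidates if child == name]


def infer_two_stage(schema: Dict[str, List[Tuple[str, str]]],
                    parent: str,
                    name: str) -> Tuple[List[Tuple[str, str]], int]:
    """Run both stages; also report how many entries stage 1 retrieved."""
    candidates = axis_stage(schema, parent)
    matches = node_test_stage(candidates, name)
    return matches, len(candidates)


# Employee/child::Age
matches, retrieved = infer_two_stage(SCHEMA, "Employee", "Age")
```

In this sketch, six entries are retrieved and held in the temporary list even though only one of them survives the node test.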
Another drawback of the W3C technique is that the “prime type and occurrence” simplification may cause the static type inference to become less precise. This is because the simplification involves performing a prime factorization upon type information. The prime factorization, while simplifying type information, may also lose structural components of the information. In particular, prime factorization may cause information about the number of occurrences of nodes in a schema to be lost. For example, referring back to the “Employee” schema discussed above, it may be determined that the six child nodes in the “Employee” schema each occur exactly once, in an arbitrary order. After a prime factorization is performed, it will still be known that each of the six child nodes may be present in the schema, but it will no longer be known how many times each of them occurs. The information that each node occurs only once in the schema will be lost in the simplification. The loss of precision due to the prime type and occurrence simplification is damaging because, for example, it causes fewer expressions to be classified as type safe and it prevents potentially better optimizations from being applied. Accordingly, for these and other reasons, there is a need in the art for improved techniques for performing a type inference.
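The loss of occurrence information described above can be illustrated with a simplified sketch, written in Python for illustration only. The content model representation, the `simplify` function, and the `combine_sequence` rule are assumptions that follow the general idea of the simplification (a union of item types paired with a single occurrence indicator) and are not a reproduction of the W3C definitions.

```python
from typing import List, Tuple

# Occurrence indicators: "1" exactly one, "?" zero or one,
# "+" one or more, "*" zero or more.
ALLOWS_EMPTY = {"1": False, "+": False, "?": True, "*": True}


def combine_sequence(q1: str, q2: str) -> str:
    """Combine the occurrence indicators of two parts that appear together.

    Simplified rule (an assumption): the combination may always contain more
    than one item, and may be empty only if both parts may be empty.
    """
    if ALLOWS_EMPTY[q1] and ALLOWS_EMPTY[q2]:
        return "*"
    return "+"


def simplify(content: List[Tuple[str, str]]) -> Tuple[List[str], str]:
    """Reduce a content model to (choice of item types, one occurrence)."""
    if not content:
        raise ValueError("content model must not be empty")
    prime: List[str] = []
    for name, _ in content:
        if name not in prime:
            prime.append(name)
    quantifier = content[0][1]
    for _, occurrence in content[1:]:
        quantifier = combine_sequence(quantifier, occurrence)
    return prime, quantifier


# Before simplification: each of the six child nodes occurs exactly once.
EMPLOYEE = [(name, "1") for name in
            ("Name", "Age", "Sex", "Eye Color", "Hair Color", "Height")]

prime, quantifier = simplify(EMPLOYEE)
```

In this sketch the simplified result reads as “one or more of Name, Age, Sex, Eye Color, Hair Color, or Height.” Such a type also admits, for example, three “Age” nodes and no “Name” node, so the fact that each child occurs exactly once can no longer be recovered from it.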