The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.
Extensible Markup Language (XML) is a markup language that allows tagging of document elements and provides for the definition, transmission, validation, and interpretation of data between applications and between organizations. The XML specification was developed by the W3C consortium and is located on the Internet at “http://www.w3.org/xml”, the entire contents of which is hereby incorporated by reference for all purposes as if fully set forth herein.
The XML Query Language (XQuery) is a query language that is designed for querying a broad spectrum of XML information resources, such as, for example, XML-enabled databases and XML documents. A draft specification for XQuery is described in “XQuery 1.0: An XML Query Language”, W3C Candidate Recommendation 3 Nov. 2005, located at “http://www.w3.org/TR/xquery/”, the entire contents of which is hereby incorporated by reference for all purposes as if fully set forth herein. The XQuery specification provides for built-in functions and also allows users to declare functions of their own. As referred to herein, a function is a set of code which is executed as a block and to which flow of execution can be passed from a calling entity, such as, for example, an expression or a query. Typically, a user-defined XQuery function is declared in a function declaration that is included in an XQuery module. A function declaration comprises the name of the function, the names and datatypes of the function input parameters, the datatype of the result, or return parameter, that is returned by the function, and a function body. The function body includes one or more expressions that define how the result of the function is computed based on the input parameters.
FIG. 1A is a block diagram that illustrates a function declaration of an example user-defined XQuery function. Function declaration 100 comprises function name 102, function input parameters 104 with their datatypes, datatype 106 of the return parameter of the function, and function body 108. In FIG. 1A, function name 102 is the Qualified Name (QName) “local:add”. Input parameters 104 are parameter “$x” (of datatype “item( )?”) and parameter “$y” (also of datatype “item( )?”). Datatype 106 of the return parameter of function “local:add” is the “item( )?” datatype. Function body 108 includes an expression that sums the values of the two input parameters (“($x+$y)”), and an expression (“return ( . . . )”) that returns the sum of the two input parameters as the function result.
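By way of a non-limiting illustration, and because FIG. 1A is not reproduced in this text, the declaration it describes might be written in XQuery 1.0 syntax roughly as follows. The exact wording of the figure is an assumption. In particular, XQuery encloses a function body in braces, so the result is shown here simply as the value of the body expression.

```xquery
(: Sketch of the declaration described for FIG. 1A; the figure's exact text is assumed. :)
declare function local:add($x as item()?, $y as item()?) as item()?
{
  (: the meaning of "+" depends on the run-time datatypes of $x and $y :)
  ($x + $y)
};

(: A main module also needs a query body; this call is for illustration only. :)
local:add(1, 2)
```

With two integer arguments, as in the illustrative call above, the body performs ordinary numeric addition and the query evaluates to 3.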
In general, a calling entity executes a function through a function invocation (also referred to as a function call). A function invocation generally includes a function name and a list of zero or more arguments, where the zero or more arguments correspond to input parameters specified in the function declaration. A function argument in an invocation is a value, or one or more expressions that evaluate to a value, and is typically associated with a datatype. When the function invocation is executed, the function is evaluated based on the arguments and, if no run-time errors are encountered, a value having the datatype of the return parameter of the function is returned to the calling entity. For example, a user-defined XQuery function may be executed from a query that includes an invocation, or call, to the function. Typically, the query includes the QName of the function followed by a parenthesized list of arguments. Each argument in the function invocation is bound to an input parameter declared in the function declaration of the XQuery function. If an invocation argument is based on one or more expressions, the one or more expressions are evaluated before control is passed to the function. The body of the XQuery function is then evaluated and a result value is returned to the query. The result value is either an instance of the datatype of the XQuery function's return parameter or an error.
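The binding of arguments to parameters can be sketched as follows. This is only an illustrative query built on the “local:add” declaration described for FIG. 1A, and the argument expressions are hypothetical.

```xquery
declare function local:add($x as item()?, $y as item()?) as item()?
{
  ($x + $y)
};

(: Each argument expression is evaluated first: (2 * 3) yields 6, (10 - 4) yields 6. :)
(: The resulting values are then bound to $x and $y, and the body is evaluated.     :)
local:add(2 * 3, 10 - 4)
```

Here the value returned to the query is 12, an “xs:integer”, which is a valid instance of the declared return datatype “item( )?”.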
The XQuery specification allows datatype overloading for XQuery functions, that is, XQuery functions may be declared with input and return parameters associated with generic or specific datatypes. As referred to herein, a generic datatype is a datatype that includes as a subtype at least one specific datatype which is derived directly or indirectly from the generic datatype. If a function input parameter is associated with a generic datatype in a function declaration of an XQuery function, then an argument of any specific datatype that is a subtype of the generic datatype can be properly passed in an invocation of the function. Similarly, if a function return parameter is associated with a generic datatype in the function declaration, then a return value of any specific datatype that is a subtype of the generic datatype can be properly returned from the function. In this way, the XQuery specification allows users to declare polymorphic functions, which are generally simpler and more convenient to write.
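As a minimal sketch of this behavior, the following query passes arguments of several specific datatypes to a single parameter declared with the generic datatype “item( )?”. The function name “local:describe” and the sample values are hypothetical and serve only to illustrate the point.

```xquery
(: Hypothetical function: one generic parameter accepts many specific datatypes. :)
declare function local:describe($v as item()?) as xs:string
{
  typeswitch ($v)
    case xs:date    return "date"
    case xs:decimal return "decimal"   (: also matches xs:integer, a subtype of xs:decimal :)
    case node()     return "node"
    default         return "other"
};

(
  local:describe(xs:date("2005-11-03")),
  local:describe(34),
  local:describe("text")
)
```

Each of the three calls is accepted, because “xs:date”, “xs:integer”, and “xs:string” are all subtypes of “item( )?”. The query yields the sequence ("date", "decimal", "other").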
Because of the convenience of function polymorphism, users tend to declare user-defined XQuery functions with generic datatypes (such as, for example, “item( )?”, “item( )*”, and “node( )+”) for the input and return function parameters. The users then rely on the XQuery processor, which processes expressions or queries that call XQuery functions, to automatically determine the precise and more specific datatypes that are associated with function invocation arguments and function return values. However, evaluating XQuery functions that are declared with generic parameter datatypes is challenging and the techniques for processing invocations of such functions have several disadvantages.
One technique for processing invocations of XQuery functions declared with generic parameter datatypes is to resolve all datatypes at run-time. Since XQuery allows for dynamic (or run-time) type-checking, this technique defers the type-checking of the datatypes of the arguments passed in a function invocation to run-time, for example, to the time when the expression or query that calls the function is executed. (Type-checking generally refers to the process of resolving the datatypes of arguments or parameters in a given query or expression.) The disadvantage of this technique, however, is that it wastes computing resources. For example, a computer system evaluating a query that includes an XQuery function declared with generic parameter datatypes may expend a great deal of computing resources to materialize some or all of the XML data that is to be returned, only to discard all of this data when a run-time error is raised because an argument in a function invocation has the wrong datatype.
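The following sketch illustrates such a late failure, assuming a processor that defers all type-checking to run-time. The declaration follows the description of FIG. 1A, and the arguments are hypothetical.

```xquery
declare function local:add($x as item()?, $y as item()?) as item()?
{
  ($x + $y)
};

(
  (: First call: succeeds and produces a result. :)
  local:add(1, 2),

  (: Second call: "abc" is an item(), so it satisfies the declared parameter   :)
  (: datatype. The mismatch surfaces only when "+" is evaluated, because no    :)
  (: addition is defined for xs:string and xs:integer (type error XPTY0004).   :)
  local:add("abc", 34)
)
```

Under that assumption, the result of the first call is computed and then discarded, because the error raised by the second call causes the whole query to fail.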
Another technique for processing invocations of XQuery functions declared with generic parameter datatypes is to resolve at least some datatypes at compile-time by performing static type-checking. (Static type-checking generally refers to type-checking that is performed at compile-time based on declared or specified datatypes.) However, because of the polymorphic nature of a user-defined XQuery function declared with generic parameter datatypes, it is not possible to know from the function declaration alone the exact datatypes of the arguments that will be passed in the invocations of the function. Thus, one disadvantage of the technique using static type-checking is that it does not provide for a good compile-time analysis of the underlying query or expression invoking the XQuery function, since the static type-checking will raise many datatype errors and warnings. Another disadvantage of this technique is that the generic datatype information indicated in the function declaration is not specific enough to provide for a good compile-time optimization of the query that includes the function invocation.
For example, consider the following example query, which invokes the “local:add” function depicted in FIG. 1A:
for $i in fn:collection('/public/pofolder/')
return (
  local:add(xs:date($i/po/@podate), xdt:dayTimeDuration('P3D')),
  local:add(xs:decimal($i/po/@ponum), 34)
)

In the above example query, the “local:add” function is invoked in a “for” loop. In each iteration of the loop, the “local:add” function is invoked twice: once with arguments having the “xs:date” and “xdt:dayTimeDuration” datatypes, and once with arguments having “xs:decimal” datatypes. Thus, in each iteration of the loop, the “local:add” function is invoked once to add three days to a date stored in the “podate” attribute of a “po” element in an XML document of the “/public/pofolder/” collection, and once to add the number “34” to a number stored in the “ponum” attribute of the same element.
As depicted in FIG. 1A, the “local:add” function declares the input and return parameters as having the generic datatype “item( )?”. Thus, the “+” operator in the body of the function is polymorphic, since it can be used for adding numeric arguments as well as date and duration arguments when the function is invoked, as shown in the above example query. The binding of the “+” operator to its arguments is not known until the actual datatypes of the arguments in an invocation of the function are determined. Therefore, in order to evaluate an invocation of the “local:add” function, a query compiler has to compile large datatype switch clauses into the execution tree of the query. The switch clauses must account for every possible combination of all specific datatypes that are subtypes of the declared generic datatypes, and serve to dispatch a particular “+” operation in a particular function invocation based on the specific argument datatypes in that invocation. Compiling large datatype switch clauses into the execution tree of the query, however, impedes execution performance, especially if the function is invoked in a loop as shown in the above example query. Even if static type-checking is performed on the function during compile-time, the exact nature of the “+” operator still cannot be determined, because the function parameters and return values are declared in the function declaration with generic datatypes.
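The shape of such switch clauses can be sketched at the source level with nested “typeswitch” expressions. This is only a hypothetical illustration and not an actual compiler output. The function name “local:add-expanded” is an assumption, and only two of the many possible datatype combinations are shown. The “xdt:” prefix follows the Candidate Recommendation cited above; a processor conforming to the final XQuery 1.0 Recommendation would use “xs:dayTimeDuration” instead.

```xquery
(: Hypothetical expansion of local:add; all remaining datatype combinations are omitted. :)
declare function local:add-expanded($x as item()?, $y as item()?) as item()?
{
  typeswitch ($x)
    case $d as xs:date return
      (typeswitch ($y)
         case $dur as xdt:dayTimeDuration return $d + $dur   (: date plus duration :)
         default return fn:error())
    case $n as xs:decimal return
      (typeswitch ($y)
         case $m as xs:decimal return $n + $m                (: numeric addition :)
         default return fn:error())
    default return fn:error()
};

local:add-expanded(xs:decimal("1.5"), 34)
```

Every invocation walks this decision tree before any addition is performed, and the tree grows with each additional pair of datatypes for which “+” is defined. This is why the cost is most noticeable when the function is called repeatedly inside a loop.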
Although the disadvantages of the known techniques for processing polymorphic functions are presented above with respect to XQuery, it is noted that these disadvantages are not unique to the XQuery language. Rather, these disadvantages are common to any declarative computer language that allows for dynamic type-checking.
Based on the foregoing, there is a clear need for techniques for effective static type-checking and optimization of functions that are declared in a computer language that allows dynamic type-checking.