1. Field of the Invention
The present invention is related to optimizing aggregate processing.
2. Description of the Related Art
Relational DataBase Management System (RDBMS) software using a Structured Query Language (SQL) interface is well known in the art. The SQL interface has evolved into a standard language for RDBMS software and has been adopted as such by both the American National Standards Institute (ANSI) and the International Organization for Standardization (ISO).
The SQL standard introduced a set of new Extensible Markup Language (XML) publishing functions, including scalar functions XMLELEMENT, XMLFOREST, and XMLCONCAT, and an aggregate function, XMLAGG. These functions take SQL data as input and generate XML data as output.
An XMLELEMENT function creates an XML element. In particular, the XMLELEMENT function receives an identifier for use in naming the created XML element, an optional set of attribute name/value items, and an optional list of values for the content of this element. An XMLELEMENT function returns an instance of type XMLType.
An XMLFOREST function creates a forest of XML elements, which contains an element for each of the XMLFOREST arguments. The XMLFOREST function converts each of its argument parameters to XML, and then returns an XML fragment that is the concatenation of these converted arguments.
An XMLCONCAT function creates a forest of XML elements. The XMLCONCAT function takes as input a series of XML values, concatenates the series of values, and returns the concatenated series.
An XMLAGG function is an aggregate function that produces a forest of XML elements from a collection of XML elements. In particular, the XMLAGG function concatenates XML values from each row in a group into a single XML value. An optional ORDER BY clause may be specified within the XMLAGG function to request a particular order of the concatenation. An optional GROUP BY clause may be used in the SELECT statement to specify how to group rows.
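By way of illustration only, the grouping and ordering behavior described for the XMLAGG function may be sketched in Python. The function name xml_agg, the row layout, and the sample rows are hypothetical and do not correspond to any RDBMS interface; the sketch merely mimics the described semantics on XML values that are already serialized as text.

```python
from collections import defaultdict

def xml_agg(rows, group_key, order_key, xml_value):
    """Concatenate one XML value per row into a single XML value per group.

    rows      -- iterable of dicts (hypothetical row layout)
    group_key -- function giving the GROUP BY value of a row
    order_key -- function giving the ORDER BY value used inside XMLAGG
    xml_value -- function giving the XML text contributed by a row
    """
    groups = defaultdict(list)
    for row in rows:
        groups[group_key(row)].append(row)
    # Within each group, order the rows and concatenate their XML values.
    return {
        key: "".join(xml_value(r) for r in sorted(members, key=order_key))
        for key, members in groups.items()
    }

# Hypothetical sample data, not taken from the source.
rows = [
    {"lname": "Lee", "dept": "shipping"},
    {"lname": "Adams", "dept": "shipping"},
    {"lname": "Young", "dept": "sales"},
]
result = xml_agg(
    rows,
    group_key=lambda r: r["dept"],
    order_key=lambda r: r["lname"],
    xml_value=lambda r: "<Emp>" + r["lname"] + "</Emp>",
)
```

In this sketch, each group yields one concatenated XML value, and the order of concatenation within a group follows the ORDER BY key.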
Additionally, an XMLATTRIBUTES function defines one or more XML attributes for the XML element created by the XMLELEMENT function. Syntactically, XMLELEMENT and XMLATTRIBUTES are also referred to as “specifications.”
Due to the XML feature of element nesting for parent-child relationships and sequence concatenation, the XMLELEMENT, XMLFOREST, XMLCONCAT, and XMLAGG functions are commonly used in nesting and concatenation. Nested functions are ones in which one or more functions are included within another function. For example, SELECT statement (1) includes a set of nested functions, with the XMLATTRIBUTES function and the XMLFOREST functions nested in the XMLELEMENT function:
SELECT XMLAGG (                                                  (1)
         XMLELEMENT (NAME "Emp",
           XMLATTRIBUTES (e.fname || ' ' || e.lname AS "name"),
           XMLFOREST (e.birthday, e.dept AS "department"))
         ORDER BY e.lname)
FROM EMPLOYEE e
GROUP BY e.dept;
A traditional function evaluation technique for nested functions is to evaluate the functions inside-out. That is, the innermost functions are evaluated first, and their results are used as input to the next outer level function, whose outputs are in turn used as input to the next outer level function, and so on.
The SELECT statement (1) has the following arguments: fname, lname, birthday, and dept. The XMLATTRIBUTES function has an argument, which is the concatenation of fname and lname. The XMLFOREST function generates a forest of two elements, one for each argument birthday and dept. The XMLAGG function aggregates rows, which are ordered by last name (e.lname) within groups, with each group corresponding to a department (e.dept).
Assuming that the following are input values for the arguments of SELECT statement (1): fname='Jack', lname='Lee', birthday='10-28-1960', and dept='shipping', the evaluation of SELECT statement (1) proceeds as follows. First, the XMLATTRIBUTES function is evaluated, and XMLATTRIBUTES (e.fname || ' ' || e.lname AS "name") evaluates to name="Jack Lee". Second, the XMLFOREST function is evaluated, and XMLFOREST (e.birthday, e.dept AS "department") evaluates to two elements: <birthday>1960-10-28</birthday><department>shipping</department>. The bracketed text (e.g., <birthday>) is a start tag of an element in XML, and the bracketed text with a slash (e.g., </birthday>) is an end tag of the element. Third, the XMLELEMENT function is evaluated, and XMLELEMENT (NAME "Emp", XMLATTRIBUTES (e.fname || ' ' || e.lname AS "name"), XMLFOREST (e.birthday, e.dept AS "department")) evaluates to: <Emp name="Jack Lee"><birthday>1960-10-28</birthday><department>shipping</department></Emp>
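The three evaluation steps above may be reproduced, for illustration only, with a small Python sketch. The helper names (xml_attributes, xml_forest, xml_element, to_iso_date) are hypothetical stand-ins for the SQL/XML functions, and the date conversion is an assumption inferred from the example values; none of this is presented as how an RDBMS implements these functions.

```python
from datetime import datetime
from xml.sax.saxutils import escape, quoteattr

def to_iso_date(text):
    # Assumption: the input is month-day-year, as in the example value.
    return datetime.strptime(text, "%m-%d-%Y").date().isoformat()

def xml_attributes(**attrs):
    # Returns attribute text such as ' name="Jack Lee"'.
    return "".join(" " + n + "=" + quoteattr(v) for n, v in attrs.items())

def xml_forest(**items):
    # One element per argument, concatenated in argument order.
    return "".join("<%s>%s</%s>" % (n, escape(v), n) for n, v in items.items())

def xml_element(name, attributes="", *content):
    # The inner results are copied into the newly built outer string.
    return "<" + name + attributes + ">" + "".join(content) + "</" + name + ">"

row = {"fname": "Jack", "lname": "Lee",
       "birthday": "10-28-1960", "dept": "shipping"}

# Step 1: innermost XMLATTRIBUTES.
attrs = xml_attributes(name=row["fname"] + " " + row["lname"])
# Step 2: innermost XMLFOREST.
forest = xml_forest(birthday=to_iso_date(row["birthday"]),
                    department=row["dept"])
# Step 3: the enclosing XMLELEMENT consumes both inner results.
emp = xml_element("Emp", attrs, forest)
```

Under these assumptions, emp holds the same text as the final result shown for the XMLELEMENT function.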
In this process, the result of each function is usually copied to generate the next level result. For example, the results of the XMLATTRIBUTES function and the XMLFOREST function are copied to generate the results of the XMLELEMENT function. The number of times data is copied is proportional to the number of levels of nesting. For example, since there are two levels of nesting in SELECT statement (1), data is copied twice. Even with the simple example illustrated in SELECT statement (1), copying data at each level of nesting makes the function evaluation inefficient due to data movement.
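The relationship between nesting depth and copying may be made concrete with the following Python sketch, offered as an illustration only. The names wrap and nest, the stats dictionary, and the element names level1, level2, and so on are hypothetical; the sketch simply counts how often an inner result is copied into a newly built outer string during inside-out evaluation.

```python
def wrap(name, inner, stats):
    # Building the outer string copies every character of the inner result.
    stats["copies"] += 1
    stats["copied_chars"] += len(inner)
    return "<" + name + ">" + inner + "</" + name + ">"

def nest(payload, depth, stats):
    # Inside-out evaluation: the innermost element is built first.
    result = payload
    for level in range(depth, 0, -1):
        result = wrap("level%d" % level, result, stats)
    return result

stats = {"copies": 0, "copied_chars": 0}
doc = nest("shipping", 2, stats)

deep_stats = {"copies": 0, "copied_chars": 0}
deep_doc = nest("shipping", 14, deep_stats)
```

In this sketch the payload is copied once per nesting level, so the count in stats is 2 for two levels and the count in deep_stats is 14 for fourteen levels, while the number of characters moved grows faster because each level also copies the tags added by the levels inside it.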
Moreover, since XML does not limit the number of levels of nesting, the number of levels of nesting for XML may be very large. Nesting levels of 7–14 are commonly seen. The large number of levels of nesting would require a great deal of copying of data, which is very inefficient when evaluating a function. In addition, if character large objects (CLOBs) are involved, the size of copied data is even larger.
Once the XMLELEMENT function is evaluated for each one of a set of rows, the XMLAGG function processes the results. The evaluation of the GROUP BY clause uses a SORT operation to sort data by one or more grouping columns, which in this case is the employee department (e.dept). Additionally, rows in each group are sorted for the ORDER BY clause within the XMLAGG function.
There are two traditional alternatives for processing GROUP BY and ORDER BY clauses. One alternative is to append the ORDER BY key to the GROUP BY columns for the sort. One sort can then achieve both grouping and ordering. However, if there are two or more XMLAGG functions with ORDER BY clauses, multiple sorts are needed, and the intermediate results from sorting and grouping are merged.
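The first alternative, in which the ORDER BY key is appended to the grouping columns, may be sketched in Python as follows. This is an illustration under stated assumptions only: the function name group_and_order_with_one_sort, the dictionary row layout, and the sample rows are hypothetical, and Python's built-in sort stands in for the SORT operation.

```python
from itertools import groupby

def group_and_order_with_one_sort(rows):
    # One sort on the composite key (GROUP BY column, ORDER BY key).
    ordered = sorted(rows, key=lambda r: (r["dept"], r["lname"]))
    # Rows of the same group are now adjacent and already in ORDER BY order.
    return {
        dept: [r["lname"] for r in members]
        for dept, members in groupby(ordered, key=lambda r: r["dept"])
    }

# Hypothetical sample data, not taken from the source.
rows = [
    {"lname": "Lee", "dept": "shipping"},
    {"lname": "Adams", "dept": "shipping"},
    {"lname": "Young", "dept": "sales"},
]
result = group_and_order_with_one_sort(rows)
```

A single composite key can satisfy only one ordering within each group, which is consistent with the observation that two or more XMLAGG functions with different ORDER BY clauses require additional sorts.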
The other alternative is to sort rows in each group separately for an ORDER BY clause within an XMLAGG function. Each SORT operation uses a workfile to store data for the sort process. Thus, the traditional approaches, which involve multiple SORT operations, also involve multiple workfiles for the multiple sorts. These workfiles use resources (e.g., memory) that are very expensive.
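The second alternative may likewise be sketched in Python, for illustration only. The helper sort_with_workfile is a hypothetical stand-in for a SORT operation, with a copied list playing the role of a workfile; the function names, row layout, and sample rows are assumptions and do not describe any particular RDBMS.

```python
from itertools import groupby

def sort_with_workfile(rows, key, workfiles):
    # Stand-in for a SORT operation: the copied list acts as its workfile.
    workfile = list(rows)
    workfiles.append(len(workfile))  # record the size of this workfile
    workfile.sort(key=key)
    return workfile

def group_then_sort_each_group(rows):
    workfiles = []
    # First sort: bring rows of the same group together (GROUP BY).
    grouped = sort_with_workfile(rows, lambda r: r["dept"], workfiles)
    result = {}
    for dept, members in groupby(grouped, key=lambda r: r["dept"]):
        # One further sort per group for the ORDER BY inside XMLAGG.
        ordered = sort_with_workfile(members, lambda r: r["lname"], workfiles)
        result[dept] = [r["lname"] for r in ordered]
    return result, workfiles

# Hypothetical sample data, not taken from the source.
rows = [
    {"lname": "Lee", "dept": "shipping"},
    {"lname": "Adams", "dept": "shipping"},
    {"lname": "Young", "dept": "sales"},
]
result, workfiles = group_then_sort_each_group(rows)
```

In this sketch the number of workfiles is one for the grouping sort plus one per group, so it grows with the number of groups.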
Thus, there is a need in the art for improved aggregate processing.