Computer databases systems commonly are used for the storage of information or data. Referring to FIGS. 1-3, which illustrate an example computer database system 10, the information typically is stored in a database storage device 12, for example, one or more storage devices that are included in, or coupled to, a database server 14. The database storage device can be, for example, a computer-readable medium, e.g., a RAM, ROM, EEPROM, flash memory, CDROM, DVD, optical disk, magnetic cassette, magnetic tape, magnetic disk drive, or any other medium that can be used to store information and which can be accessed by a computer.
Users 16 interface with the computer database system via client computers 18, e.g., individual workstations, dumb terminals, personal computers (including hand-held or laptop devices), and the like. In most cases, the client computers operate under a client operating system, for example, the WINDOWS operating system from Microsoft Corporation of Redmond, Wash. The database server generally operates independently of the client computers, and runs under a server operating system. In other example computer systems, instead of there being a client computer and a separate database server, there is only one computer system that includes the function of both the client computer and the database server.
The client computers are coupled to the database server via a network 20. The network 20 can be, for example, a local area network (“LAN”), a wide area network (“WAN”), an intranet, the Internet, or some other network medium that facilitates the transmission of information between the client computers 18 and the database server 14. Any number of protocols, e.g., hypertext transfer protocol (“HTTP”), can be used to communicate the information between the client computers and the database server.
Most computer database systems 10 include a database management system (“DBMS”) 22, which includes a set of programs that are used to control the interface between a client computer 18 and the database storage device 12. More specifically, the DBMS is configured to help the user 16 perform common tasks, for example, the creation of the database, the retrieval of information from the database, the adding of information to the database, and the replication of information in the database.
DBMSs 22 are designed for compatibility with relational databases 24, in which, database information is organized in a set of tables of data that are interrelated to one another. Each table includes a plurality of data records or tuples (rows), with each record including multiple data attributes (columns). A field is the intersection of a single column with a single row, and each field typically contains a scalar value. Also, a DMBS can be configured to be compatible with an object database, i.e., a database where the information is in the form of objects.
The DBMS 22 handles data requests from the users 16, and formats and sends the results, i.e., subsets of the information included in the database 24, back to the users' client computers 18. User requests can include database queries (“queries”), which are logical expressions that are input to the client computers by the users. Because the data in a relational database is distributed across multiple tables, a query usually draws data from more than one table. A user formulates a query using a query language, such as SQL (SQL stands for Structured Query Language). The user inputs the query using an application program 26, which is usually resident in the client computer. Examples of database systems, which include a query language, include SQL SERVER from Microsoft Corporation (SQL stands for Structured Query Language); DB2 from IBM of Armonk, N.Y.; SYBASE IQ from Sybase of Dublin, Calif.; and ORACLE 91R2 from Oracle of Redwood Shores, Calif.
Each query specifies the information to be retrieved from the database 24 and the manner in which the information is to be manipulated or evaluated to provide a useful result. If any of the information in the database satisfies the conditions of the user's query, that information will be retrieved from the database, and returned to the client computer 18.
Before a query can be executed by the DBMS 22, the query must be processed and translated into a format that is appropriate for execution by the DBMS. Initially, the query is received by the DBMS and validated, e.g., the DBMS validates that an authorized user 16 has submitted the query, and that the tables referenced in the query are valid tables that are included in the database 24. After the query validation step, the DBMS forwards the query to an engine 28 that is included in the database server 14 and conceptually included in the DBMS. The engine is the portion of the DBMS that directly interacts with the database.
The engine 28 includes a parser 30, which parses the query into a query tree. The query tree is a tree structure that represents the components of the query, e.g., the terms and operators of the query. After the parsing step, the query tree is forwarded to a compiler 32, which is also included in the engine and includes an optimizer 34 and a code generator 36. The optimizer determines an efficient method (“query execution plan”) for implementing the query tree and accessing the information from the database 24 in the allotted time, i.e., the optimizer performs query optimization.
The query execution plan is passed from the optimizer 34 to the code generator 36, which converts the query execution plan into a set of instructions suitable for satisfying the query. These instructions are passed to an execution unit 38, which also is included as part of the engine 28 in the database server 14. Operating under the control of these instructions, the execution unit generates calls to lower-level routines for retrieving relevant information from the database tables. After the instructions have been executed by the execution unit, the database server returns the results of the query back to the client computer 18, via the network 20. Or, in the case where there is only one computer, i.e., the client computer and the database server are one device, the results need not be transferred via the network.
In addition to supporting traditional types of information, many relational databases 24 also support extensible markup language (“XML”) data. XML is a widely accepted markup language used to exchange data between dissimilar systems. The popularity of XML is due in large part to its flexibility for representing many kinds of information. SQL SERVER allows database information to be defined using an XML data type. This XML data can be searched, retrieved, and updated using SQL SERVER, as is done with traditional types of relational database information. The XML data included in the relational database can be large, can take a large amount of time to retrieve, and can require large amounts of storage space.
SQL SERVER provides functions that can be performed on the database tables. One type of function provided by SQL SERVER is the aggregate function. An aggregate function, or aggregator, is a function that applies to sets of rows (a “row set”), in a table rather than to a single row. More specifically, an aggregate function summarizes some characteristic of a row set. Thus, an aggregate function takes information from multiple rows of a database table, processes that information, and delivers an answer. The resulting answer is an aggregation of the information from the row set in a database table. Example aggregate functions provided by SQL SERVER include the COUNT, SUM, MAX, MIN, and AVG functions, which operate on a set of input scalar values and generate an aggregate value.
There are many scenarios in database system processing when there is a need to aggregate data from a row set into a complex scalar, referred to as “complex aggregation.” Complex aggregation, performed as a function of a complex aggregator, assembles all pieces from its input into one output unit. The output unit has the characteristic that the individual pieces are not lost, i.e., the pieces are still conceivable in the output unit. The following is a list of several examples where there is a need to aggregate data from a row set into a complex scalar in SQL SERVER: 1) when formatting relational data as XML data (e.g., SELECT Contacts.* FROM Contacts FOR XML RAW, TYPE); 2) when serializing XQuery results (XQuery stands for XML query language and is a standardized way of searching through XML data) into an XML scalar by aggregating a row set representing XQuery sequence elements (e.g., SELECT xml_col.query (‘for $a in /a return <NewA>{$a}</NewA>’) FROM t); and 3) when constructing an array/collection/multi-set of complex-type data using a constructing aggregate function (e.g., SELECT construct_phone_number_list(construct_phone_number(t.number,t.type)) FROM t). The first two of the above example statements result in a row containing a scalar XML value, i.e., a single, unnamed column and a single row. The last example statement results in complex-type data containing a sequence of complex scalars of type phone_number and containing the number and the phone type.
More than one aggregator, including, for example, XML aggregators (i.e., aggregate functions that involve XML data), can be nested within a single query, thus, resulting in more than one aggregator being distributed throughout the query tree. Having aggregators, for example, XML aggregators, distributed throughout the query tree can result in degradation in performance and scalability of the query compared to cases where aggregation can be done in a streamable manner without nesting aggregators. This is because the execution of each XML aggregator can result in the materialization of potentially large intermediate results in memory or on disk. Also, because the XML aggregators are distributed throughout the query tree, the query tree is less efficient since the same XML data will be copied each time an aggregator is processed. Accordingly, there are many inefficiencies associated with query trees that include more than one aggregator.