Structured Query Language (SQL) is a popular computer language used to create, modify, retrieve, and manipulate data in relational database management systems. SQL has also evolved beyond its original scope to support object-relational database management systems. Another type of query language is Language Integrated Query (LINQ), which refers to a set of operating system framework extensions that encompass language-integrated query, set, and transform operations. For example, these framework extensions can extend C# and Visual Basic with native language syntax for queries and provide class libraries to take advantage of these capabilities. As can be appreciated, LINQ functionality can be employed to extend other languages in addition to C# and Visual Basic.
One feature of query languages relates to the concepts of aggregation and grouping, which are often used together. For example, a query may group products by category and then compute the most expensive product for each group. One example application of aggregation is traditional numerical aggregation. This form of aggregation produces a relatively simple result from the many possible inputs to each group. Some complexity arises from the need for aggregate functions to compute the result. Example aggregate functions include numeric functions such as min, max, count, average, sum, histogram, and so forth.
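The products-by-category example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample data, the field layout (name, category, price), and the function names `group_by_category` and `aggregate` are hypothetical and do not come from the text above.

```python
from collections import defaultdict

# Hypothetical sample data: (name, category, price).
PRODUCTS = [
    ("laptop", "electronics", 1200.0),
    ("phone", "electronics", 800.0),
    ("desk", "furniture", 350.0),
    ("chair", "furniture", 150.0),
    ("lamp", "furniture", 40.0),
]


def group_by_category(products):
    """Partition products into explicit groups keyed by category."""
    groups = defaultdict(list)
    for name, category, price in products:
        groups[category].append((name, price))
    return dict(groups)


def aggregate(groups):
    """Reduce each group to a few numeric aggregates plus its priciest item."""
    summary = {}
    for category, items in groups.items():
        prices = [price for _, price in items]
        summary[category] = {
            "count": len(prices),
            "min": min(prices),
            "max": max(prices),
            "sum": sum(prices),
            "average": sum(prices) / len(prices),
            # The most expensive product is the item with the largest price.
            "most_expensive": max(items, key=lambda item: item[1])[0],
        }
    return summary


summary = aggregate(group_by_category(PRODUCTS))
```

Each group, however many inputs it has, collapses to a small fixed set of values, which is the "relatively simple result" that numerical aggregation produces.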
Another type of aggregate function relates to ranking, such as a range function. Still other types of functions may be considered structural, such as construct document fragment functions, save tuple functions, and tuple stream functions. A tuple is a data object that holds several objects and is similar to a mathematical tuple. For instance, a tuple is similar to a list whose values cannot be modified, that is, a list that is immutable. Tuples are normally written as a sequence of items contained in matching parentheses. Items in a tuple are accessed using a numeric index. Tuples can be nested and can contain other compound objects, including lists, dictionaries, and other tuples.
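The tuple properties listed above match, for instance, Python's built-in `tuple` type, although the text does not name a language. The sketch below assumes Python and uses a hypothetical `record` value purely for illustration.

```python
# A tuple written in matching parentheses, holding a string, a float,
# a list, a dictionary, and a nested tuple (all values are made up).
record = ("widget", 9.99, ["red", "blue"], {"stock": 12}, (1, 2))

# Items are accessed with a numeric index, including inside nested tuples.
name = record[0]
inner = record[4][1]

# The tuple itself is immutable: assigning to a slot raises TypeError.
try:
    record[0] = "gadget"
    mutated = True
except TypeError:
    mutated = False

# Immutability is shallow: a list stored inside the tuple can still change.
record[2].append("green")
```

Note that immutability applies to the tuple's own slots; compound objects stored in it, such as the list here, keep their own mutability.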
Another example type of aggregation is referred to as structural aggregation to a single result document. This form of aggregation creates a document which represents content of a group. The structural form is useful if no more query processing will be performed upon the aggregated data, or if the data must be treated as a whole. Further query processing on this data requires use of an unnest operation or function. Another form of structural aggregation includes processing data into tuple streams. In this form of structural aggregation, the tuples being grouped remain as individual tuples. This facilitates further query processing on grouped data. For example, techniques of document_ID order processing can be used to process the related groups in a group_id order.
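The contrast between the two structural forms can be sketched as follows. This is a minimal illustration, not the implementation the text has in mind: the JSON document layout, the `(group_id, payload)` row shape, and the function names `aggregate_to_document`, `unnest`, and `aggregate_to_tuple_stream` are all assumptions made for the example.

```python
import json
from operator import itemgetter

# Hypothetical input rows: (group_id, payload).
ROWS = [(2, "c"), (1, "a"), (2, "d"), (1, "b")]


def aggregate_to_document(rows):
    """Structural aggregation into one result document (a JSON string here)."""
    groups = {}
    for group_id, payload in rows:
        groups.setdefault(group_id, []).append(payload)
    return json.dumps(
        {"groups": [{"group_id": g, "items": items}
                    for g, items in sorted(groups.items())]}
    )


def unnest(document):
    """Recover individual tuples from the document for further processing."""
    for group in json.loads(document)["groups"]:
        for payload in group["items"]:
            yield (group["group_id"], payload)


def aggregate_to_tuple_stream(rows):
    """Keep tuples individual, but emit them clustered in group_id order."""
    # sorted() is stable, so tuples keep their input order within a group.
    yield from sorted(rows, key=itemgetter(0))


document = aggregate_to_document(ROWS)
from_document = list(unnest(document))
from_stream = list(aggregate_to_tuple_stream(ROWS))
```

The document form hands back one opaque value that must be unnested before it can be queried again, while the tuple stream can be consumed directly by a downstream operator that relies on group_id order.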
With the addition of grouping, the problem of how to extract the resulting aggregates, numeric or structural, from each group must be considered. This area is a prime target for careful implementation, as ordering by group is relatively straightforward at this stage and inexpensive compared to performing the same ordering at a later processing stage. Another consideration is that structural aggregation can produce a lot of structure that can be pruned through further query processing, but on that group. To reduce the overhead of structural aggregation, it is important to be able to filter the tuples that are placed in a group. This can also reduce the cost of a structural grouping operator considerably, as it no longer has to store a full group, typically only the entries that will be used later. This can be thought of as a simple push of a predicate through the output of a grouping operator into its input.
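The predicate push described above can be illustrated with a small sketch. The row shape, the sample predicate, and the function names `group_then_filter` and `filter_then_group` are hypothetical; the `stored` counter is only a rough stand-in for the number of entries a grouping operator would have to hold.

```python
from collections import defaultdict

# Hypothetical input rows: (group key, value).
ROWS = [("a", 5), ("a", 50), ("b", 7), ("b", 70), ("b", 700)]


def group(rows):
    """Structural grouping: collect every value under its key."""
    groups = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)
    return dict(groups)


def group_then_filter(rows, predicate):
    """Predicate applied to the grouping operator's output."""
    full = group(rows)
    stored = sum(len(values) for values in full.values())
    result = {key: [v for v in values if predicate(v)]
              for key, values in full.items()}
    return result, stored


def filter_then_group(rows, predicate):
    """Predicate pushed into the grouping operator's input."""
    kept = group((key, value) for key, value in rows if predicate(value))
    stored = sum(len(values) for values in kept.values())
    return kept, stored


def at_least_50(value):
    return value >= 50


late, late_stored = group_then_filter(ROWS, at_least_50)
early, early_stored = filter_then_group(ROWS, at_least_50)
```

Both variants yield the same groups here, but the pushed-down version never stores the entries that would be pruned later. The sketch assumes the predicate looks only at individual tuples; a predicate over an aggregate of the whole group cannot be moved this way, and a group whose members all fail the predicate disappears entirely under pushdown rather than remaining as an empty group.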
Grouping and aggregation have historically been concepts that are not difficult for the programmer to imagine, yet the resulting implementations can be difficult. In relational systems, grouping and aggregation generally go hand-in-hand and their implementations are interdependent. The difficulty often arises in computing the aggregates correctly and in processing a potentially large number of groups. In SQL, for example, combining grouping and aggregation is generally a necessity since results should always be flat; hence each nested group resulting from an SQL “group by” instruction is reduced to a scalar data value. In query languages such as LINQ or eXtensible Stylesheet Language Transformations (XSLT) that do allow nested operations, aggregation is typically performed in conjunction with grouping. While it is logically convenient for programmers to think of grouping and aggregation as two separate steps, where a data collection is first partitioned into explicit groups that are then aggregated into a value, this type of two-stage processing is inefficient in terms of the memory employed to execute the operations and, more importantly, the number of processing steps required to perform the desired aggregations.
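To make the cost of two-stage processing concrete, the sketch below compares it with a single pass that folds each row into a running aggregate. It is an illustration only, using max as the aggregate; the function names `two_stage_max` and `fused_max` and the sample rows are hypothetical, and the fused variant is one common alternative rather than a method taken from the text.

```python
# Hypothetical input rows: (group key, value).
ROWS = [("x", 3), ("y", 10), ("x", 8), ("y", 2), ("x", 5)]


def two_stage_max(rows):
    """Stage 1 materializes every group; stage 2 aggregates each group."""
    groups = {}
    for key, value in rows:
        groups.setdefault(key, []).append(value)  # stores every value
    return {key: max(values) for key, values in groups.items()}


def fused_max(rows):
    """Single pass: keep one running maximum per group, never the group."""
    best = {}
    for key, value in rows:
        if key not in best or value > best[key]:
            best[key] = value
    return best


two_stage = two_stage_max(ROWS)
fused = fused_max(ROWS)
```

The two-stage version holds every input value in memory and then walks each group a second time, whereas the fused version keeps a single value per group and touches each row once. This works directly for aggregates that can be updated incrementally, such as min, max, count, and sum.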