Business intelligence (BI) refers to the process of using available business data to gain a better understanding of business operations. Often, BI systems facilitate gathering and analyzing the business data to determine trends and to optimize related business practices. One example includes tracking and analyzing sales revenue related to various products, services, consumer groups, geographic locations, and so forth, in order to determine product development, marketing, and sales strategies. Another example includes tracking and analyzing costs associated with specific divisions or departments within a company in order to improve productivity and efficiency while controlling related expenses.
The business data may be stored in a centralized location that is controlled by a database management system (DBMS). In order to analyze available business data, BI queries are formed and executed on the business data. Each BI query performs functions to gather specific data from the available business data and perform an analytical operation on the gathered data.
In general, BI queries are complex. Each BI query usually has a large number of aggregates to compute. An aggregate is the result of using mathematical operations to combine the selected business data. More specifically, an aggregate function generates an aggregate value (i.e., the aggregate) from a collection of input values (i.e., the selected business data). Typical aggregate functions include SUM, COUNT, COUNT_BIG, AVG, STDDEV, VARIANCE, and COVARIANCE.
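As an illustration of how an aggregate function reduces a collection of input values to a single aggregate value, the following minimal Python sketch computes several of the aggregates named above. The function name `compute_aggregates` and the sample `sales` values are hypothetical. The sketch also assumes sample (n - 1) variance and standard deviation, whereas a given DBMS may define VARIANCE and STDDEV over the population instead.

```python
import math
import statistics


def compute_aggregates(values):
    """Reduce a collection of input values to a set of aggregate values."""
    count = len(values)
    total = math.fsum(values)
    return {
        "COUNT": count,
        "SUM": total,
        "AVG": total / count,
        # Assumption: sample statistics (n - 1 denominator).
        "VARIANCE": statistics.variance(values),
        "STDDEV": statistics.stdev(values),
    }


# Hypothetical selected business data, e.g. four sales amounts.
sales = [10.0, 20.0, 30.0, 40.0]
aggregates = compute_aggregates(sales)
```

Here each entry of `aggregates` is one aggregate; a BI query with many aggregates would compute many such values for every group of selected data.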
Conventionally, a data structure is used to store running aggregates as they are calculated. A similar data structure can be used in both hash-based and sort-based grouping and aggregation. This data structure is generally referred to as the aggregation working area. The aggregation working area is typically cache-resident in order to achieve good performance during the grouping and aggregation phase.
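The role of the aggregation working area can be sketched as follows. This is a simplified illustration and not a description of any particular DBMS: the working area is modeled as a Python dictionary that maps each group key to a slot of running aggregates, and the names `RunningAggregates`, `hash_aggregate`, and `sort_aggregate` are hypothetical. The same slot layout is used for both the hash-based and the sort-based variant.

```python
from dataclasses import dataclass
from itertools import groupby
from operator import itemgetter


@dataclass
class RunningAggregates:
    """One slot of the working area: running aggregates for one group."""
    count: int = 0
    total: float = 0.0

    def update(self, value):
        self.count += 1
        self.total += value

    @property
    def average(self):
        return self.total / self.count


def hash_aggregate(rows):
    """Hash-based grouping: look up each row's slot by key and update it."""
    working_area = {}
    for key, value in rows:
        slot = working_area.get(key)
        if slot is None:
            slot = working_area[key] = RunningAggregates()
        slot.update(value)
    return working_area


def sort_aggregate(rows):
    """Sort-based grouping: sort by key, then fill one slot per run of keys."""
    working_area = {}
    ordered = sorted(rows, key=itemgetter(0))
    for key, group in groupby(ordered, key=itemgetter(0)):
        slot = RunningAggregates()
        for _, value in group:
            slot.update(value)
        working_area[key] = slot
    return working_area


# Hypothetical (group key, value) rows, e.g. revenue by region.
rows = [("east", 100.0), ("west", 50.0), ("east", 25.0),
        ("west", 75.0), ("north", 10.0)]
by_hash = hash_aggregate(rows)
by_sort = sort_aggregate(rows)
```

In the hash-based variant every slot may be touched at any time, so the whole working area benefits from staying in cache. In the sort-based variant only the slot of the current group is being updated.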
However, typical implementations of the aggregation working area are not particularly suitable for use with multi-core processors (e.g., multi-core CPUs). With the advent of multi-core processors, the computational power of a single server is constantly growing. Symmetric multiprocessing (SMP) on such processors is used broadly in BI platforms for multi-threaded processing. Usually, there is one thread running on each CPU core. Unfortunately, the total cache size has not kept up with the growth in the number of processing cores. Hence, the amount of cache available per core is decreasing. This trend results in more and more cache contention among threads. If the accumulated size of the aggregation working areas used for grouping and aggregation by all threads exceeds the cache size, then the aggregation working area for each thread will thrash in and out of cache. This thrashing increases the input/output (I/O) demands and decreases the productivity of the BI platform.
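The cache pressure described above can be made concrete with a small arithmetic sketch. It assumes a deliberately simplified model, namely a single cache shared by all threads and one equally sized working area per thread. All sizes and thread counts below are hypothetical and are not taken from any particular system.

```python
MIB = 1024 * 1024


def per_thread_cache_share(cache_bytes, threads):
    """Cache available to each thread if the cache is divided evenly."""
    return cache_bytes // threads


def exceeds_cache(working_area_bytes, threads, cache_bytes):
    """True if the accumulated working areas no longer fit in the cache."""
    return working_area_bytes * threads > cache_bytes


# Hypothetical figures: a 16 MiB shared cache, 3 MiB working area per thread.
cache = 16 * MIB
working_area = 3 * MIB

fits_with_4_threads = not exceeds_cache(working_area, 4, cache)   # 12 MiB total
fits_with_8_threads = not exceeds_cache(working_area, 8, cache)   # 24 MiB total
share_with_8_threads = per_thread_cache_share(cache, 8)           # 2 MiB each
```

Under these assumed numbers, four threads need 12 MiB in total and fit in the cache, while eight threads need 24 MiB and do not. Each of the eight threads then has a 2 MiB share for a 3 MiB working area, which is the condition under which thrashing is expected.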