The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.
One of the most successful analytical tools is a spreadsheet. A user can enter business data, define formulas over it using two-dimensional array abstractions, construct simultaneous equations with recursive models, pivot data and compute aggregates for selected cells, and apply a rich set of business functions. They also provide a flexible user interface with graphs, reports, etc.
Unfortunately, analytical usefulness of the RDBMS (Relational Database Management System) has not measured up to that of spreadsheets or specialized MOLAP (Multidimensional OnLine Analytical Processing) tools. It is cumbersome and in most cases inefficient to perform array-like calculations in SQL—a fundamental problem resulting from lack of language constructs to treat relations as arrays and to define formulas over them and a lack of efficient random access methods for inter-row calculations.
Spreadsheets, on the other hand, have their own problems. They offer two dimensional “row-column” addressing, i.e. physical addressing using row and column offsets. Hence, it is hard to build a symbolic model where formulas reference actual data values. A significant scalability problem exists when either the data set is large (can one define a spreadsheet with terabytes of sales data?) or the number of formulas is significant (can one process tens of thousands of spreadsheet formulas in parallel?). In collaborative analysis with multiple spreadsheets, consolidation is difficult as it is nearly impossible to get a complete picture of the business by querying multiple spreadsheets each using its own layout and placement of data. There is no standard metadata or a unified abstraction inter-relating them akin to RDBMS dictionary tables and RDBMS relations.
To address gaps left by spreadsheets or by RDBMS, SQL has been extended to include language constructs, referred to herein as spreadsheet extensions, that allow relations to be treated as n-dimensional arrays and that allow formulas to be defined in terms of the n-dimensional arrays. Formulas are encapsulated in a new SQL clause, referred to as a spreadsheet clause, that supports partitioning of the data. The spreadsheet extensions enable RDBMs to provide many of the advantages inherent to both RDBMS and spreadsheets.
To execute SQL statements more efficiently, RDBMs have a component, called a query optimizer, that rewrites queries into forms that may be evaluated more efficiently and that develops execution plans that are optimized for efficient execution. (The term query as used herein refers to a database statement that conforms to database language, such as SQL, and includes database statements that specify operations to modify data.) Query optimizers focus on determining efficient join orders and choosing optimal access methods, but largely disregard optimization of complex numerical formulas written in spreadsheet extensions. Unfortunately, a query that contains formulas can contain many of them. Because query optimizers ignore formulas, a query that contains them is evaluated much less efficiently than it could be.
Based on the foregoing, there is clearly a need for techniques that optimize queries that contain spreadsheet extensions.