Data warehouses typically store substantial quantities of information, and further strive to facilitate expeditious querying and efficient retrieval of such stored data. In general, a data warehouse is a nonvolatile repository that can store a substantial amount of data, such as historical and/or current information—wherein the historical data can correspond to past transactional or operational information. Accordingly, data warehousing and associated processing mechanisms (e.g., On-Line Analytical Processing (OLAP), Relational OLAP (ROLAP), Multidimensional OLAP (MOLAP), and Hybrid OLAP (HOLAP)) are widespread technologies employed to support business decisions and data analysis. Data warehouses are populated at regular intervals with data from one or more heterogeneous data sources, for example from multiple transactional or enterprise resource planning (ERP) systems. The aggregation of data can provide a consolidated view of an organization from which valuable information are derived.
Moreover, associated databases can organize and store data in a plurality of multidimensional tables, wherein the tables contain a series of rows (e.g., also referred to as records.) Each row provides particular information about a particular item such as a customer. Rows can further be divided into sections called columns. The intersection between a row and column is typically referred to as a field. Moreover, each field provides data or information related to a particular item. The columns specify the particular type of data provided in each field. For example, a table can be established for purchases of a particular product. Such table can include a plurality of rows corresponding to individual customers, and several columns for first name, last name, address, state, zip code, number of products purchased, price, date, and the like.
Database engines or management systems provide systems and mechanisms to retrieve and manipulate data from database tables upon specification of a query by a user. A query is typically expressed in a predetermined query language, such as Structured Query Language (SQL). In general, queries specify the data to be retrieved from a database. For example, a query can specify one or more tables as well as the rows to be retrieved, and any associated data operations or manipulations to be performed. Upon proper specification of a query, the database engine retrieves data and performs any specified operations, to produce a results table.
Typically, queries are processed in an operator tree, wherein data flows through such operator tree to calculate an answer for the query. Accordingly, queries against data warehouse and/or decision support systems commonly become complex, and hence require prolonged processing time for a vast amount of data. Moreover, the earlier that non-qualifying data are eliminated in such processes, the more efficient the complex queries can be executed.
In particular, many data warehouse applications are designed to follow a star or snowflake schema. In such schemas, a plurality of large fact tables (e.g., sales details) can be connected with multiple dimension tables (e.g., product information, sales region, and the like). Typical queries against such schemas consist of multiple joins of the fact table with several dimension tables. Moreover, the response time for these queries can become time consuming—e.g., measured in minutes or even hours, due to their intricate characteristics. Such prolonged execution time frames can diminish efficient utilization for the data warehouse applications, and result in an overall poor operation of the warehouse system.