A data warehouse is a central repository for all or significant parts of the data collected by an enterprise's business systems. Data from various sources is selectively extracted and organized in the data warehouse database for use by analytical applications and user queries. Storage for the data warehouse database is often implemented with standard storage technologies such as a storage area network (SAN).
As businesses attempt to deal with the massive explosion in data, the ability to make real-time decisions that involve enormous amounts of information is critical to remaining competitive. Today's data warehousing solutions face challenges as they deal with increasing volumes of data, numbers of users, and complexity of analysis. As a result, it is imperative that data warehouse solutions scale seamlessly to address these challenges.
Most of today's general-purpose relational database management systems are designed for Online Transaction Processing (“OLTP”) applications. OLTP workloads require quick access and updates to a small set of records. This work is typically performed in a localized area on disk, with one or a small number of parallel units. In contrast, Online Analytical Processing (“OLAP”) technology enables data warehouses to be used effectively for online analysis, providing rapid responses to iterative, complex analytical queries. OLAP's multidimensional data model and data aggregation techniques organize and summarize large amounts of data so that it can be evaluated quickly using online analysis and graphical tools. The answer to a query into historical data often leads to subsequent queries as the user of the system searches for answers or explores possibilities. Thus, OLAP applications generally spend a significant amount of time scanning a large set of data. Typical OLAP applications therefore require high-throughput, often sequential, access to the underlying storage system (for example, on the order of tens of gigabytes per second, or terabytes per hour) over extremely large datasets (for example, on the order of tens to hundreds of terabytes). Further, solutions targeted to OLTP applications do not work well for OLAP applications.
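The contrast between the two access patterns described above can be illustrated with a minimal Python sketch. It is not taken from any particular database system: the table `ROWS`, the index `INDEX`, and the functions `oltp_lookup`, `olap_total_by_region`, and `scan_hours` are all hypothetical names introduced here for illustration, and the dataset size and throughput passed to `scan_hours` are example figures chosen from the ranges mentioned above.

```python
from collections import defaultdict

# Hypothetical in-memory "table": each row is (order_id, region, amount).
ROWS = [
    (1, "east", 120.0),
    (2, "west", 80.0),
    (3, "east", 45.5),
    (4, "north", 300.0),
    (5, "west", 19.5),
]

# OLTP-style access: an index maps a key to one row position, so a lookup
# or update touches only a small, localized set of records.
INDEX = {row[0]: pos for pos, row in enumerate(ROWS)}


def oltp_lookup(order_id):
    """Return the single row for order_id using the index."""
    return ROWS[INDEX[order_id]]


def olap_total_by_region(rows):
    """OLAP-style access: scan every row in order and aggregate per region."""
    totals = defaultdict(float)
    for _order_id, region, amount in rows:
        totals[region] += amount
    return dict(totals)


def scan_hours(dataset_tb, throughput_gb_per_s):
    """Hours to scan dataset_tb terabytes at a sustained rate (1 TB = 1000 GB)."""
    return dataset_tb * 1000 / throughput_gb_per_s / 3600


if __name__ == "__main__":
    print(oltp_lookup(3))              # one record, found through the index
    print(olap_total_by_region(ROWS))  # every record read once, sequentially
    print(round(scan_hours(100, 10), 2))
```

Under these assumed figures, a single full scan of a 100-terabyte dataset at a sustained 10 gigabytes per second takes roughly 2.8 hours, which suggests why sequential scan throughput, rather than the latency of individual record accesses, tends to dominate OLAP performance.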
Hence, there is a need in the industry for a method and apparatus for managing scanning of databases in data storage systems to increase the efficiency and performance of OLAP applications.