In a traditional data warehousing environment, most data access applications are engineered to process a large volume of detailed data within a predefined "batch window," during which they consume most of the available system resources for exporting, updating, and loading the data warehouse. Completion of the data load marks the beginning of a "query cycle," during which the data is available for analytical work. Typically, data loads do not occur during the query cycle.
The focus of a data warehouse is to provide complex, strategic decision support within an organization. Although batch processing of detailed data is instrumental to data warehousing, such processing is usually confined by predefined scheduling criteria that are largely independent of the individual, transactional events that can occur at any time of day. In today's competitive business environment, where tactical decisions must be made in a timely and factual manner, there is a strong and growing need for an "active data warehouse" that provides high-performance data access, high data freshness, and 24×7×52 availability. In addition, users' knowledge of the logistics, sizes, and structure of the data sources is an integral part of this type of data warehouse.
In a traditional data warehousing environment, most data access components focus on fast loading and unloading of data with a high degree of parallelism and scalability. With the growing need for an "active" data warehouse, in which data is continuously loaded, updated, and queried, the burden on the data access components increases. In such an active environment, the data access components must be more intelligent about which data is to be accessed (or processed) at what time. This is because an active data warehousing environment faces a much higher demand for real-time tactical decision queries than a traditional data warehouse environment faces for strategic decision queries. As a consequence, continuous availability and data freshness become almost absolute requirements, and these requirements demand a different approach to accessing data.
One of the major differences between an active data warehouse and a traditional data warehouse is that the former lacks clearly identified load and query cycles, which means that data can be loaded and queried continuously. Analytical applications, too, may run continuously, and source data may arrive at unpredictable times. Such a scenario presents difficult problems for the data access components. Another difficulty is the emergence of concurrent access by short, tactical decision queries. One common way to maintain throughput for tactical queries is to place restraints on longer, analytical queries. However, such restraints may themselves call for a different data access approach.
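As an illustration of what "placing restraints on longer, analytical queries" can look like, here is a minimal sketch of one simple policy: tactical queries are always admitted, while analytical queries must obtain one of a fixed number of slots, which caps how many can run concurrently. The class `QueryThrottle`, the `tactical`/`analytical` labels, and the fixed-slot policy are hypothetical and chosen for illustration only; they are not the mechanism described in this document.

```python
import threading
from typing import Optional

# Hypothetical workload classes used only for this sketch.
TACTICAL = "tactical"
ANALYTICAL = "analytical"


class QueryThrottle:
    """Hypothetical admission controller for a mixed query workload.

    Tactical queries are always admitted. Analytical queries must take one
    of `max_analytical` slots, which bounds how many run at the same time.
    """

    def __init__(self, max_analytical: int) -> None:
        if max_analytical < 1:
            raise ValueError("max_analytical must be at least 1")
        # BoundedSemaphore raises ValueError if released more often than acquired.
        self._analytical_slots = threading.BoundedSemaphore(max_analytical)

    def try_admit(self, kind: str, timeout: Optional[float] = None) -> bool:
        """Return True if a query of this kind may start.

        timeout=None blocks until a slot frees up; timeout=0 returns at once.
        """
        if kind == TACTICAL:
            return True  # short tactical queries are never throttled
        if kind == ANALYTICAL:
            return self._analytical_slots.acquire(timeout=timeout)
        raise ValueError(f"unknown query kind: {kind!r}")

    def release(self, kind: str) -> None:
        """Free the slot held by a finished analytical query."""
        if kind == ANALYTICAL:
            self._analytical_slots.release()
        elif kind != TACTICAL:
            raise ValueError(f"unknown query kind: {kind!r}")


if __name__ == "__main__":
    throttle = QueryThrottle(max_analytical=2)
    print(throttle.try_admit(ANALYTICAL, timeout=0))  # True
    print(throttle.try_admit(ANALYTICAL, timeout=0))  # True
    print(throttle.try_admit(ANALYTICAL, timeout=0))  # False: both slots in use
    print(throttle.try_admit(TACTICAL))               # True: tactical bypasses the cap
    throttle.release(ANALYTICAL)
    print(throttle.try_admit(ANALYTICAL, timeout=0))  # True: a slot was freed
```

A counting semaphore is used here only because it is the simplest way to express a concurrency cap. As the paragraph above notes, such a restraint changes when and how analytical queries touch the data, which is one reason it may in turn require a different data access approach.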