Generally, online applications were not developed with data warehousing concerns or processing in mind; compatibility with data warehouses was not a design consideration for most existing applications. Additionally, because data formats differ between data stores, such compatibility was often intentionally avoided within an application in order to improve perceived application portability and to decouple the application from any particular data store.
As a result, the traditional practice of Extract, Transform, and Load (ETL) from a database became, and still remains, the predominant approach. With ETL, an application generally produces a file in storage having data store updates embedded therein. A subsequent application then processes the file, translates it into a specific data store format, and interacts with a data store interface to perform the updates on the data store. It is generally believed that this approach also minimizes data store transactions and may be used to better manage data store performance, since processing can be scheduled at low usage times.
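The file-based flow described above can be illustrated with a minimal sketch. It is not taken from any particular ETL product: the record fields (`order_id`, `amount`), the `orders` table, the JSON-lines file format, and the function names are all assumptions chosen for illustration, and SQLite stands in for the data store.

```python
import json
import os
import sqlite3
import tempfile


def extract(updates, path):
    """Application side: write pending updates to an intermediate file."""
    with open(path, "w", encoding="utf-8") as f:
        for update in updates:
            f.write(json.dumps(update) + "\n")


def transform(path):
    """Loader side: read the file and translate records into table rows."""
    rows = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)
            # Hypothetical format translation: amounts may arrive as text.
            rows.append((record["order_id"], float(record["amount"])))
    return rows


def load(conn, rows):
    """Apply the whole batch to the data store in a single transaction."""
    with conn:
        conn.executemany(
            "INSERT OR REPLACE INTO orders (order_id, amount) VALUES (?, ?)",
            rows,
        )


def run_batch(updates):
    """Run one batch cycle and return the resulting table contents."""
    conn = sqlite3.connect(":memory:")  # stand-in for the data store
    conn.execute(
        "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, amount REAL)"
    )
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "updates.jsonl")
        extract(updates, path)        # file is fully populated first
        load(conn, transform(path))   # then processed as one batch
    rows = conn.execute(
        "SELECT order_id, amount FROM orders ORDER BY order_id"
    ).fetchall()
    conn.close()
    return rows
```

In this sketch the data store sees nothing until the intermediate file is complete and the load step runs, which is the batch behavior the passage attributes to ETL.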
However, the ETL technique usually entails batch operations, meaning that the updates produced by an application and temporarily housed in an intermediate file are processed only once the file is completely populated by the application and once access to the data store is deemed appropriate by scheduling. This can be a substantial disadvantage in today's real-time economy and business environment. Additionally, since the data is updated from a file (e.g., storage) rather than from in-memory structures of an application, the update processing may be less than optimal when the data store is eventually updated.
In fact, current business conditions put a significant premium on fast responses to changes in business events. Businesses rely on updated business data and metrics. Thus, the conventional luxury of an ETL batch cycle to restrict and control access to a business's data warehouse no longer exists. A more “live” or real-time approach is desired and often needed because business data has rapidly become time sensitive and critical to businesses.
There is increasing pressure to provide “right-time” integration between online applications and data warehouses so that the data in the warehouse is current. Once the data is up to date in near real time, analytical warehousing applications can process the data with a series of business-defined services. Some of these services may send real-time business alerts to interested parties, stakeholders, or automated applications. If automated applications receive the alerts, then business decisions or actions based on the real-time updates can be made nearly instantaneously.
It is therefore desirable to have more “active data warehousing,” where a warehouse has access to data as it is created (e.g., live or real-time data). Thus, improved techniques for populating data stores are desired.