Organizations that store large amounts of data use database systems to manage that data. One type of database system is a data warehouse. A data warehouse is a collection of data that is structured to support analytical and reporting tasks. Such analytical tasks can provide decision makers with important information. The structure of data within a data warehouse contrasts with the structure of data within operational databases, which are structured to provide transactional operations that support day-to-day business operations such as sales, inventory control, and accounting.
A data flow process such as an Extract, Transform, and Load (ETL) process is performed to convert data that is formatted for operational tasks into data that is formatted for the analytical tasks associated with a data warehouse. This process involves extracting data from multiple sources. The data from these sources may be formatted differently, or may contain details that are too low-level or not relevant, and thus must be transformed for data warehouse operations. Finally, the transformed data is loaded into the data warehouse.
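The three stages described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the two source layouts, the sample records, the `daily_sales` table, and the function names are hypothetical, and an in-memory SQLite database stands in for the data warehouse.

```python
import sqlite3

# Hypothetical operational sources that format the same kind of data differently.
SALES_SYSTEM_A = [
    {"order_id": 1, "amount_cents": 1250, "date": "2024-01-05", "clerk": "x17"},
    {"order_id": 2, "amount_cents": 800, "date": "2024-01-05", "clerk": "x02"},
]
SALES_SYSTEM_B = [
    ("2024/01/06", "19.99"),  # (date with slashes, amount as a decimal string)
    ("2024/01/05", "5.00"),
]


def extract():
    """Extract: pull raw records from each source."""
    return list(SALES_SYSTEM_A), list(SALES_SYSTEM_B)


def transform(rows_a, rows_b):
    """Transform: unify formats, drop low-level detail, aggregate per day."""
    totals = {}
    for row in rows_a:
        # order_id and clerk are too low-level for reporting, so they are dropped.
        totals[row["date"]] = totals.get(row["date"], 0) + row["amount_cents"]
    for date, amount in rows_b:
        day = date.replace("/", "-")        # normalize the date format
        cents = round(float(amount) * 100)  # normalize the amount to integer cents
        totals[day] = totals.get(day, 0) + cents
    return sorted(totals.items())


def load(conn, daily_totals):
    """Load: write the transformed rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS daily_sales "
        "(day TEXT PRIMARY KEY, total_cents INTEGER)"
    )
    conn.executemany("INSERT INTO daily_sales VALUES (?, ?)", daily_totals)
    conn.commit()


def run_etl(conn):
    """Run the extract, transform, and load stages in order."""
    rows_a, rows_b = extract()
    load(conn, transform(rows_a, rows_b))
```

Calling `run_etl(sqlite3.connect(":memory:"))` leaves one row per day in `daily_sales`, which is the kind of summarized structure that analytical queries operate on.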
At each stage of the ETL process, various tasks are performed. For example, during the transformation stage, several tasks may be performed, including filtering, sorting, joining, generating surrogate keys, and transposing. Different processing techniques may be used to perform these tasks. For example, some software applications are designed specifically for ETL processing. These applications may use certain processing techniques to perform ETL tasks. Additionally, the database management system for an operational database may use certain processing techniques for performing some of the ETL-related tasks. Furthermore, a parallel processing technique may be used on a distributed computing system. Executing the entire ETL processing flow using a single category of processing techniques may be less efficient than using a combination of techniques, because some tasks within that ETL processing flow may be performed more efficiently using a different type of processing technique.
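The transformation tasks named above can be sketched as small, independent functions. This is only an illustrative example under stated assumptions: the `orders` and `customers` data, the field names, and the function names are hypothetical, and plain in-memory Python stands in for whichever processing technique would actually execute each task.

```python
from itertools import count

# Hypothetical extracted rows (illustrative data only).
orders = [
    {"customer": "bolt", "quarter": "Q2", "amount": 70},
    {"customer": "acme", "quarter": "Q2", "amount": 150},
    {"customer": "bolt", "quarter": "Q1", "amount": 0},
    {"customer": "acme", "quarter": "Q1", "amount": 100},
]
customers = {"acme": "Acme Corp", "bolt": "Bolt Ltd"}


def filter_rows(rows):
    """Filtering: keep only rows with a positive amount."""
    return [r for r in rows if r["amount"] > 0]


def sort_rows(rows):
    """Sorting: order rows by customer, then by quarter."""
    return sorted(rows, key=lambda r: (r["customer"], r["quarter"]))


def join_rows(rows, names):
    """Joining: attach the customer name from a lookup table."""
    return [{**r, "name": names[r["customer"]]} for r in rows]


def add_surrogate_keys(rows):
    """Surrogate keys: assign an integer key to each distinct customer."""
    keys = {}
    counter = count(1)
    keyed = []
    for r in rows:
        if r["customer"] not in keys:
            keys[r["customer"]] = next(counter)
        keyed.append({**r, "customer_sk": keys[r["customer"]]})
    return keyed


def transpose(rows):
    """Transposing: pivot to one record per customer, one field per quarter."""
    pivot = {}
    for r in rows:
        record = pivot.setdefault(r["customer_sk"], {"name": r["name"]})
        record[r["quarter"]] = r["amount"]
    return pivot


def transform(rows, names):
    """Chain the individual tasks into one transformation flow."""
    rows = sort_rows(filter_rows(rows))
    return transpose(add_surrogate_keys(join_rows(rows, names)))
```

Because each task is a separate function with plain inputs and outputs, any single task in the chain could in principle be handed to a different engine, such as a database query or a distributed job, without changing the others.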
Throughout the drawings, identical reference numbers designate similar, but not necessarily identical, elements.