An Extract, Transform, and Load (ETL) process is a data management process used in data warehousing to consolidate data from multiple data sources. The first step in the ETL process is extracting data from various external sources. Each source may store its data in a format completely different from the others, and almost any data store can serve as a source for the ETL process. Once the data has been extracted and converted into an expected format, the next step is transforming the data according to a set of business rules or functions. The data transformation may include operations such as filtering, sorting, aggregating, joining, and cleaning data, generating calculated data based on existing values, and validating data. The final step of the ETL process is loading the transformed data into a destination target, which may be a database or a data warehouse.
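The three steps described above can be illustrated with a minimal sketch. It is not a definitive implementation: the two sources (`CSV_SOURCE`, `JSON_SOURCE`), the field names, the business rules (drop non-numeric or non-positive amounts, normalize customer names), and the `customer_totals` table are all hypothetical and chosen only for illustration. An in-memory SQLite database stands in for the destination target.

```python
import csv
import io
import json
import sqlite3

# Hypothetical source 1: a CSV export (e.g., from a sales system).
CSV_SOURCE = """order_id,customer,amount
1,acme,100.50
2,globex,not_a_number
3,acme,49.50
"""

# Hypothetical source 2: a JSON feed with different field names.
JSON_SOURCE = (
    '[{"id": 4, "cust": "Globex", "total": 20.0},'
    ' {"id": 5, "cust": "initech", "total": -5.0}]'
)


def extract():
    """Extract: read each source and convert it into one common record format."""
    records = []
    for row in csv.DictReader(io.StringIO(CSV_SOURCE)):
        records.append({"order_id": row["order_id"],
                        "customer": row["customer"],
                        "amount": row["amount"]})
    for item in json.loads(JSON_SOURCE):
        records.append({"order_id": item["id"],
                        "customer": item["cust"],
                        "amount": item["total"]})
    return records


def transform(records):
    """Transform: validate, filter, clean, aggregate, and sort the records."""
    totals = {}
    for rec in records:
        try:
            amount = float(rec["amount"])  # validating: amount must be numeric
        except (TypeError, ValueError):
            continue
        if amount <= 0:  # filtering: assumed rule, keep positive amounts only
            continue
        customer = str(rec["customer"]).strip().lower()  # cleaning
        totals[customer] = totals.get(customer, 0.0) + amount  # aggregating
    return sorted(totals.items())  # sorting by customer name


def load(rows, conn):
    """Load: write the transformed rows into the destination table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customer_totals "
        "(customer TEXT PRIMARY KEY, total REAL)"
    )
    conn.executemany("INSERT OR REPLACE INTO customer_totals VALUES (?, ?)", rows)
    conn.commit()


def run_etl(conn):
    """Run the three steps in order and return the loaded rows."""
    load(transform(extract()), conn)
    return conn.execute(
        "SELECT customer, total FROM customer_totals ORDER BY customer"
    ).fetchall()


if __name__ == "__main__":
    print(run_etl(sqlite3.connect(":memory:")))
```

In this sketch the extract step is where the differing source formats are reconciled, so the transform step only ever sees one record shape. A production process would typically read from files, databases, or services rather than from in-line strings.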
Examples of the source data for the ETL process include data from different departments and/or divisions of a company that needs to be integrated. For example, a company's management team may need complete, accurate information about the company's customers, suppliers, and transactions to make sound business decisions. This information is often not maintained in a single place, but rather at different locations or sources throughout the company, across multiple departments, divisions, and applications. The ETL process can extract data from the different data sources within the company, transform the data, and populate a data warehouse with it, so that the management team can perform reporting, querying, analysis, and performance management and make effective business decisions.
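As a rough illustration of why consolidation helps reporting, the sketch below places customer records from one department and transaction records from another into a single database and answers a question that neither source could answer alone. The department data (`SALES_CUSTOMERS`, `BILLING_TRANSACTIONS`), the table names, and the report itself are hypothetical, and an in-memory SQLite database is assumed as a stand-in for the data warehouse.

```python
import sqlite3

# Hypothetical records held separately by two departments.
SALES_CUSTOMERS = [(1, "Acme Corp"), (2, "Globex")]          # (customer_id, name)
BILLING_TRANSACTIONS = [(1, 250.0), (1, 100.0), (2, 75.0)]   # (customer_id, amount)


def build_warehouse():
    """Populate one in-memory database with data from both departments."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE transactions (customer_id INTEGER, amount REAL)")
    conn.executemany("INSERT INTO customers VALUES (?, ?)", SALES_CUSTOMERS)
    conn.executemany("INSERT INTO transactions VALUES (?, ?)", BILLING_TRANSACTIONS)
    conn.commit()
    return conn


def revenue_report(conn):
    """Report total revenue per customer, joining data from both departments."""
    query = """
        SELECT c.name, SUM(t.amount) AS revenue
        FROM customers AS c
        JOIN transactions AS t ON t.customer_id = c.customer_id
        GROUP BY c.customer_id, c.name
        ORDER BY revenue DESC
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    print(revenue_report(build_warehouse()))
```

Once both data sets sit in one place, the report reduces to a single join and aggregation instead of a manual reconciliation across departments.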
The ETL process has many applications, including but not limited to data migration and application integration for multiple dispersed data sources. For example, in data migration, various data sources may be involved, and data may be generated and consumed by software applications that in turn support business processes. The ETL process can support data flows among the data sources in multiple directions.