Data management operations often have to extract data from one or more data sources. The extracted data is frequently not in a form that is suitable for consumption or use in a given data processing system. Therefore, the data management operations often also have to transform that data into a consumable form, such as for loading into a repository or a database. The process of extraction, transformation, and loading of data in a given data processing system is compactly referred to as ETL. Accordingly, a data processing environment where ETL is employed is referred to as an ETL environment.
ETL operations are used in a variety of forms, in a variety of data processing environments, and for a variety of purposes. Database management is an example of a common circumstance in which ETL operations are used for extracting data from one or more data sources, including other databases; transforming that data according to some business requirements; and loading that transformed data into one or more destinations, including reports, workflows, a database, or other systems.
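The three stages described above can be illustrated with a minimal sketch. It assumes a small in-line CSV text as the data source, a cents-to-decimal conversion as the business rule, and an in-memory SQLite database as the destination; the names `SOURCE_CSV`, `extract`, `transform`, `load`, and `run_etl` are hypothetical and chosen only for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical source data; a real source might be a file or another database.
SOURCE_CSV = "order_id,amount_cents\n1,1250\n2,399\n"


def extract(text):
    """Extract: read rows from the CSV source into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: convert cents to a decimal amount (an assumed business rule)."""
    return [(int(r["order_id"]), int(r["amount_cents"]) / 100) for r in rows]


def load(records, conn):
    """Load: write the transformed records into a destination table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders (order_id INTEGER, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?)", records)
    conn.commit()


def run_etl(text, conn):
    """Chain the three stages in order."""
    load(transform(extract(text)), conn)


conn = sqlite3.connect(":memory:")
run_etl(SOURCE_CSV, conn)
```

Keeping each stage in its own function is one possible design choice; it allows the extraction, transformation, and loading steps to be replaced or tested separately.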
Typically, a business unit specifies a business requirement for which data has to be manipulated in a given ETL environment. A business requirement specifies a need or use of certain information in the context of a business operation. For example, a business requirement can be a text document describing a need, a use, a goal, or an objective (collectively hereinafter, “objective”) expected to be satisfied. The document may or may not specify any particular data for this purpose. Even when the document identifies data that is usable for this purpose, that identification alone is insufficient for extracting the data from a source, in a usable form, to satisfy the objective.
A team typically translates a business requirement into a technical specification. The technical specification describes one or more operations that have to be performed on some data to achieve an objective of the business requirement. For example, the technical specification may specify a data source and a manner of extracting particular data therefrom, a transformation that may have to be applied to the data, and a subsequent computation or storage operation on that data to produce a result that satisfies the objective of the business requirement.
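As a rough illustration of the elements such a technical specification may carry, the following sketch records them in a simple data structure. Technical specifications are commonly written as documents, so this machine-readable form is only an assumption made for the example; the class `TechnicalSpec`, its fields, the helper `add_total`, and all values shown are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class TechnicalSpec:
    """Hypothetical record of what a technical specification may describe."""
    objective: str       # the business objective being served
    source: str          # the data source to extract from
    extract_query: str   # the manner of extracting the data
    transformations: List[Callable[[dict], dict]]  # steps applied to each row
    destination: str     # where the result is stored


def add_total(row: dict) -> dict:
    """Example transformation: compute a total from quantity and unit price."""
    return {**row, "total": row["quantity"] * row["unit_price"]}


spec = TechnicalSpec(
    objective="Report the total value of each order",
    source="sales_db",
    extract_query="SELECT order_id, quantity, unit_price FROM orders",
    transformations=[add_total],
    destination="order_totals",
)


def apply_transformations(spec: TechnicalSpec, row: dict) -> dict:
    """Apply each specified transformation to a row, in order."""
    for step in spec.transformations:
        row = step(row)
    return row
```

For instance, `apply_transformations(spec, {"order_id": 1, "quantity": 2, "unit_price": 5})` returns the input row with an added `total` field.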
Software developers implement the technical specification in code. The code, when executed on a data processing system, uses the data sources to obtain the data, performs the specified operations on the data, and produces the results.
The code is tested for satisfactory functioning. For example, a testing user creates one or more test cases for executing the code. The code, when executed according to a test case, produces test results. A user examines the test results to determine whether the code functioned as expected, and whether the results generated would satisfy the objective of the business requirement.
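The testing step described above can be sketched as follows. The example assumes a trivial function under test and a handful of hand-written test cases; the names `to_amount`, `TEST_CASES`, and `run_test_cases` are hypothetical, and the expected values stand in for whatever a technical specification would prescribe.

```python
def to_amount(cents):
    """Code under test (hypothetical): convert integer cents to a decimal amount."""
    return cents / 100


# Each test case pairs an input with the result that is expected for it.
TEST_CASES = [
    {"name": "whole amount", "input": 1200, "expected": 12.0},
    {"name": "fractional amount", "input": 1250, "expected": 12.5},
    {"name": "zero", "input": 0, "expected": 0.0},
]


def run_test_cases(func, cases):
    """Execute func for every test case and record the test results."""
    results = []
    for case in cases:
        actual = func(case["input"])
        results.append(
            {
                "name": case["name"],
                "actual": actual,
                "passed": actual == case["expected"],
            }
        )
    return results


results = run_test_cases(to_amount, TEST_CASES)
```

A passing result only indicates that the code behaved as the test case expected; whether the results satisfy the objective of the business requirement remains a separate determination made by the user examining them.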