A data warehouse stores data that is used for reporting and analysis. This data may be collected from various data sources and placed in the data warehouse. Manual, semi-automatic, and automatic mechanisms may be used to collect the data. For example, a script might execute periodically to obtain information from a data source to place in the data warehouse. As another example, an employee may periodically copy data from a company database to the data warehouse. A data warehouse may have storage elements that correspond to information that an organization cares about. For example, a data warehouse may have a table that stores employee information retrieved from data sources.
Moving data from a data source to a data warehouse is commonly referred to as the extract, transform, and load (“ETL”) process. During extraction, data from one or more data sources is copied, in its raw format, to staging tables inside the data warehouse. Transformation then converts the raw data into the format utilized by the data warehouse. Once the data has been transformed, it is loaded into the data warehouse, where end users can consume it.
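For illustration only, the three ETL stages might be sketched as follows. The sketch uses Python's standard sqlite3 module with an in-memory database standing in for the data warehouse. The table names (staging_employee, employee), the column names, the function name run_etl, and the particular cleanup rules are hypothetical choices made for this example and are not taken from any specific system.

```python
import sqlite3
from decimal import Decimal


def run_etl(source_rows):
    """Run a toy extract-transform-load pass and return the loaded rows.

    source_rows is a list of (name, salary) string pairs, standing in for
    raw records obtained from a data source (hypothetical layout).
    """
    wh = sqlite3.connect(":memory:")  # stand-in for the data warehouse
    # Staging table: holds the data in the same raw format as the source.
    wh.execute("CREATE TABLE staging_employee (raw_name TEXT, raw_salary TEXT)")
    # Destination table: the format the warehouse is assumed to use.
    wh.execute("CREATE TABLE employee (name TEXT, salary_cents INTEGER)")

    # Extract: copy raw rows into the staging table unchanged.
    wh.executemany("INSERT INTO staging_employee VALUES (?, ?)", source_rows)

    # Transform: trim and title-case names, convert salary text to integer cents.
    staged = wh.execute(
        "SELECT raw_name, raw_salary FROM staging_employee ORDER BY rowid"
    ).fetchall()
    transformed = [
        (name.strip().title(), int(Decimal(salary.replace(",", "")) * 100))
        for name, salary in staged
    ]

    # Load: write the transformed rows into the destination table.
    wh.executemany("INSERT INTO employee VALUES (?, ?)", transformed)
    wh.commit()

    rows = wh.execute(
        "SELECT name, salary_cents FROM employee ORDER BY rowid"
    ).fetchall()
    wh.close()
    return rows
```

Even in this small sketch, the transform step is the only part that depends on the specifics of both the source format and the warehouse format; the extract and load steps are plain copies.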
The most complex operation in the ETL process is typically the transform process. Implementing the transform process is usually very labor-intensive, because a user must manually author program code, such as a script, that transforms the data. Manually authoring the transform code is difficult and time-consuming for a number of reasons. First, the author must have an intimate knowledge of both the data source and the data warehouse, so that the transform process can be tailored to fit within the constructs of these systems. Second, due to the complexity of transforming data from a data source to a data warehouse, manually authored transform code is likely to introduce errors into the transformation process. Due to these factors, and others, the complexity involved in authoring transform code in an ETL process presents a significant obstacle to the widespread use of data warehouses.
It is with respect to these and other considerations that the disclosure made herein is presented.