“Extract-transform-load” data integration jobs are known. More specifically, ETL jobs that include Unload-Transform-Load (UTL) portions are known. The concept of UTL will be discussed in more detail, below, in the DETAILED DESCRIPTION section of this document. Roughly speaking, Extract-Transform-Load (ETL) refers to a process in database usage, and especially in data integration, that involves: (i) extracting data from outside sources; (ii) transforming the extracted data to fit operational requirements (for example, quality levels); and (iii) loading the transformed data into the end target, such as a database. ETL processes generally entail considerable complexity, and problems sometimes occur due to improperly designed ETL systems.
Patterns for implementing complex transformations are well understood when creating ETL jobs (specifically, ETL jobs that include UTL portions). Many companies will have a significant investment in: (i) intellectual property in the form of ETL job-related code; and/or (ii) human resources who have training and experience in creating ETL jobs. The patterns for implementing complex transformations for a change-based job (CTL) are very different and are much less well understood. Few data integration job developers are experienced in this area. This makes it more difficult for businesses to move from a full load model (that is, ETL job with significant UTL portions) model to a change-based model (that is, CTL data integration job).
An ETL job (that includes UTL portions) traditionally relies on processing correlated tables (at least to the extent that the ETL job has UTL portions). A CTL job is not generally applied to correlated tables. This means that the standard strategies used in other automated transformations of ETL jobs (such as the Balanced Optimizer) cannot be used to convert an ETL job in equivalent CTL job(s).
The concept of “transformationally equivalent ETL jobs” will now be discussed. An ETL job defines a series of transformations taking data from a source database and moving it to a target database. The ETL job thus defines a specific transformational relationship between the respective data in the source and target databases. In many practical situations, there may be many ETL jobs which define the same transformational relationship. Conventionally, an optimal ETL job is defined, or re-used, for a given business situation. Determination of the optimal ETL job is conventionally based upon: (i) performance requirements; and/or (ii) other nonfunctional requirements. One of the ways that transformationally equivalent ETL jobs may differ, from each other, is in how much of the transformation processing is done in the target database and how much is done internally in the ETL tool itself. There is existing functionality (called the Balanced Optimizer) which can analyze an ETL job and provide a transformationally equivalent ETL job which has moved more of the transformation processing from the body of the ETL job to the target database.