In many large IT environments, data must be transported into and out of individual systems (e.g., through data bridges) as a form of integration. Tools used to transport data generally fall into the category of ETL (extract, transform, load) tools. ETL is a data warehousing process that extracts data from outside sources, transforms it according to particular business needs, and loads it into a data warehouse. An ETL process typically begins with a user defining a data flow that specifies the transformation activities: extracting data from, e.g., flat files or relational tables, transforming that data, and loading it into a data warehouse, data mart, or staging table(s). A data flow therefore typically consists of a sequence of operations, modeled as data flowing from various types of sources, through various transformations, and finally into one or more targets.
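As an illustration only, the source → transformations → target structure of a data flow can be sketched in Python. Everything here is an assumption for the sketch: an in-memory list stands in for a flat file or relational table, another list stands in for a staging table, and the functions `extract`, `drop_rows_without_id`, `normalize_name`, and `run_data_flow` are hypothetical names, not the API of any ETL product.

```python
from typing import Callable, Dict, Iterable, Iterator, List

# A record is modeled as a column-name -> value mapping (an assumption for this sketch).
Record = Dict[str, object]
# A transformation consumes a stream of records and yields a new stream.
Transform = Callable[[Iterable[Record]], Iterator[Record]]


def extract(rows: List[Record]) -> Iterator[Record]:
    """Source step: an in-memory list stands in for a flat file or relational table."""
    for row in rows:
        yield dict(row)  # copy so later steps never mutate the source rows


def drop_rows_without_id(records: Iterable[Record]) -> Iterator[Record]:
    """Hypothetical business rule: discard records that lack a key."""
    for record in records:
        if record.get("id") is not None:
            yield record


def normalize_name(records: Iterable[Record]) -> Iterator[Record]:
    """Hypothetical transformation: trim and upper-case the 'name' column."""
    for record in records:
        yield {**record, "name": str(record["name"]).strip().upper()}


def run_data_flow(source: Iterable[Record],
                  transforms: List[Transform],
                  target: List[Record]) -> int:
    """Chain source -> transformations -> target; return the number of rows loaded."""
    stream: Iterable[Record] = source
    for transform in transforms:
        stream = transform(stream)  # compose each step lazily onto the stream
    loaded = list(stream)
    target.extend(loaded)  # load step: the list stands in for a staging table
    return len(loaded)


# Usage: three source rows, one of which fails the key rule.
source_rows = [
    {"id": 1, "name": " alice "},
    {"id": None, "name": "bob"},
    {"id": 2, "name": "carol"},
]
staging_table: List[Record] = []
count = run_data_flow(extract(source_rows),
                      [drop_rows_without_id, normalize_name],
                      staging_table)
print(count, staging_table)  # prints: 2 [{'id': 1, 'name': 'ALICE'}, {'id': 2, 'name': 'CAROL'}]
```

Because each step is a generator, records stream through the chain one at a time and no intermediate copy of the whole data set is built before the load step.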
Prior art ETL approaches require the creation of multiple redundant processes, e.g., one for each table or data set. This is especially true when using the GUI tools of ETL products. These GUI tools make it easy to move data between systems. However, when requirements are complex, such as the need to identify what has changed between the source and the target, the ETL tools, and even custom scripts, require substantial modification. Because each table or data set carries its own copy of that logic, the code base (or the number of process nodes) grows rapidly.
Therefore, when developing a data bridge between two information systems, one of the biggest challenges is the handling of individual elements. Most ETL tools and batch frameworks provide powerful functions, yet a developer still has to write code separately for each data object to perform common tasks such as data validation and record comparison. This process is error prone: typos creep in, and every change in requirements means going back and adjusting each object. Accordingly, what is needed is a solution that addresses at least one of the above-identified deficiencies.