The present invention relates to computer systems, and more specifically, to column based data transfer in extract, transform and load (ETL) systems.
ETL systems facilitate extracting data from various sources, transforming the extracted data to fit operational requirements, and loading the transformed data into a target database (e.g., a data repository) at a target location. In many cases, the data that is accumulated is in a different format than what is ultimately needed in the data repository. The process of acquiring this data and converting it into useful, compatible, and accurate data is referred to as an ETL process. An example of an application that may be used to perform an ETL process is IBM® Information Server Suite.
In an ETL process, the extract phase acquires data from the source system(s). Data extraction can be as simple as copying a flat file from a database or as sophisticated as setting up interdependencies with remote systems that then supervise the transportation of source data to the target system. The extracted source data is typically stored as one or more relational database tables. The transform phase in the ETL process is typically made up of several stages and includes converting data formats and merging extracted source data to create data in a format suitable for the data repository. In addition, source data obtained in the extract phase from external systems may contain incompatible or incorrect information, depending on the checks and balances that were in effect on the external system, and thus, part of the transform phase includes cleansing and/or rejecting data. Common techniques used as part of the transform phase include character examination (e.g., to reject numeric values that contain characters) and range checking (e.g., to reject values outside of an acceptable range). Rejected records are usually deposited in a separate file and are then processed by a more sophisticated tool or processed manually to correct the problems. The values are then merged into the transformed set of data. The load phase of the ETL process includes depositing the transformed data into the new data store (e.g., the data repository, warehouse, mart, etc.). When the data repository is a relational database, the load process is often accomplished with structure query language (SQL) commands (e.g., IMPORT), utilities (e.g., LOAD), or other SQL tools.