Extract, Transform, Load (ETL) refers to a process in database usage, and more specifically in data warehousing, that is performed by an ETL tool. The process includes extracting data from an outside source, transforming the data to fit operational needs, and loading the transformed data into an end target (e.g., a database or data warehouse). Typically, ETL tools read data from source systems, transform the data, and store frequently used data in what is called a dataset. An ETL process typically consists of numerous ETL jobs, which the ETL tool sequences together. After the data is transformed, the dataset is indexed and loaded into the end target. The end target, typically a relational database, uses clustered indexing to organize the data and reduce the size of the index. The relational database is often capable of receiving Structured Query Language (SQL) queries and satisfying them by using the clustered index to retrieve the data in the indexed datasets.
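The extract, transform, and load steps described above can be illustrated with a minimal sketch. It is not taken from any particular ETL tool: the CSV source, the `orders` table, and the function names are hypothetical, and SQLite stands in for the end target. A SQLite `WITHOUT ROWID` table stores its rows in primary-key order, which is used here as an approximation of a clustered index.

```python
import csv
import io
import sqlite3

# Hypothetical source data; a real job would read from a source system.
SOURCE_CSV = """order_id,customer,amount
3,carol,12.50
1,alice,7.25
2,bob,
"""


def extract(text):
    """Extract: read raw rows from the outside source (a CSV string here)."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows with no amount and normalize types and casing."""
    cleaned = []
    for row in rows:
        if not row["amount"]:
            continue  # incomplete record, not loaded
        cleaned.append(
            (int(row["order_id"]), row["customer"].upper(), float(row["amount"]))
        )
    return cleaned


def load(conn, records):
    """Load: write the transformed dataset into the end target."""
    # WITHOUT ROWID keeps rows ordered by the primary key (assumed stand-in
    # for the clustered index of the end target).
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders ("
        "order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL"
        ") WITHOUT ROWID"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", records)
    conn.commit()


def run_etl(text):
    """Sequence the three steps as one job against an in-memory database."""
    conn = sqlite3.connect(":memory:")
    load(conn, transform(extract(text)))
    return conn
```

Calling `run_etl(SOURCE_CSV)` returns a connection whose `orders` table can then be queried with SQL, for example `SELECT * FROM orders WHERE order_id = 1`.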
Currently, when a dataset is loaded into the end target, an ETL developer must manually add a sort to the ETL code so that the order of the dataset matches the clustering index of the end target. The ETL tool typically uses parallel processing to load the dataset into the end target, which can affect the consistency of the dataset, since all aspects of the process have to be synchronized.
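As an illustration of the manual step described above, the following sketch shows what a hand-written sort might look like when the dataset is split across parallel partitions. Everything in it is an assumption made for the example: the clustering key on `(region, order_id)`, the tuple layout of the records, and the use of threads as workers. Each partition is sorted independently, and a single merge step acts as the synchronization point that yields one stream ordered by the clustering key.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor
from operator import itemgetter

# Hypothetical clustering index of the end target: (region, order_id),
# i.e. the first two fields of each record tuple.
CLUSTER_KEY = itemgetter(0, 1)


def sort_partition(partition):
    """Sort one partition of the dataset by the clustering key."""
    return sorted(partition, key=CLUSTER_KEY)


def sort_for_clustered_load(partitions):
    """Sort partitions in parallel, then merge them into one ordered stream."""
    with ThreadPoolExecutor() as pool:
        # pool.map returns results in the same order as the input partitions.
        sorted_parts = list(pool.map(sort_partition, partitions))
    # The merge is the point where the parallel workers are brought back
    # together; it assumes every input is already sorted by CLUSTER_KEY.
    return list(heapq.merge(*sorted_parts, key=CLUSTER_KEY))
```

If the clustering index of the end target changes, `CLUSTER_KEY` has to be edited by hand to match, which is the kind of manual upkeep the paragraph above describes.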