Enterprises are building increasingly large data warehouses to enable analytics as a way to unlock the power of information and improve business performance. One method of building data warehouses is to utilize a data storage process, such as an Extract, Transform, and Load (ETL) process. In ETL processing, data are typically extracted from a source format and transformed into a desired target format. The transformed data are then loaded into a database or data warehouse for analysis. In practice, the extraction and transformation of the source data may involve complicated computations that are composed of multiple smaller computational steps. Depending on the amount of data to be processed, data storage processing can be time-consuming and expensive. For example, to handle an increase in the amount of data to be processed, data processing enterprises may be forced to purchase faster computers to satisfy their computational needs. A more cost-efficient solution would be to optimize the way in which data are processed and stored.
Conventional ETL technologies that process unstructured and semi-structured data, such as XML, do so inefficiently. In particular, many ETL processes are unnecessarily repetitive. For example, it is not uncommon for an ETL process to repeatedly extract and transform the same data in a document when loading a target data warehouse. These repetitive steps waste valuable computational resources and needlessly lengthen the overall ETL process. As a result, data storage processors spend time and money on unneeded computations.
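By way of illustration only, the repetition described above can be sketched in Python. The sketch is a hypothetical, simplified example and does not represent any particular ETL product: the sample document, the function names (extract_orders, load_customers, load_totals), and the dictionary standing in for a data warehouse are all assumptions introduced for this illustration. Each load step independently re-extracts the same XML document, so the document is parsed once per target rather than once overall.

```python
import xml.etree.ElementTree as ET

# Hypothetical source document (an assumption made for this illustration).
SOURCE_XML = """<orders>
  <order id="1"><customer>Acme</customer><total>10.50</total></order>
  <order id="2"><customer>Acme</customer><total>4.25</total></order>
</orders>"""

parse_count = 0  # counts how many times the source document is parsed


def extract_orders(xml_text):
    """Extract step: parse the XML document and return its order elements."""
    global parse_count
    parse_count += 1
    return ET.fromstring(xml_text).findall("order")


def load_customers(xml_text, warehouse):
    """Transform and load customer names; performs its own extraction."""
    for order in extract_orders(xml_text):
        warehouse["customers"].add(order.findtext("customer"))


def load_totals(xml_text, warehouse):
    """Transform and load order totals; re-extracts the same document."""
    for order in extract_orders(xml_text):
        warehouse["totals"][order.get("id")] = float(order.findtext("total"))


# A dictionary stands in for the target data warehouse (an assumption).
warehouse = {"customers": set(), "totals": {}}
load_customers(SOURCE_XML, warehouse)
load_totals(SOURCE_XML, warehouse)

# The same document was extracted once per load step.
print(parse_count)
```

In this sketch the extraction work grows with the number of load steps even though the source document never changes, which is the kind of redundant computation described above.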