1. Field
The disclosed embodiments generally relate to data processing systems, and in particular to data processing in data repository systems or data warehouse applications that receive and process data sources that are periodically updated and reported on.
2. Brief Description of Related Developments
Many reports and other data processing outputs are the result of a series of programmatic transformations acting on data that comes from many sources. The sources of data can be updated or can change over time. The state of the data at a particular point in time is referred to as its “currency”, and the data is “current” if it is the most recent state of that data. In order to document the content of the data on which the outputs are based, it is necessary to document both the processing steps and the currency of the sources of the data that were used to produce the output. In addition, in many cases it is necessary to ensure that the output reflects the most current data available from each of the ultimate sources of data, propagated through the transformation processes from the beginning. In other cases, it can be advantageous to be able to access the data in the currency that existed at a previous point in time in order to produce additional outputs that reflect that previous state of the data. Currently, companies manually determine and document the ultimate sources of the data and rerun all of the intermediate processing steps to ensure that the output reflects the currency of those sources. Generally, the only available alternative to re-executing all of the processing steps is to manually determine the state of the data in each of the processing steps that precede the output and selectively run or re-run only those steps whose input data has become more current since the last time they ran.
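To make the selective re-run decision concrete, the following is a minimal sketch of the kind of bookkeeping that is otherwise done by hand. It assumes, for illustration only, that each source exposes its currency as an integer version number and that each processing step records the currency of each of its inputs as of its last run. The names `Step` and `plan_reruns` and the version-number representation are hypothetical; this is not the disclosed embodiment, only an illustration of the staleness check described above.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass
class Step:
    # Hypothetical model of one processing step.
    inputs: list[str]  # names of sources or upstream steps
    last_seen: dict[str, int] = field(default_factory=dict)  # input currency at last run
    output_version: int = 0  # currency of this step's own output


def plan_reruns(sources: dict[str, int], steps: dict[str, Step]) -> list[str]:
    """Return the steps that must be (re)run, in dependency order."""
    # Predecessor graph restricted to steps; sources are leaves with known currency.
    graph = {name: [i for i in s.inputs if i in steps] for name, s in steps.items()}
    currency = dict(sources)  # node name -> currency it presents now
    to_run = []
    for name in TopologicalSorter(graph).static_order():
        step = steps[name]
        seen_now = {i: currency[i] for i in step.inputs}
        if seen_now != step.last_seen:
            # An input is more current than when this step last ran (or it never ran).
            to_run.append(name)
            currency[name] = step.output_version + 1  # re-running yields a newer output
        else:
            currency[name] = step.output_version  # existing output can be reused
    return to_run


if __name__ == "__main__":
    sources = {"orders": 2, "customers": 1}  # "orders" was updated since the last run
    steps = {
        "clean_orders": Step(["orders"], {"orders": 1}, 1),
        "clean_customers": Step(["customers"], {"customers": 1}, 1),
        "join": Step(["clean_orders", "clean_customers"],
                     {"clean_orders": 1, "clean_customers": 1}, 1),
        "report": Step(["join"], {"join": 1}, 1),
    }
    print(plan_reruns(sources, steps))
```

In this example only the updated `orders` source has changed, so `clean_customers` is skipped while `clean_orders` and everything downstream of it is re-run. The same recorded input currencies also document which state of each source a given output was produced from.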
These manual processes are error-prone and time-consuming; they can consume substantial computer resources and cause unnecessary delays in the availability of the end reports. It would be advantageous to allow customers to automatically document the exact data currency used to produce reports and other outputs without needing to maintain time-consuming and error-prone manual records. Further, it would be advantageous to use this currency information to reduce the need to re-execute the data processing steps. It would also be advantageous to use this information to provide existing reports and outputs when their source data has not changed, instead of re-executing the reporting program each time the report is requested. These features allow for much more productive and efficient use of data repositories at lower cost than traditional warehousing solutions. Improved efficiency, reduced program execution, and reduced manual recordkeeping allow for more timely information delivery and decision support.