Many businesses and other organizations employ some form of large-scale data storage as part of their information infrastructure. Many organizations store hundreds of terabytes, and even multiple petabytes (each of which is 1,000 terabytes), of data to facilitate routine business and record keeping. Data is added to these archives periodically. Every week, month, or year, or at whatever interval the administrators (also referred to herein as “users”) of the application deem appropriate, data is removed from a production system and put into the archive. However, this new data may not be in the same format as the data already in the archive. In this context, a production system is any computer system, or program running on a computer system, operable to produce data tables, transmit said data tables to an archive, and receive data back from said archive, in one format or another.
As time elapses, the nature or requirements of the applications whose data is being archived change. For example, new data may be collected and stored in new columns in an existing table, or the accuracy with which data is stored may change (in other words, the “type” of the column changes). Alternatively, data that is no longer needed is no longer collected, and the corresponding column is dropped from the table. When any of these changes occurs (or other structural changes are made to the database), there remains the question of what happens to the data that is already archived. Some alternatives are listed below:
1. The archived data can all be recovered, have the same changes applied to it, and be resaved with those changes in place. However, this is a lengthy and expensive process. In addition, if the original data was archived for compliance purposes, this form of recovery raises potential issues, since some part of the original data may no longer be stored. For example, casting a database field to a type with less precision (for instance, reducing the number of decimal places of a fixed-point type) will cause irreversible data loss in the older archived data.
2. Take no action and leave the archived data unchanged. However, this will present problems when the data in the archive is queried. If the data is stored with multiple different schemas, it will be necessary to use a different query for each schema and also to provide some additional procedure by which the results from these different queries are combined. Additionally, the results of the different queries may be formatted differently and be at different precisions.
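By way of illustration only, the difficulties with both alternatives can be sketched in Python. The table names `archive_v1` and `archive_v2`, the column names, the sample values, and the helper functions `reduce_precision`, `build_archive`, and `query_all` are hypothetical and are not taken from any particular archive system. The sketch assumes an in-memory SQLite database in which amounts are stored as decimal text.

```python
import sqlite3
from decimal import Decimal, ROUND_HALF_UP


def reduce_precision(value: Decimal, places: int) -> Decimal:
    """Alternative 1: re-save a fixed-point value with fewer decimal places."""
    quantum = Decimal(10) ** -places  # places=2 gives Decimal('0.01')
    return value.quantize(quantum, rounding=ROUND_HALF_UP)


def build_archive() -> sqlite3.Connection:
    """Create two hypothetical archive tables saved under different schemas."""
    conn = sqlite3.connect(":memory:")
    # Older schema: four decimal places, no region column.
    conn.execute("CREATE TABLE archive_v1 (id INTEGER, amount TEXT)")
    # Newer schema: two decimal places, plus a region column added later.
    conn.execute("CREATE TABLE archive_v2 (id INTEGER, amount TEXT, region TEXT)")
    conn.execute("INSERT INTO archive_v1 VALUES (1, '19.9875')")
    conn.execute("INSERT INTO archive_v2 VALUES (2, '5.25', 'EU')")
    return conn


def query_all(conn: sqlite3.Connection) -> list:
    """Alternative 2: one query per schema, then a hand-written merge step."""
    rows = []
    for id_, amount in conn.execute("SELECT id, amount FROM archive_v1"):
        # The region was never collected under the old schema, so pad with None.
        rows.append((id_, Decimal(amount), None))
    for id_, amount, region in conn.execute(
        "SELECT id, amount, region FROM archive_v2"
    ):
        rows.append((id_, Decimal(amount), region))
    return rows


# Alternative 1: the trailing digits cannot be recovered after re-saving.
original = Decimal("19.9875")
resaved = reduce_precision(original, 2)  # Decimal('19.99')

# Alternative 2: the combined result mixes two precisions and a padded column.
rows = query_all(build_archive())
```

In this sketch, `resaved` no longer equals `original`, which corresponds to the data loss described for the first alternative. The combined `rows` carry amounts at two different precisions and require a per-schema query plus a merge step, which corresponds to the burden described for the second alternative.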
No currently known technology provides a mechanism whereby a change to the database schema does not require modification to older data held in the archive, and at the same time enables queries written against the latest (in other words, current) schema to run against any and/or all data in the archive.