Operational data may refer to data collected by organizations in the normal course of business. For example, in the case of an organization collecting commercial data, the operational data may include the identity of the purchaser; an invoice number; a credit card number, if used; and the purchaser's contact information. Operational data may be stored in a database, referred to herein as the “operational database,” managed by a database management system, such as IBM™ Information Management System (IMS) or IBM™ DataBase 2 (DB2). An operational database may refer to a database that contains up-to-date, modifiable data. A database management system may refer to software that controls the organization (structure of the data), storage, retrieval, security and integrity of the data in the database.
The data in the operational database is generally the organization's official record of the business event or object described by the data. The structure and meaning of the data may be defined by what is referred to as “meta-data.” Meta-data may be stored separately from the data and, in some cases, separately from the database management system. Meta-data for an operational database may or may not exist, may or may not be complete, and may or may not be accurate.
The user may have an application program that is needed to retrieve and provide meaning to the data stored in the operational database. In order for the application program to retrieve and provide meaning to the data stored in the operational database, the application program requires that the data conform to the structure definition (the organization of the data as determined by the database management system). The application program depends on the specific version of the operating system and the database management system in order to operate properly.
Data retention refers to keeping data for an extended period of time, which may exceed the time the data is needed in the operational database for business purposes. Data archiving, on the other hand, refers to moving data from the operational database to a different data store or “archive storage” for better management over longer retention periods.
In recent years, there have been various legislative actions of the United States government, United States' state governments, and several other governments across the world, that require organizations holding operational data to retain that data for many years (e.g., seventy years) as well as to be able to interpret the data correctly and testify as to its authenticity. Severe penalties may accrue to an organization if it cannot produce the data when needed for any legitimate reason during the retention period or if it cannot defend the data as being authentic.
In the past, organizations generally did not archive data stored in an operational database. Instead, organizations would maintain the data in the operational database until the retention period expired. That is, once the data no longer needed to be maintained, the data would be deleted from the operational database. However, now that retention requirements are longer, the amount of information that needs to be maintained is greater, and preserving the original content and meaning of the data is more important, data cannot practically be maintained in an operational database until the retention period ends.
As a result, software products have been developed to archive the data from the operational database into an archive storage. Typically, these software products store data in application format (the same format used by the application program discussed above) without sufficient additional information to satisfy requests to retrieve data directly from the archive storage. For instance, in order to properly query the requested data, the application environment used to generate the archived data may be required in order to retrieve and provide meaning to the archived data. This also requires copying the data back to the application storage in order to use it. If the application environment is no longer available or no longer recognizes the old form of the data, the data in archive storage becomes useless. Therefore, there is a need in the art to be able to store data in the archive in a format that can be read and interpreted correctly without resorting to the original application environment. The application environment includes the application programs, database management system, operating system, and hardware.
Further, these archiving software products typically delete from the active operational database the data that is moved to the archive storage. Often these operational databases are part of an application that must run 24 hours a day, 7 days a week and will have periods of very high activity, as measured by several thousand transactions per second. When the archiving software deletes the moved data from the active operational database during such periods, the application may not be able to handle or service the transactions in a satisfactory manner. Therefore, there is a need in the art for archiving software that can perform its tasks without stopping the online activity or slowing it down to the point that it cannot service transactions within the required timeframe. That is, there is a need in the art for achieving minimum disruption of the operating environment when moving data from the operational database to the archive storage.
Another problem with typical software products that have been developed to archive data from operational databases is that they do not properly handle data structure changes that frequently occur in the operational environment. The data stored in the operational database has meta-data associated with it. Meta-data may refer to data that describes the content of the data and the relationships among the data. For example, suppose that a field in a business record stores the zip code. Meta-data associated with that field may describe the length of the field storing the zip code. Operational databases can only support a single version of the data structure at any specific time. When there is a change in the meta-data (referred to herein as a “meta-data break”), such as changing the length of the field, these archive software products would simply overwrite the stored data associated with the previous version of the meta-data with the data associated with the latest version of the meta-data. This may be referred to as “rolling the data forward.” For example, after 1985, the zip code was extended to contain an additional four digits. These software products may then append four zeroes to the zip codes stored prior to 1985 in order for the data to be commensurate with the latest version of the meta-data. Hence, by rolling the data forward, the data associated with the previous version of the meta-data is no longer maintained in its original form. As a result, if a user desired to retrieve the archived data over a period of time which included a change in the meta-data, then the user would not be able to retrieve the data in its original form prior to the change to the meta-data. Anytime archived data is modified, such as when meta-data changes, the archived data becomes unreliable and there is a chance that information will be lost.
Consequently, these software products cannot guarantee authenticity of the stored data as the data is not maintained in its original form. Authenticity of data stored in archive storage can further be eroded if the storage method permits updates by several people and does not provide sufficient recording of update activity. The archive system needs to protect against both authorized and unauthorized changes to data to ensure that years later the data will be identical to what it was at the time of archive. Therefore, there is a need in the art to be able to preserve the original form of data when a meta-data break occurs, and to prevent loss of data authenticity due to unauthorized changes.
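The zip code example above may be illustrated with a minimal Python sketch. The names FieldMeta, ZIP_V1, ZIP_V2 and roll_forward are hypothetical and are introduced only for illustration; they are not taken from any particular archiving product. The sketch shows how rolling the data forward may discard the original form of a value, and how keeping the value together with the version of the meta-data under which it was written may preserve it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldMeta:
    """Hypothetical meta-data for one field: its name and its fixed length."""
    name: str
    length: int


# Two versions of the meta-data for the same field, separated by a meta-data break.
ZIP_V1 = FieldMeta(name="zip_code", length=5)
ZIP_V2 = FieldMeta(name="zip_code", length=9)


def roll_forward(value: str, new_meta: FieldMeta) -> str:
    """Pad an old value with zeroes so that it fits the newer field length."""
    return value.ljust(new_meta.length, "0")


original = "75201"  # stored under ZIP_V1

# Rolling forward: the stored value is overwritten with a padded value.
rolled = roll_forward(original, ZIP_V2)

# The padded value cannot be told apart from a genuine nine-digit value
# that happens to end in four zeroes, so the original form is lost.
genuine_nine_digit = "752010000"
ambiguous = rolled == genuine_nine_digit

# Assumed alternative: keep the value unchanged, together with the
# meta-data version that was in effect when it was archived.
preserved = {"meta": ZIP_V1, "value": original}
```

In this sketch the preserved record can still be interpreted years later, because its length and meaning are described by the meta-data version stored with it instead of by the latest version.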
Further, there is a need in the art to be able to access, via a query (e.g., SQL query), the archived data in its original form that has been archived over a period of time where data structure changes have occurred as discussed above. That is, there is a need in the art for providing standalone query access against archived data containing multiple variations of the data stored over different time periods.
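The kind of query access described above may be sketched as follows, using Python's built-in sqlite3 module as a stand-in for the archive storage. The table names customer_v1 and customer_v2, the column names, and the sample rows are assumptions made for illustration only. Each table holds data archived under one version of the meta-data, and a single SQL query returns every value in its original form, tagged with the version under which it was stored.

```python
import sqlite3

# In-memory database used as a stand-in for the archive storage (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- One table per meta-data version; values are stored exactly as archived.
    CREATE TABLE customer_v1 (customer_id INTEGER, zip_code TEXT, archived_on TEXT);
    CREATE TABLE customer_v2 (customer_id INTEGER, zip_code TEXT, archived_on TEXT);
    INSERT INTO customer_v1 VALUES (1, '75201', '1984-06-30');
    INSERT INTO customer_v2 VALUES (2, '752011234', '1986-06-30');
    """
)

# A single query spanning both versions; nothing is padded or rewritten.
rows = conn.execute(
    """
    SELECT customer_id, zip_code, 1 AS meta_version, archived_on FROM customer_v1
    UNION ALL
    SELECT customer_id, zip_code, 2 AS meta_version, archived_on FROM customer_v2
    ORDER BY archived_on
    """
).fetchall()

for row in rows:
    print(row)
```

The query runs without the original application environment, and the five-digit value archived before the meta-data break is returned unchanged alongside the nine-digit value archived after it.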
As a result of the above, there is a need in the art for guaranteeing the authenticity of the data stored in the archive storage.