The present invention relates to data archiving, and more specifically, to archiving of data stored in databases. Data archiving refers to the process of moving data that is no longer actively used to a separate data storage device for long-term retention. Data archives typically consist of older data that is still important and necessary for future reference, as well as data that must be retained for regulatory compliance, for audit trail purposes, or as a historical resource from which business insights can be derived. In general, data archives are indexed and have search capabilities so that files and parts of files can be easily located and retrieved.
Typically, database archiving works in three phases. First, the records in a database that are to be archived are fetched from the database and stored in a data archive. The fetching can typically be done using a SQL (Structured Query Language) query, such as a “SELECT” query. The retrieved records are archived in a suitable format onto some type of long-term storage media, such as a disk, tape, etc. Second, the archived records are verified to ensure their correctness. Third, once the verification is completed, the records are permanently deleted from the database.
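By way of illustration only, the three phases described above can be sketched as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: the `orders` table, its columns, the JSON Lines archive file, and the function names `archive_old_orders` and `run_archive_demo` are all hypothetical, and an in-memory SQLite database stands in for the production database.

```python
import json
import os
import sqlite3
import tempfile


def archive_old_orders(conn, archive_path, cutoff):
    """Fetch, store, verify, and then delete rows created before `cutoff`."""
    # Phase 1: fetch the records with a SELECT query and write them to
    # long-term storage (assumed here to be a JSON Lines file).
    rows = conn.execute(
        "SELECT id, customer, created FROM orders "
        "WHERE created < ? ORDER BY id",
        (cutoff,),
    ).fetchall()
    with open(archive_path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(list(row)) + "\n")

    # Phase 2: verify the archive by reading it back and comparing it
    # with the rows that were fetched.
    with open(archive_path, encoding="utf-8") as f:
        archived = [tuple(json.loads(line)) for line in f]
    if archived != rows:
        raise RuntimeError("archive verification failed; nothing deleted")

    # Phase 3: permanently delete the verified records from the database.
    with conn:
        conn.executemany(
            "DELETE FROM orders WHERE id = ?", [(row[0],) for row in rows]
        )
    return len(rows)


def run_archive_demo():
    """Build a small hypothetical table and archive its pre-2020 rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, created TEXT)"
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [
            (1, "alice", "2019-01-05"),
            (2, "bob", "2019-06-30"),
            (3, "carol", "2024-02-11"),
        ],
    )
    conn.commit()
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "orders.jsonl")
        archived_count = archive_old_orders(conn, path, "2020-01-01")
    remaining = [r[0] for r in conn.execute("SELECT id FROM orders ORDER BY id")]
    return archived_count, remaining
```

In this sketch, deletion occurs only after the read-back check succeeds, so a failed verification leaves the database unchanged.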
In conventional database archiving, two main approaches are used to delete archived records from the databases. The first approach is based on a complete data comparison model. That is, a one-to-one comparison between the archived data and the active (production) data in the database is made before the records are deleted from the database. The second approach applies when the archiving strategy is based on entire partitions of data: the database records are archived partition by partition, and each archived partition is then dropped in its entirety.
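For illustration only, the two conventional deletion approaches described above can be sketched as follows. The table and column names and the function names are hypothetical. Because SQLite has no native table partitioning, the sketch assumes one table per year (for example, `orders_2019`) as a stand-in for a partition; a database with real partitioning would drop the partition itself.

```python
import sqlite3


def delete_after_full_comparison(conn, archived_rows):
    """First approach: delete a row only if every attribute value of the
    live row matches the archived copy."""
    deleted = 0
    with conn:
        for row in archived_rows:
            live = conn.execute(
                "SELECT id, customer, amount FROM orders WHERE id = ?",
                (row[0],),
            ).fetchone()
            if live == tuple(row):
                conn.execute("DELETE FROM orders WHERE id = ?", (row[0],))
                deleted += 1
    return deleted


def drop_archived_partition(conn, year):
    """Second approach: drop an entire archived partition at once
    (assumption: a per-year table stands in for the partition)."""
    with conn:
        conn.execute(f"DROP TABLE orders_{int(year)}")


def run_comparison_demo():
    """Exercise both approaches on a small hypothetical data set."""
    conn = sqlite3.connect(":memory:")
    schema = "(id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    conn.execute(f"CREATE TABLE orders {schema}")
    conn.execute(f"CREATE TABLE orders_2019 {schema}")
    archived = [(1, "alice", 10.0), (2, "bob", 20.0)]
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", archived)
    # Row 2 is changed after the archive copy was taken, so it no longer
    # matches the archived copy and the comparison keeps it.
    conn.execute("UPDATE orders SET amount = 25.0 WHERE id = 2")
    deleted = delete_after_full_comparison(conn, archived)
    remaining = conn.execute("SELECT id, amount FROM orders").fetchall()
    drop_archived_partition(conn, 2019)
    tables = [
        r[0]
        for r in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
    ]
    return deleted, remaining, tables
```

The comparison approach issues one lookup per archived record and compares all attribute values, whereas the partition approach removes all records of the partition in a single statement without inspecting them.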
A drawback of these approaches is that the record comparison (across all attribute values) is expensive and computationally intensive, in part because the archived records must be fetched from the archive (i.e., the storage media, disk, etc.) for the comparison. In addition, no metadata is maintained about updates made to records after archival. Thus, it is possible that a record that has been archived and has qualified for deletion is subsequently updated. Because no database-related metadata is maintained, the updated record in the database then becomes inconsistent with the archived copy of the same record. Thus, there is a need for improved techniques for archiving data.