1. Technical Field
The present invention relates in general to data processing and in particular to data compression. Still more particularly, the present invention relates to systems and methods for generating compressed affinity records from data records.
2. Description of the Related Art
Currently, data processing systems are widely utilized to process and store mission-critical data. A risk always exists, however, that such data may become inaccessible. For example, if data are stored exclusively in a disk drive of a data processing system, and that disk drive experiences a hardware failure, the data may be lost or, at least, will not be accessible until the disk drive is repaired. In order to provide greater assurance that access to data will not be unduly interrupted, a technique known as mirroring is often utilized.
In a typical mirrored environment, each file in a primary data processing system that is to be mirrored is copied to a secondary data processing system, which is typically situated at a remote location relative to the primary (or on-site) data processing system. Then, as data in the duplicated files are modified in the primary data processing system, journal entries describing those modifications are transmitted to the secondary data processing system. The secondary data processing system utilizes those journal entries to keep the remote data consistent with the on-site data. The journal entries are typically composed of header and body portions, with the header containing transaction-specific information (such as the name of the file containing the data that were updated and the time at which the update occurred) and the body containing a complete image of the modified data record as stored in the system after the modification (i.e., a complete after-image of the record).
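The journal-entry layout described above can be pictured with the following Python sketch. It is illustrative only: the class names, the `record_number` field used to locate the updated record, and the `apply_entry` helper are hypothetical; the description itself names only the file name, the update time, and the after-image.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict


@dataclass
class JournalHeader:
    file_name: str       # name of the file containing the updated data
    timestamp: datetime  # time at which the update occurred
    record_number: int   # hypothetical: which record in the file was updated


@dataclass
class JournalEntry:
    header: JournalHeader
    body: bytes          # complete after-image of the modified record


def apply_entry(files: Dict[str, Dict[int, bytes]], entry: JournalEntry) -> None:
    """Bring the secondary copy up to date by storing the full after-image."""
    records = files.setdefault(entry.header.file_name, {})
    records[entry.header.record_number] = entry.body
```

Because the body always carries the complete after-image, each entry is as large as the record itself even when only a few bytes changed, which is the bandwidth cost noted in the related art.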
Conventional mirroring techniques thus provide a dependable secondary data repository. One of the main disadvantages associated with such mirroring techniques, however, is that they require substantial bandwidth when they are utilized to mirror files that contain large records or files that are frequently updated. The present invention recognizes that the bandwidth requirements would be substantially reduced if the journal records were compressed before they were transmitted. The present invention also recognizes that contexts other than mirroring involve data records that can be compressed based on record sequence to reduce storage and bandwidth requirements.
To address these shortcomings, the present invention introduces a method, system, and program product that obtains first and second data records, generates zero or more length values representing portions of the second data record that agree with corresponding portions of the first data record, generates zero or more delta values representing portions of the second data record that differ from corresponding portions of the first data record, and combines those length and delta values to form a compressed affinity record.
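A minimal Python sketch of the length/delta encoding summarized above follows. It assumes that both records have the same length and represents the compressed affinity record as a hypothetical list of `("len", n)` and `("delta", bytes)` tuples, where a delta value is simply the run of differing bytes taken from the second record. The description does not fix a concrete record format at this point, so the function name and encoding are illustrative assumptions.

```python
from typing import List, Tuple, Union

# Hypothetical encoding: ("len", n) for n agreeing bytes,
# ("delta", b"...") for a run of bytes that differ in the second record.
Segment = Tuple[str, Union[int, bytes]]


def make_affinity_record(first: bytes, second: bytes) -> List[Segment]:
    if len(first) != len(second):
        raise ValueError("this sketch assumes equal-length records")
    segments: List[Segment] = []
    i, n = 0, len(second)
    while i < n:
        j = i
        if first[i] == second[i]:
            # Extend the run of positions where the two records agree.
            while j < n and first[j] == second[j]:
                j += 1
            segments.append(("len", j - i))
        else:
            # Extend the run of positions where the two records differ.
            while j < n and first[j] != second[j]:
                j += 1
            segments.append(("delta", second[i:j]))
        i = j
    return segments
```

For example, `make_affinity_record(b"ABCDEF", b"ABXYEF")` yields one length value, one delta value, and another length value: `[("len", 2), ("delta", b"XY"), ("len", 2)]`. Identical records collapse to a single length value, and empty records produce zero values.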
In an illustrative embodiment, the length and delta values are generated by exclusive ORing first and second journal records, and the compressed affinity record can be utilized to maintain mirrored data files.
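The exclusive-OR variant can be sketched in Python as follows, again assuming equal-length records and a hypothetical list of `("len", n)` / `("delta", bytes)` tuples. Runs of zero bytes in the XOR of the two records become length values, and runs of nonzero bytes become delta values. Because XOR is its own inverse, a secondary system that already holds the first record can rebuild the second by XORing each delta value back into the corresponding bytes. The function names are illustrative and are not taken from the description.

```python
from typing import List, Tuple, Union

Segment = Tuple[str, Union[int, bytes]]


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def compress(first: bytes, second: bytes) -> List[Segment]:
    if len(first) != len(second):
        raise ValueError("this sketch assumes equal-length records")
    x = xor_bytes(first, second)  # zero wherever the records agree
    segments: List[Segment] = []
    i, n = 0, len(x)
    while i < n:
        j = i
        if x[i] == 0:
            while j < n and x[j] == 0:
                j += 1
            segments.append(("len", j - i))      # agreeing run -> length value
        else:
            while j < n and x[j] != 0:
                j += 1
            segments.append(("delta", x[i:j]))  # differing run -> XOR delta
        i = j
    return segments


def decompress(first: bytes, segments: List[Segment]) -> bytes:
    """Rebuild the second record from the first record and the affinity record."""
    out = bytearray()
    pos = 0
    for kind, value in segments:
        if kind == "len":
            out += first[pos:pos + value]  # copy unchanged bytes
            pos += value
        else:
            out += xor_bytes(first[pos:pos + len(value)], value)  # undo the XOR
            pos += len(value)
    return bytes(out)
```

With `first = b"ABCDEF"` and `second = b"ABXYEF"`, `compress` returns `[("len", 2), ("delta", b"\x1b\x1d"), ("len", 2)]`, and `decompress(first, ...)` on that result returns `b"ABXYEF"`.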
All objects, features, and advantages of the present invention will become apparent in the following detailed written description.