1. Field of the Invention
This invention relates in general to controlling data structure integrity and, more particularly, to calculating checksums for data structures.
2. Background of the Invention
Whenever information is stored or transmitted electronically, it is of great importance to control the integrity of the data structure that contains the information. During both storage and transmission, the data structure is exposed to errors. The physical devices that store or transmit the information are subject to physical and mechanical failure, which in turn can introduce errors into the data structure or cause a total loss of the information it contains.
A particularly important area in which data integrity control requires powerful means and methods as a critical safeguard against errors is the massive data backups used by financial enterprises such as credit card companies and banks. The vast number of transactions in a day necessitates frequent updating of large data structures and correspondingly frequent data integrity control.
It is possible to control the integrity of a data structure by always keeping a duplicate of its information stored in another device. The integrity of the data structure can then be controlled at any time by comparing it to the duplicate. An obvious and significant disadvantage of this method is the additional storage capacity, which must be instantly accessible whenever the data structure changes, together with the additional time and computing capacity needed.
Checksums have become a common tool for controlling the integrity of data structures and are used in a variety of areas within information technology. Typically, checksums are used in the following way. An application which handles information in the form of data structures calculates a checksum for a particular data structure. The checksum is calculated using the data stored in the data structure, and thus becomes a mathematical "fingerprint" of the particular data in the data structure. Whenever the application needs to control and confirm the integrity of the data structure, the application calculates the checksum again, using the same method of calculation as was used previously, and then compares the newly calculated checksum with the original checksum which was stored in the data structure. If the integrity of the data structure is intact, the checksums will be identical. Any discrepancy between the two checksums is a clear indication that a data integrity problem has occurred in the data structure. It is noted that checksum calculation is typically "transparent" to users of the data structures: the calculations are performed without particular inputs or outputs via the user interfaces, so the user is normally not aware of them unless a data integrity problem is detected, at which point the user should be notified or alerted.
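The calculate, store, recalculate and compare cycle described above can be illustrated with a minimal sketch. The text does not name a particular checksum method, so CRC-32 from Python's standard zlib module is used here purely as an assumed stand-in; the record layout and the names store and is_intact are likewise hypothetical.

```python
import zlib

def compute_checksum(payload: bytes) -> int:
    # Assumed checksum method: CRC-32. Any deterministic method would do,
    # as long as the same one is used for both calculations.
    return zlib.crc32(payload)

def store(payload: bytes) -> dict:
    # Keep the checksum together with the data, as the text describes.
    return {"payload": payload, "checksum": compute_checksum(payload)}

def is_intact(record: dict) -> bool:
    # Recalculate with the same method and compare to the stored value.
    return compute_checksum(record["payload"]) == record["checksum"]

record = store(b"account=42;balance=100")
ok_before = is_intact(record)

# Simulate an error introduced during storage or transmission:
# the data changes, but the stored checksum does not.
record["payload"] = b"account=42;balance=900"
ok_after = is_intact(record)
```

In this sketch the first check succeeds and the second fails, which is the point at which an application would notify or alert the user.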
The checksum for a data structure must always be updated when modifications of the data in the data structure have taken place. Previously, the updating of the checksum has been carried out in the following way. Before any changes are made in an existing data structure, the application will control and confirm the integrity of the data structure by calculating a checksum and comparing it to a stored checksum, as described above. If the comparison indicates that the integrity is intact, the changes in the data structure can be made. When the data structure has been modified as desired, the application must update the checksum to reflect the data structure including the changed data. The application calculates a new checksum for the modified data structure and stores that checksum in the data structure.
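The conventional update sequence just described (confirm integrity, modify, recalculate over the whole structure, store) might look roughly as follows. This is only a sketch under stated assumptions: CRC-32 via zlib stands in for the unspecified checksum method, and IntegrityError, new_record and modify are hypothetical names chosen for illustration.

```python
import zlib

class IntegrityError(Exception):
    """Raised when the stored and recalculated checksums differ."""

def new_record(payload: bytes) -> dict:
    data = bytearray(payload)
    return {"payload": data, "checksum": zlib.crc32(data)}

def modify(record: dict, offset: int, new_bytes: bytes) -> None:
    # Step 1: confirm integrity before any change is made.
    if zlib.crc32(record["payload"]) != record["checksum"]:
        raise IntegrityError("checksum mismatch; refusing to modify")
    # Step 2: apply the desired change in place.
    record["payload"][offset:offset + len(new_bytes)] = new_bytes
    # Step 3: recalculate the checksum over the ENTIRE modified structure
    # and store it, even though only a few bytes changed.
    record["checksum"] = zlib.crc32(record["payload"])

record = new_record(b"balance=100")
modify(record, 8, b"250")  # overwrite "100" with "250"
```

Note that step 3 processes every byte of the payload again, regardless of how small the change in step 2 was.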
If the modified data structure contains substantially as much information as the original data structure, the second calculation takes approximately as much time to perform as did the first calculation. This approximation is valid regardless of whether a small or large portion of the data were altered during the change and also regardless of whether the data structure includes only a few bytes or several gigabytes.
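To make the cost argument concrete, the following sketch counts how many bytes a full recalculation reads after a one-byte change and after a change to half of the structure. CRC-32 via zlib is again an assumed stand-in for the checksum method, and full_recalculation and the chunk size are hypothetical choices for illustration.

```python
import zlib

def full_recalculation(payload: bytearray, chunk_size: int = 4096):
    # Recompute the checksum from scratch, counting every byte read.
    checksum = 0
    bytes_read = 0
    for start in range(0, len(payload), chunk_size):
        chunk = payload[start:start + chunk_size]
        checksum = zlib.crc32(chunk, checksum)  # running CRC-32
        bytes_read += len(chunk)
    return checksum, bytes_read

data = bytearray(1_000_000)

# Small change: a single byte.
data[0] = 1
checksum_small, read_small = full_recalculation(data)

# Large change: half of the structure.
data[:500_000] = b"\x02" * 500_000
checksum_large, read_large = full_recalculation(data)
```

In both cases the recalculation reads the full length of the structure, so the work depends on the size of the data structure rather than on the size of the change.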
It can be seen that there is a need for a method and apparatus for updating checksums of data structures that does not recalculate checksum contributions from parts of the data structure that were not changed.