1. Field of the Invention
The present invention relates to data integrity check methods and to hash functions.
2. Background Art
Typical storage media, including, for example, storage tapes, do not have any built-in security features. Data may be written on the media, data may be erased from the media, and data may be overwritten with other data. A first step toward increasing data security for storage media is providing a data integrity check method. Data integrity is the ability to prove that written data is intact, that is, that it has not been tampered with or modified by anyone.
In the security field, data integrity is often achieved with the use of a hash function. A hash function is a transformation that maps an input to a fixed-size string. Hash functions have a number of general uses. A cryptographic hash function is used in the security field to achieve data integrity. A cryptographic hash function is a one-way function that digests input data and has very few collisions. A one-way function is a function that is very difficult to invert. That is, data can be processed through the one-way hash function to get a result, but it is very difficult to reverse the function and recover the data from the result. A cryptographic hash function digests input data in that the output is much smaller than the input data. For example, many pages of text may be digested by a cryptographic hash function to produce a 20-byte hash. In addition, a cryptographic hash function has very few collisions in that two different input texts have very little chance of producing the same hash.
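The fixed-size, digesting behavior described above can be seen with Python's standard `hashlib` module. This sketch assumes SHA-1 only because its 20-byte output matches the example in the text; SHA-1 is no longer considered collision-resistant, so a new design would more likely use SHA-256 (32-byte output). The helper name `digest` is hypothetical.

```python
import hashlib


def digest(data: bytes) -> bytes:
    """Return the 20-byte SHA-1 digest of the input (SHA-1 chosen as an assumption)."""
    return hashlib.sha1(data).digest()


short_text = b"a"
long_text = b"many pages of text " * 10_000  # roughly 190 KB of input

# The digest length is fixed no matter how long the input is.
print(len(digest(short_text)), len(digest(long_text)))  # 20 20

# Changing the input, even slightly, gives a different digest.
print(digest(b"page 1") != digest(b"page 2"))  # True
```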
The capabilities of the cryptographic hash function are commonly used to provide data integrity. An existing data integrity check method using a cryptographic hash function works as follows. First, a data block or sequence of data blocks is received. The data is hashed using a cryptographic hash function or hash algorithm. The data and the hash are both stored (the hash is small compared to the data because the cryptographic hash function digests the data). To conduct the data integrity check, the data and the hash are retrieved from the storage medium. The data is then hashed again using the same hash function, and the recalculated hash is compared with the stored hash that was retrieved from the storage medium. If the originally stored hash and the recalculated hash are the same, the data is considered authentic, that is, the data has not been modified. If the data had been replaced with some other data, then the hash of the other data, calculated when the data is retrieved, would not match the original hash that was calculated when the data was stored. This existing process is useful in many applications because it allows modified data to be detected by comparing two hashes. However, although this process has been used in many successful applications, it does have limitations. The existing process cannot authenticate the data if the hash has been modified. It also cannot authenticate the data if both the hash and the data have been modified and the new hash differs from the hash of the new data. Finally, the existing process cannot detect an error at all if both the data and the hash are replaced with new data and a hash of only the new data. In that case, when the data and the hash are retrieved, the hash computed from the retrieved data matches the retrieved hash, because the replacement data and the replacement hash are consistent with each other.
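The existing method and its main limitation can be sketched in a few lines. This is an illustration under stated assumptions, not a description of any particular product: SHA-256 stands in for the cryptographic hash, the storage medium is modeled as a `(data, hash)` tuple, and the names `h`, `write`, and `check` are hypothetical.

```python
import hashlib
from typing import Tuple


def h(data: bytes) -> bytes:
    """Cryptographic hash of the data (SHA-256 assumed)."""
    return hashlib.sha256(data).digest()


def write(data: bytes) -> Tuple[bytes, bytes]:
    """Store the data together with its (non-cumulative) hash."""
    return data, h(data)


def check(stored: Tuple[bytes, bytes]) -> bool:
    """Recompute the hash of the retrieved data and compare it with the stored hash."""
    data, stored_hash = stored
    return h(data) == stored_hash


medium = write(b"original record")
print(check(medium))  # True: data intact

# Data modified, stored hash left alone: detected.
tampered = (b"altered record", medium[1])
print(check(tampered))  # False

# Data and hash both replaced with new data and a hash of only the new data: NOT detected.
replaced = write(b"forged record")
print(check(replaced))  # True, although the original data is gone
```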
For the foregoing reasons, there is a need for a data integrity check method that can detect modifications to data even if the data and the associated hash are both replaced with new data and a hash of only the new data.
It is therefore an object of the present invention to provide a data integrity check method using a cumulative hash function that allows detection of data modification when a block of data and associated hash are both replaced.
In carrying out the above object, a method of writing a sequence of data blocks to a storage medium is provided. The method comprises receiving the sequence of data blocks, determining a sequence of hashes corresponding to the sequence of data blocks, and storing the sequence of data blocks and corresponding sequence of hashes on the storage medium. Each hash in the sequence of hashes corresponds to a data block in the sequence of data blocks. A particular hash corresponding to a particular data block is determined as a function of the particular data block and at least one previous hash corresponding to a previous data block in the sequence of data blocks.
In one embodiment, a particular hash corresponding to a particular data block is determined as a function of the particular data block and an immediately previous hash corresponding to an immediately previous data block in the sequence of data blocks. In a preferred embodiment, a particular hash corresponding to a particular data block is determined according to:
$H_1 = \mathrm{hash}(D_1)$; and
$H_i = \mathrm{hash}(H_{i-1}, D_i), \quad i = 2, 3, 4, \ldots$;
where:
$D_n$ is the $n$-th data block in the sequence of data blocks, $n = 1, 2, 3, \ldots$;
$H_n$ is the $n$-th hash in the sequence of hashes, $n = 1, 2, 3, \ldots$; and
$\mathrm{hash}(\cdot)$ is a hashing function.
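The preferred embodiment of the writing method can be sketched in Python as follows. The formula does not specify which hashing function is used or how its two arguments are combined, so this sketch assumes SHA-256 applied to the previous hash followed by the data block. The function names `cumulative_hashes` and `write_sequence` are hypothetical, and the storage medium is modeled as a list of `(block, hash)` pairs.

```python
import hashlib
from typing import List, Tuple


def cumulative_hashes(blocks: List[bytes]) -> List[bytes]:
    """Return [H1, H2, ...] with H1 = hash(D1) and Hi = hash(H(i-1), Di)."""
    hashes: List[bytes] = []
    for i, block in enumerate(blocks):
        if i == 0:
            h = hashlib.sha256(block).digest()  # H1 = hash(D1)
        else:
            # Hi = hash(H(i-1), Di): the previous hash is always 32 bytes,
            # so prefixing it to the block is an unambiguous encoding.
            h = hashlib.sha256(hashes[-1] + block).digest()
        hashes.append(h)
    return hashes


def write_sequence(blocks: List[bytes]) -> List[Tuple[bytes, bytes]]:
    """Pair each data block with its cumulative hash, as they would be stored."""
    return list(zip(blocks, cumulative_hashes(blocks)))


medium = write_sequence([b"D1", b"D2", b"D3"])
print(len(medium))  # 3
```

Because each hash folds in its predecessor, the last hash in the sequence depends on every block written before it.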
It is appreciated that the present invention provides a cumulative hash function in that a particular hash corresponding to a particular data block is determined as a function of the particular data block and at least one previous hash corresponding to a previous data block in the sequence of data blocks. The at least one previous hash may be the immediately previous hash, any other previous hash, or a number of different previous hashes. The formula given above is one specific example of how the hash may be determined.
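The variant in which a hash depends on a number of previous hashes can be sketched as a sliding window. The window of the k most recent hashes is only one hypothetical choice (the text also allows any other previous hash), SHA-256 with concatenation is again assumed, and `windowed_cumulative_hashes` is a hypothetical name. With k = 1 the sketch reduces to the preferred embodiment.

```python
import hashlib
from typing import List


def windowed_cumulative_hashes(blocks: List[bytes], k: int = 2) -> List[bytes]:
    """Hash each block together with up to k immediately preceding hashes.

    H1 = hash(D1) for every k; with k = 1 this is Hi = hash(H(i-1), Di).
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    hashes: List[bytes] = []
    for block in blocks:
        # Oldest-to-newest concatenation of the last k hashes (empty for D1).
        previous = b"".join(hashes[-k:])
        hashes.append(hashlib.sha256(previous + block).digest())
    return hashes


print(len(windowed_cumulative_hashes([b"a", b"b", b"c"], k=2)))  # 3
```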
Further, in carrying out the present invention, a method of reading a sequence of data blocks and a corresponding sequence of original hashes from a storage medium is provided. The method comprises receiving the sequence of data blocks and the corresponding sequence of original hashes. A sequence of recalculated hashes corresponding to the sequence of data blocks is determined. Each recalculated hash in the sequence of recalculated hashes corresponds to a data block in the sequence of data blocks. A particular recalculated hash corresponding to a particular data block is determined as a function of the particular data block and at least one previous recalculated hash corresponding to a previous data block in the sequence of data blocks. The method further comprises comparing the sequence of recalculated hashes and the sequence of original hashes to detect any errors in the sequence of data blocks.
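The reading method can be sketched the same way, again assuming SHA-256 and prefix concatenation for the hashing function. As the text requires, each recalculated hash is chained from the previous recalculated hash rather than from the stored one, so a corrupted stored hash is reported at its own position without disturbing later comparisons. The names `cumulative_hashes` and `verify_sequence` are hypothetical.

```python
import hashlib
from typing import List, Optional


def cumulative_hashes(blocks: List[bytes]) -> List[bytes]:
    """H1 = hash(D1); Hi = hash(H(i-1), Di), using SHA-256 (assumed)."""
    hashes: List[bytes] = []
    for block in blocks:
        previous = hashes[-1] if hashes else b""  # no previous hash for D1
        hashes.append(hashlib.sha256(previous + block).digest())
    return hashes


def verify_sequence(blocks: List[bytes], original_hashes: List[bytes]) -> Optional[int]:
    """Compare recalculated cumulative hashes with the original hashes.

    Returns the index of the first position where they disagree, or None
    if every recalculated hash matches its original hash.
    """
    if len(blocks) != len(original_hashes):
        raise ValueError("each data block needs exactly one original hash")
    recalculated = cumulative_hashes(blocks)  # chained from recalculated hashes
    for i, (new, old) in enumerate(zip(recalculated, original_hashes)):
        if new != old:
            return i
    return None


blocks = [b"D1", b"D2", b"D3"]
original = cumulative_hashes(blocks)
print(verify_sequence(blocks, original))  # None
print(verify_sequence([b"D1", b"XX", b"D3"], original))  # 1
```

Reporting the first mismatching index is a design choice of this sketch; an implementation could equally return every mismatching position.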
Further, in carrying out the present invention, a data storage medium is provided. The data storage medium has a sequence of data blocks and a corresponding sequence of hashes stored on the medium. Each hash in the sequence of hashes corresponds to a data block in the sequence of data blocks. A particular hash corresponding to a particular data block is determined as a function of the particular data block and at least one previous hash corresponding to a previous data block in the sequence of data blocks.
Still further, in carrying out the present invention, a medium having instructions stored thereon is provided. The instructions are executable by a processor to process a sequence of data blocks and determine a corresponding sequence of hashes. Each hash in the sequence of hashes corresponds to a data block in the sequence of data blocks. A particular hash corresponding to a particular data block is determined as a function of the particular data block and at least one previous hash corresponding to a previous data block in the sequence of data blocks.
It is appreciated that in the various ways for carrying out the invention, the hash function may be implemented in a number of different ways. Some exemplary hash function embodiments are described above.
The advantages associated with the embodiments of the present invention are numerous. For example, embodiments of the present invention use a cumulative hash function in that a particular hash corresponding to a particular data block is determined as a function of the particular data block and at least one previous hash corresponding to a previous data block in the sequence of data blocks. Because the hash is cumulative, it is possible to detect that both a block of data and its corresponding hash have been replaced on the storage medium, provided that the replacement hash and all following hashes were not determined using the specific cumulative hash function used when the data was originally written. Further, if an accidental error occurs, attempts can be made to recover the lost data, and the cumulative hash can be used to verify the recovered data.
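The detection property described above can be illustrated with three hypothetical tampering scenarios. The sketch assumes SHA-256 with prefix concatenation as the cumulative hash; `cumulative_hashes` and `first_mismatch` are hypothetical names, and the block contents are placeholders.

```python
import hashlib
from typing import List, Optional


def cumulative_hashes(blocks: List[bytes]) -> List[bytes]:
    """H1 = hash(D1); Hi = hash(H(i-1), Di), using SHA-256 (assumed)."""
    hashes: List[bytes] = []
    for block in blocks:
        previous = hashes[-1] if hashes else b""
        hashes.append(hashlib.sha256(previous + block).digest())
    return hashes


def first_mismatch(blocks: List[bytes], stored: List[bytes]) -> Optional[int]:
    """Index of the first stored hash that disagrees with the recalculation."""
    for i, (new, old) in enumerate(zip(cumulative_hashes(blocks), stored)):
        if new != old:
            return i
    return None


blocks = [b"D1", b"D2", b"D3", b"D4"]
hashes = cumulative_hashes(blocks)
forged_blocks = list(blocks)
forged_blocks[1] = b"forged"

# A: block 2 and its hash replaced with new data and a hash of only the new data.
forged_a = list(hashes)
forged_a[1] = hashlib.sha256(b"forged").digest()
print(first_mismatch(forged_blocks, forged_a))  # 1: detected at the replaced block

# B: hash 2 recomputed with the cumulative function, later hashes left unchanged.
forged_b = list(hashes)
forged_b[1] = hashlib.sha256(hashes[0] + b"forged").digest()
print(first_mismatch(forged_blocks, forged_b))  # 2: hash 3 no longer chains

# C: every hash from the forged block onward recomputed with the same function.
print(first_mismatch(forged_blocks, cumulative_hashes(forged_blocks)))  # None
```

Scenario C marks the boundary stated in the text: a replacement goes undetected only when the replacement hash and all following hashes are recomputed with the same cumulative hash function.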