1. Field of the Invention
This invention is related to the field of data authentication and, more particularly, to the efficient computation of data signatures.
2. Description of the Related Art
In recent years, computer applications have become increasingly data intensive. Consequently, the demands placed upon networks by the growing amounts of data being transferred have increased dramatically. In order to better manage the needs of these data-centric networks, a variety of forms of computer networks have been developed. One such form of computer network is the “Storage Area Network” (SAN). Generally speaking, Storage Area Networks are configured to connect multiple storage devices to one or more servers using a high-speed interconnect, such as Fibre Channel. Unlike in a Local Area Network (LAN), in a SAN the bulk of storage is moved off of the server and onto independent storage devices which are connected to the high-speed network. Servers access these storage devices through this high-speed network. One of the advantages of a SAN is the elimination of the bottleneck that may occur at a server which manages storage access for a number of clients. By allowing shared access to storage, a SAN may provide lower data access latencies and improved performance.
While reduced latency in accessing data is important, ensuring the integrity and security of data is important as well. A variety of mechanisms exist which are designed to improve confidence in the integrity of data. One such mechanism involves generating a Message Digest (MD), or signature, for data. For example, MD5 is an algorithm that takes as input a message (data) of arbitrary length and produces as output a 128-bit “fingerprint”, or signature, of the input. When the data is later accessed, the signature is recomputed and compared to the previously computed signature. If the two signatures do not match, it may be assumed that the data has been corrupted in some way.
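As a minimal sketch of the store-then-verify pattern described above, the following Python uses the standard library's `hashlib.md5`. The function names `md5_signature` and `is_intact` and the sample data are hypothetical, chosen only for illustration; this is not presented as any particular system's implementation.

```python
import hashlib


def md5_signature(data: bytes) -> bytes:
    """Return the 128-bit (16-byte) MD5 digest of `data`."""
    return hashlib.md5(data).digest()


def is_intact(data: bytes, stored_signature: bytes) -> bool:
    """Recompute the signature of `data` and compare it with the stored one."""
    return md5_signature(data) == stored_signature


original = b"quarterly report, rev 3"
signature = md5_signature(original)  # computed when the data is first stored

print(len(signature) * 8)                                # 128
print(is_intact(original, signature))                    # True
print(is_intact(b"quarterly report, rev 4", signature))  # False
```

A plain `==` comparison is sufficient here because the goal is detecting corruption rather than resisting timing attacks; `hmac.compare_digest` would be the usual choice if the comparison were security-sensitive.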
One of the desirable features of algorithms such as MD5 is that producing two different messages with the same signature is, at present, computationally infeasible. For example, finding two messages with the same signature by means of the well-known “birthday attack”, which is based on the birthday problem of probability theory, would require trying on the order of 2^64 different messages. Assuming a given computer could attempt 1,000,000,000 different messages per second, finding such a pair may take nearly 600 years. Similarly, finding any message having a given signature would require on the order of 2^128 operations. Consequently, the MD5 algorithm may be used to provide a relatively high degree of confidence in the authenticity of a given message.
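The time estimates above can be checked with a short calculation. This sketch assumes a Julian year of 365.25 days and the rate of 1,000,000,000 messages per second stated in the text; the helper `years_to_search` is hypothetical.

```python
# Julian year (365.25 days) expressed in seconds (an assumption for this estimate).
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60


def years_to_search(attempts: int, attempts_per_second: float) -> float:
    """Years needed to perform `attempts` trials at a fixed rate."""
    return attempts / attempts_per_second / SECONDS_PER_YEAR


RATE = 1_000_000_000  # messages tried per second, as assumed in the text

birthday_years = years_to_search(2**64, RATE)   # collision search via birthday attack
preimage_years = years_to_search(2**128, RATE)  # search for a message matching a given signature

print(f"{birthday_years:.0f}")  # 585
print(f"{preimage_years:.1e}")  # 1.1e+22
```

At this rate the birthday-attack estimate comes to roughly 585 years, consistent with “nearly 600 years”, while a search for a message matching a given signature would take on the order of 10^22 years.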
In the context of computer file systems, signatures such as that described above may be used to ensure that data which is read or otherwise received has not been corrupted. For example, data files stored within a file system may have an associated signature which is generated at the time the file is stored. Subsequently, when the data file is read from storage, the signature may be recomputed and compared to the signature which was originally stored with the file. If the original and newly computed signatures are not identical, it may be assumed that the data has been corrupted. In addition, single-instance storage systems may use signatures to identify identical files, so that unnecessary duplication of files may be avoided.
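A minimal, hypothetical sketch of how a single-instance store might use signatures both to detect duplicate files and to verify data on read. The class `SingleInstanceStore` and its methods are illustrative assumptions, held entirely in memory, and MD5 is used only because it is the algorithm discussed here.

```python
import hashlib


class SingleInstanceStore:
    """Hypothetical in-memory store that keeps one copy of each distinct content."""

    def __init__(self) -> None:
        self._blobs = {}  # signature (bytes) -> stored content (bytes)
        self._names = {}  # file name (str) -> signature (bytes)

    def put(self, name: str, data: bytes) -> bytes:
        """Store `data` under `name`, reusing an existing copy if the signature matches."""
        sig = hashlib.md5(data).digest()
        existing = self._blobs.get(sig)
        if existing is None:
            self._blobs[sig] = data
        elif existing != data:
            # Same signature but different bytes: refuse rather than silently merge.
            raise ValueError(f"signature collision for {name!r}")
        self._names[name] = sig
        return sig

    def get(self, name: str) -> bytes:
        """Return the content for `name` after re-checking it against its signature."""
        sig = self._names[name]
        data = self._blobs[sig]
        if hashlib.md5(data).digest() != sig:
            raise OSError(f"stored data for {name!r} appears corrupted")
        return data

    def unique_copies(self) -> int:
        """Number of distinct contents actually stored."""
        return len(self._blobs)


store = SingleInstanceStore()
store.put("a.txt", b"same contents")
store.put("b.txt", b"same contents")   # duplicate: no second copy is kept
store.put("c.txt", b"other contents")
print(store.unique_copies())  # 2
```

As a design choice, the sketch compares the bytes whenever signatures match. This costs one extra comparison per duplicate but prevents two different files from ever being merged; a store that trusts signatures alone would skip that check.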
While using sophisticated algorithms such as MD5 may be desirable in file systems, computing MD5 signatures requires relatively large amounts of processing and I/O resources. Consequently, given the large amounts of data which move in and out of modern storage systems, generating and checking MD5 signatures may significantly impact system performance. Therefore, a mechanism which is able to provide a high degree of confidence in data integrity in an efficient manner is desired.