A computer system typically includes a computer file-system. A server computer system can receive data from at least one client computer system.
Need to Perform Data De-Duplication
Computer systems (e.g., server computer systems) need the ability to perform efficient data de-duplication. Because of the proliferation of large amounts of data in computer storage systems, the requirements for the computer hardware needed to store that data are often difficult to meet. For example, the physical space needed to house such computer hardware may be difficult to obtain. Also, the energy needed to power such computer hardware, as well as the cost of the hardware, may be prohibitive. Thus, there is a need to reduce the amount of data stored on computer systems while keeping the data stores in those computer systems consistent.
Prior Art Systems
In file systems and large collections of files that have provisions for data de-duplication, the common method of identifying duplicate data is to catalog a relationship between a piece of data (often a file) and its unique data signature (often a hash of that data). As shown in prior art FIG. 1, a typical prior art system (1) catalogs a relationship between a piece of data and its unique data signature, (2) consults the catalog when data is added to the system, and (3) uses that information to eliminate duplicate data pieces where possible. The catalog can also be consulted by clients at data transmission time in order to eliminate duplicate data transmissions to the collection of files.
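The three cataloging steps described above can be illustrated with a minimal sketch. It is not taken from any particular prior art system: the use of SHA-256 as the data signature, the in-memory dictionaries, and the names `data_signature`, `DedupStore`, `has_signature`, `add`, and `read` are all assumptions made for illustration.

```python
import hashlib


def data_signature(data: bytes) -> str:
    """Return a hex SHA-256 digest, used here as the data signature (an assumption)."""
    return hashlib.sha256(data).hexdigest()


class DedupStore:
    """Hypothetical in-memory store that keeps one copy of data per signature."""

    def __init__(self) -> None:
        self.catalog: dict[str, bytes] = {}  # signature -> the single stored copy
        self.files: dict[str, str] = {}      # file name -> signature of its data

    def has_signature(self, signature: str) -> bool:
        """Let a client ask, before transmitting, whether the data is already stored."""
        return signature in self.catalog

    def add(self, name: str, data: bytes) -> bool:
        """Store data under name; return True only if a new copy was written."""
        signature = data_signature(data)          # (1) compute and catalog the signature
        is_new = signature not in self.catalog    # (2) consult the catalog
        if is_new:
            self.catalog[signature] = data        # (3) keep only the first copy
        self.files[name] = signature
        return is_new

    def read(self, name: str) -> bytes:
        """Return the data for name by following its cataloged signature."""
        return self.catalog[self.files[name]]
```

In this sketch, adding two files with identical contents writes one copy; the second file merely references the existing catalog entry. A client could call `has_signature` first and skip the transmission when it returns true.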
Creating this catalog is very expensive, because a data signature must be calculated for each piece of data, and the file system must perform this calculation. Even though clients, in many cases, compute the data signature of data before they send it (for the possibility of obviating the send), the server cannot trust the new data signatures (those not already in the server's catalog) that the client has produced for as-yet non-redundant data. The file system server cannot allow these client-produced data signatures to be placed in its permanent catalog, because a malicious client could corrupt the catalog and cause other clients to corrupt their files by fooling the server into allowing those clients to use data that does not match its data signature.
The server is forced to verify the data signature after the client sends the non-redundant data. When large amounts of data are being sent to the server, such as during a backup, this can cause the server to become slow because it must re-calculate the data signatures of that data to verify the correctness of the client-produced data signatures.
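The verification burden described above can be sketched as follows. This is an illustrative assumption rather than the behavior of any specific server: SHA-256 stands in for the data signature, and `accept_upload` and its parameters are hypothetical names. The point to note is that the server hashes every byte of every upload, which is the cost that grows during a large backup.

```python
import hashlib


def data_signature(data: bytes) -> str:
    """Return a hex SHA-256 digest, used here as the data signature (an assumption)."""
    return hashlib.sha256(data).hexdigest()


def accept_upload(catalog: dict[str, bytes], claimed_signature: str, data: bytes) -> bool:
    """Catalog the data only if the client-produced signature is correct.

    The server recomputes the signature over the full payload instead of
    trusting the client, so each accepted upload costs one full hash pass.
    """
    actual_signature = data_signature(data)  # the expensive server-side recalculation
    if actual_signature != claimed_signature:
        return False                         # reject: the claim does not match the data
    catalog.setdefault(actual_signature, data)
    return True
```

A client that sends a correct signature has its data cataloged; a client that sends a signature belonging to different data is rejected, so the catalog never maps a signature to mismatched data.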
Therefore, a method and system of de-duplicating data in a data store in a server computer system is needed.