1. Field of the Invention
Embodiments of the present invention generally relate to data storage systems, and more particularly, to a method for selectively storing blocks of data on a server.
2. Description of the Related Art
Modern computer networks generally comprise a plurality of user computers connected to one another and to a computer server via a communications network. To provide redundancy and high availability of information and applications that are executed upon a computer server, multiple computer servers may be arranged in a cluster, i.e., forming a server cluster. Such server clusters are available under the trademark VERITAS CLUSTER SERVER from Veritas Software Corporation of Mountain View, Calif. In a server cluster, a plurality of servers communicate with one another to facilitate failover redundancy such that when computer resources, i.e., software or hardware, become inoperative on one server, another server can quickly execute the same software that was running on the inoperative server substantially without interruption. As such, a user of services that are supported by a server cluster would not be substantially impacted by an inoperative server or software. To facilitate high availability and redundancy, the server cluster contains backup servers for redundantly storing data from the various servers within the server cluster. Backup servers are also employed in non-clustered environments to mitigate the risk of hardware and software failure. Fast, efficient, low-impact, cost-effective backup of file and application servers is critical in many business environments.
In the interest of optimizing the amount of disk space utilized by backup servers, software applications have been developed that attempt to eliminate duplicate files as a backup of the files is being performed. Most of these applications calculate a signature, or identification number, for a given file that is to be backed up. This signature is then compared with the signatures held in a signature database, which are associated with files previously stored on the backup server, in an attempt to locate a match. If no matching signature is located, the entire file is saved on the backup server and the corresponding signature is added to the signature database. If a matching signature is found, the file is ignored and is not stored on the backup server (since an exact copy already exists). By eliminating the identified extraneous data, these computer programs can reduce the amount of disk space used by the backup server to store files.
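The file-level approach described above can be illustrated with a minimal sketch. It is not taken from any particular product: the class name `FileBackupStore`, the use of SHA-256 as the signature function, and the in-memory dictionary standing in for the signature database are all assumptions made for illustration.

```python
import hashlib


def file_signature(data: bytes) -> str:
    """Compute a signature for a whole file (SHA-256 is an assumed choice)."""
    return hashlib.sha256(data).hexdigest()


class FileBackupStore:
    """Hypothetical backup server that keeps one copy per distinct file."""

    def __init__(self):
        # Stand-in for the signature database: signature -> stored file contents.
        self.signatures = {}

    def backup(self, data: bytes) -> bool:
        """Store the file unless its signature is already known.

        Returns True if the file was stored, False if it was skipped
        as a duplicate.
        """
        sig = file_signature(data)
        if sig in self.signatures:
            return False  # an exact copy already exists on the server
        self.signatures[sig] = data
        return True


store = FileBackupStore()
first = store.backup(b"quarterly report")      # new file, stored
second = store.backup(b"quarterly report")     # same contents, skipped
third = store.backup(b"quarterly report v2")   # different contents, stored
```

In this sketch a file is either stored in full or skipped in full, so even a small change to a file causes the entire file to be stored again.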
A system for eliminating duplicate data on the block (i.e., sub-file) level can be used to reduce storage space even further than a file-based system. This system operates by receiving a data block of predefined size and then calculating a signature for the block. The system then accesses a signature database on the backup server and initiates a search for a matching signature. This procedure is repeated for every block in the file to be stored. This is particularly effective when two copies of the same file are being archived. The system is less effective when storing a file and a slightly modified version of that file. The reason is that an insertion anywhere inside the modified version shifts all of the data that follows it, so the blocks after the insertion no longer align with the blocks of the original. For example, suppose a one-gigabyte file is stored on a server. A copy of the one-gigabyte file is made, and one byte is inserted at the beginning of the copy. None of the blocks in the modified version of the file will match the blocks stored on the server. The entire modified file must be archived, even though it is almost identical to the original. The same would be true if one byte were deleted from the front of the file.
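The misalignment problem can be reproduced with a short sketch. The block size of 4096 bytes, the SHA-256 signature function, and the helper names `block_signatures` and `count_new_blocks` are assumptions chosen for illustration; the test data is pseudo-random and far smaller than one gigabyte.

```python
import hashlib
import random

BLOCK_SIZE = 4096  # assumed predefined block size, in bytes


def block_signatures(data: bytes, block_size: int = BLOCK_SIZE) -> list:
    """Split data into fixed-size blocks and return one signature per block."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def count_new_blocks(data: bytes, known: set) -> int:
    """Count blocks whose signatures are absent from the signature database."""
    return sum(1 for sig in block_signatures(data) if sig not in known)


# Seeded pseudo-random contents stand in for the original file (64 blocks).
rng = random.Random(0)
original = bytes(rng.getrandbits(8) for _ in range(64 * BLOCK_SIZE))

# Signature database after the original file has been archived.
known = set(block_signatures(original))

exact_copy = bytes(original)
shifted_copy = b"\x00" + original  # one byte inserted at the beginning

new_exact = count_new_blocks(exact_copy, known)
new_shifted = count_new_blocks(shifted_copy, known)
total_shifted = len(block_signatures(shifted_copy))
```

Under these assumptions, the exact copy contributes no new blocks, whereas every block of the shifted copy fails to match (`new_shifted` equals `total_shifted`), because each fixed-size block now begins one byte earlier in the original data than the corresponding stored block.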
Therefore, there is a need in the art for a more efficient method of eliminating redundant data in backup servers.