Fault-tolerant systems using redundancy (hereinafter referred to as redundant systems) are utilized in enterprises today to enhance the reliability of data storage. The underlying concept of redundancy is to maintain multiple copies of data on one or more redundant servers. Therefore, if one redundant server, or the network used to access the redundant server, experiences a fault, the data is still accessible from one of the remaining redundant servers.
The maintenance of multiple copies of the data on the redundant servers requires synchronization of the state of the redundant servers. Therefore, changes made to the data on one redundant server must be reflected on all other redundant servers. This synchronization allows access to the changed data, irrespective of which redundant server experiences a fault. The redundant servers communicate over a synchronization path to exchange data for synchronization (hereinafter referred to as synchronization data). A synchronization path is of one of two types: a public path or a private path.
The public path is a network that is publicly accessible and is used for a variety of applications apart from synchronization. An example of a public path is a local area network (LAN). On the other hand, the private path is always a dedicated network, used only for the purpose of synchronization. In certain redundant systems, a combination of public and private paths is used for synchronization.
The time required to perform synchronization (hereinafter referred to as synchronization time) is one of the key performance factors of the redundant system. A long synchronization time is often due to a slow synchronization path. The use of dedicated private paths reduces the synchronization time because dedicated private paths are usually faster than public paths, which are shared by many users. However, with increasingly large volumes of data being stored on redundant servers, there is a need to further reduce synchronization time.
A reduction in the synchronization time can be achieved by compressing the synchronization data. Compression of the synchronization data leads to a reduction in the volume of data to be transferred over the slow synchronization path, which reduces synchronization time. Since the servers known in the art have a high computational capacity, the additional computational cost of the compression algorithms running on the redundant servers is acceptable in most redundant systems.
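The trade-off described above can be illustrated with a rough estimate. The following is a minimal sketch, assuming a simple model in which the synchronization time is the transfer time of the (possibly compressed) data plus the time spent compressing it. The function name, the parameters, and all figures are hypothetical and are chosen only for illustration.

```python
def estimated_sync_time(volume_bytes: float,
                        path_bytes_per_second: float,
                        compression_ratio: float = 1.0,
                        compression_seconds: float = 0.0) -> float:
    """Estimate synchronization time under a simple, assumed model.

    compression_ratio is uncompressed size divided by compressed size,
    so a value of 1.0 means the data is sent uncompressed.
    """
    transferred_bytes = volume_bytes / compression_ratio
    return transferred_bytes / path_bytes_per_second + compression_seconds


# Hypothetical figures: 1 GB of synchronization data over a path that
# carries 12.5 MB per second.
uncompressed = estimated_sync_time(1_000_000_000, 12_500_000)
compressed = estimated_sync_time(1_000_000_000, 12_500_000,
                                 compression_ratio=4.0,
                                 compression_seconds=5.0)
```

Under this model, compression is worthwhile whenever the transfer time saved exceeds the time spent compressing, which is more likely on a slow synchronization path.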
A number of data compression techniques are known in the art. For example, dictionary-based compression techniques are used to compress the data transmitted over data communication networks. Dictionary-based compression techniques replace phrases with tokens. If the number of bits in a token is smaller than the number of bits in the phrase it replaces, compression occurs.
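The replacement of phrases with tokens can be sketched as follows. This is a minimal illustration using LZW, one well-known dictionary-based technique, and is not necessarily the technique contemplated here. The function names are illustrative only, and the tokens are kept as plain integers rather than packed into a bit stream.

```python
def lzw_compress(data: bytes) -> list[int]:
    """Replace repeated phrases in data with integer tokens (LZW)."""
    # The dictionary starts with every single-byte phrase (tokens 0..255).
    dictionary = {bytes([i]): i for i in range(256)}
    next_token = 256
    phrase = b""
    tokens = []
    for value in data:
        candidate = phrase + bytes([value])
        if candidate in dictionary:
            # Keep extending the phrase while it is already known.
            phrase = candidate
        else:
            # Emit the token for the longest known phrase, then learn
            # the extended phrase under a new token.
            tokens.append(dictionary[phrase])
            dictionary[candidate] = next_token
            next_token += 1
            phrase = bytes([value])
    if phrase:
        tokens.append(dictionary[phrase])
    return tokens


def lzw_decompress(tokens: list[int]) -> bytes:
    """Rebuild the original bytes from the tokens of lzw_compress."""
    if not tokens:
        return b""
    dictionary = {i: bytes([i]) for i in range(256)}
    next_token = 256
    previous = dictionary[tokens[0]]
    output = [previous]
    for token in tokens[1:]:
        if token in dictionary:
            entry = dictionary[token]
        elif token == next_token:
            # The token refers to the phrase that is being defined now.
            entry = previous + previous[:1]
        else:
            raise ValueError("invalid token stream")
        output.append(entry)
        dictionary[next_token] = previous + entry[:1]
        next_token += 1
        previous = entry
    return b"".join(output)


sample = b"ABABABABABAB"
sample_tokens = lzw_compress(sample)
restored = lzw_decompress(sample_tokens)
```

In this sketch each token needs at least 9 bits once the dictionary grows beyond 256 entries, so compression occurs only when the tokens stand for phrases that are long enough on average.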
The compression ratio achieved by a data compression technique depends largely on the type of data being compressed. Synchronization data has inherent redundancies which are not considered by these known compression techniques. Therefore, the compression ratio achieved by these techniques for synchronization data is not optimal.
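The dependence of the compression ratio on the type of data can be observed with a general-purpose compressor. The following is a minimal sketch using the zlib module of the Python standard library. The record layout standing in for structured data is hypothetical and is not taken from any actual redundant system.

```python
import random
import zlib


def compression_ratio(data: bytes) -> float:
    """Return uncompressed size divided by compressed size."""
    return len(data) / len(zlib.compress(data, 9))


# Hypothetical structured records with many repeated field names.
structured = b"".join(
    b"block=%06d;state=CLEAN;owner=server-A\n" % i for i in range(2000)
)

# Pseudo-random bytes of the same length, with a fixed seed so that
# the result is repeatable.
rng = random.Random(0)
unstructured = bytes(rng.getrandbits(8) for _ in range(len(structured)))

structured_ratio = compression_ratio(structured)
unstructured_ratio = compression_ratio(unstructured)
```

The structured records compress far better than the pseudo-random bytes, which hardly compress at all. This is consistent with the observation that the achievable ratio depends on the redundancy present in the data.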
The present invention overcomes these and/or other problems.