Multi-node storage systems are a class of data storage systems which employ a plurality of computers to store and manage data in a distributed manner. Specifically, a multi-node storage system is formed from a plurality of disk nodes and a control node which are interconnected by a network. Under the control of the control node, the system provides virtual disk volumes, or logical volumes, for access to stored data that is physically distributed across multiple disk nodes.
More specifically, each logical volume in a multi-node storage system is divided into segments. Disk nodes, on the other hand, have their own local storage devices, the space of which is divided into fixed-length slices. Here the slice length is equal to the segment length. The control node assigns one slice to each individual segment of the logical volumes and informs client computers, or access nodes, of the resulting associations between the slices and segments. An access node may send write data for a specific segment to the disk node managing the slice corresponding to that segment. The disk node then stores the received data in its storage device.
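The segment-to-slice assignment described above may be illustrated by the following minimal sketch. It is not taken from any particular system; the names ControlNode, DiskNode, Slice, and access_node_write, the 1 GiB slice length, and the first-free assignment policy are all assumptions made for illustration only.

```python
from dataclasses import dataclass

# Assumed slice length; the slice length equals the segment length.
SLICE_SIZE = 1024 * 1024 * 1024


@dataclass(frozen=True)
class Slice:
    """A fixed-length slice, identified by its disk node and index (hypothetical)."""
    disk_node: str
    slice_index: int


class ControlNode:
    """Assigns one slice to each segment of each logical volume (hypothetical)."""

    def __init__(self, free_slices):
        self._free = list(free_slices)
        self._table = {}  # (volume, segment_index) -> Slice

    def assign(self, volume, segment_index):
        key = (volume, segment_index)
        if key not in self._table:
            if not self._free:
                raise RuntimeError("no free slices left")
            # Assumed policy: take the first free slice.
            self._table[key] = self._free.pop(0)
        return self._table[key]


class DiskNode:
    """Stores data received for the slices it manages (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.storage = {}  # (slice_index, offset) -> bytes

    def write(self, slice_index, offset, data):
        self.storage[(slice_index, offset)] = data


def access_node_write(control, disk_nodes, volume, byte_offset, data):
    """Route a write from an access node to the disk node owning the slice."""
    segment_index, offset = divmod(byte_offset, SLICE_SIZE)
    target = control.assign(volume, segment_index)
    disk_nodes[target.disk_node].write(target.slice_index, offset, data)
    return target


control = ControlNode([Slice("dn1", 0), Slice("dn2", 0)])
disk_nodes = {"dn1": DiskNode("dn1"), "dn2": DiskNode("dn2")}
first = access_node_write(control, disk_nodes, "LV0", 0, b"abc")
second = access_node_write(control, disk_nodes, "LV0", SLICE_SIZE + 10, b"xyz")
```

In this sketch, the two writes fall into different segments of the same logical volume and are therefore routed to slices on different disk nodes, while a later write to an already assigned segment reuses the existing association.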
The above-described multi-node storage system is scalable in terms of data capacity. That is, the manageable capacity of the system can easily be expanded by adding new disk nodes to the network.
A computer system may have two or more copies of the same data in its storage devices. Such duplication of data degrades the efficiency of storage space usage. For example, regular data backup operations tend to produce data duplications, and most of a new backup volume is often identical to the previous one. The following literature proposes several techniques to reduce the redundancy of stored data when it is moved in the course of a backup operation or the like.
International Publication Pamphlet No. WO/2004/104845
Japanese Laid-open Patent Publication No. 2007-234026
Looking at smaller units of data in computer storage, a plurality of identical pieces of data may coexist even in a system in operation. Suppose, for example, that an e-mail message with a file attachment is sent to a plurality of recipients sharing a mail server. In this case, the mail server stores that same received e-mail data in different storage locations corresponding to the recipients.
Particularly in a multi-node storage system configured to serve different users with different logical volumes, it is possible to install the same application program in each of those logical volumes. As a result of the installation, the multi-node storage system as a whole stores the same code in multiple locations.
Conventional multi-node storage systems are, however, unable to reduce the redundancy of stored data in the case where identical data blocks are distributed across different disk nodes. Accordingly, the same data occupies space in each such disk node, thus wasting storage resources.