Recently, grid computing, in which processing is allocated among computers distributed over a network, has been developed. Furthermore, in response to the demand for disaster tolerance, a distributed storage system realizes virtual storage by distributing many storage nodes over a wide area network.
In such a system, the number of storage nodes composing the system frequently increases or decreases through extension of the system or accidents. Accordingly, it is not realistic for a user to individually assign a file system or a file to storage by hand. Furthermore, in a system in which a server assigns files to storage nodes, an accident at the server or concentration of load on the server affects the whole system. Accordingly, there is a need to assign files to storage nodes automatically and in a distributed manner, without centralized control.
In order to solve this problem, various distributed storage systems have been developed, such as the following.
CFS (Wide-area cooperative storage with CFS, Frank Dabek, M. Frans Kaashoek, David Karger, Robert Morris, and Ion Stoica, 18th ACM Symposium on Operating Systems Principles (SOSP '01), October 2001)
CAN (A Scalable Content-Addressable Network, Sylvia Ratnasamy, Paul Francis, Mark Handley, Richard Karp and Scott Shenker, ACM SIGCOMM 2001)
In these systems, a node ID (identifier) is determined by applying a hash function to the address of the storage node. Briefly, each storage node is mapped onto the space of file IDs by its node ID. A file is assigned to the storage node whose node ID is nearest to the file ID of the file in that space.
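As a minimal sketch of this mapping (the ID-space width, the addresses, the file name, and the clockwise notion of "nearest" below are assumptions for illustration, not details taken from CFS or CAN), hashing node addresses and file names into one space and applying the nearest-ID rule might look like this:

```python
import hashlib

SPACE = 2 ** 32  # assumed size of the circular ID space

def node_id(address: str) -> int:
    # Node ID: hash of the storage node's address, truncated to the ID space.
    return int.from_bytes(hashlib.sha1(address.encode()).digest()[:4], "big")

def file_id(file_name: str) -> int:
    # File ID: hash of the file name, in the same space as node IDs.
    return int.from_bytes(hashlib.sha1(file_name.encode()).digest()[:4], "big")

def responsible_node(fid: int, node_ids: list[int]) -> int:
    # Assign the file to the node whose ID is nearest to the file ID;
    # "nearest" is taken here as the smallest clockwise distance on the ring.
    return min(node_ids, key=lambda nid: (nid - fid) % SPACE)

node_ids = [node_id(f"10.0.0.{i}:4000") for i in range(1, 5)]  # hypothetical addresses
owner = responsible_node(file_id("report.txt"), node_ids)
assert owner in node_ids
```

Because the hash is deterministic, any node holding the same list of node IDs computes the same owner for the same file name, which is what makes a central assignment server unnecessary.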
In this method, each storage node can determine which storage node should store a file, provided it has a list of the node IDs of the other storage nodes. Accordingly, a server centrally controlling assignment of files is not necessary, and no individual arrangement for each file assignment is needed between the storage nodes. When a storage node is added or deleted, only the address of that storage node is communicated to the other storage nodes. As a result, the communication volume between storage nodes is reduced and the concurrency of processing improves.
When a new storage node is added (participates), the new storage node is allocated a part of the hash space. However, another storage node was previously allocated an area including that part of the hash space (hereinafter, this other storage node is called a neighboring storage node). Accordingly, it is necessary to divide the area into two parts between the new storage node and the neighboring storage node. Conversely, when a storage node is deleted (removed), the area allocated to that storage node is divided into two parts, and the two neighboring storage nodes on both sides of the storage node respectively take over the two parts.
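This splitting of the hash space can be sketched on a one-dimensional ring. Note this is a simplification: here the whole area passes to a single successor on deletion, whereas the two-sided split described above applies to area-based schemes such as CAN; the node IDs are arbitrary example values.

```python
import bisect

def successor(ring: list[int], key: int) -> int:
    # The node responsible for `key` is the first node ID clockwise from it.
    i = bisect.bisect_left(ring, key)
    return ring[i % len(ring)]

def join(ring: list[int], new_id: int) -> None:
    # A joining node splits the area previously held by its neighboring
    # node: keys between its predecessor and itself now belong to it.
    bisect.insort(ring, new_id)

def leave(ring: list[int], old_id: int) -> None:
    # A leaving node's area is absorbed by its neighbor.
    ring.remove(old_id)

ring = [100, 2000, 60000]           # arbitrary example node IDs, sorted
assert successor(ring, 1500) == 2000
join(ring, 1700)                     # splits the area (100, 2000] in two
assert successor(ring, 1500) == 1700
assert successor(ring, 1800) == 2000
leave(ring, 1700)                    # the area returns to the neighbor
assert successor(ring, 1500) == 2000
```

The key property is that a join or leave redistributes only the area of the immediately neighboring node; all other nodes are unaffected, which keeps the communication volume low.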
In order not to concentrate files on a particular storage node, file IDs should be uniformly distributed in the file ID space (the file ID space is the same as the hash space). Accordingly, a file ID is determined using a hash function, taking the file name or the content data of the file as its argument. The file may be divided into blocks, each located on a storage node as a unit, as in CFS. Alternatively, the whole file may be located on one storage node, as in CAN.
However, when files are assigned to storage nodes by a hash function, the expected value of the area assigned to each storage node is equal for all storage nodes. Accordingly, if storage capacity, calculation ability, or network speed differs between storage nodes, problems occur. For example, even while a storage node of large capacity still has remaining capacity, another storage node of small capacity runs short of capacity; as a result, the system as a whole cannot store further files. Furthermore, if an I/O load appropriate for a storage node of high calculation ability is requested of a storage node of low calculation ability, the response speed of that node falls. In this way, these problems occur when putting a distributed storage system to practical use.
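That the assigned area is independent of capacity can be checked directly. This sketch (the addresses and ID-space width are assumptions) measures the arc of the ID space falling to each node, which depends only on where the hashed node IDs land, never on the node's disk size or speed:

```python
import hashlib

SPACE = 2 ** 32  # assumed size of the circular ID space

def node_id(address: str) -> int:
    return int.from_bytes(hashlib.sha1(address.encode()).digest()[:4], "big")

addresses = [f"10.0.0.{i}:4000" for i in range(1, 9)]  # 8 hypothetical nodes
ids = sorted(node_id(a) for a in addresses)

# Arc owned by each node: from the preceding node ID up to its own,
# with wraparound for the first node.
arcs = [(ids[(k + 1) % len(ids)] - ids[k]) % SPACE for k in range(len(ids))]
assert sum(arcs) == SPACE
# Every node's expected arc is SPACE / 8, whether it is a small disk or a
# disk several thousand times larger: the hash gives no larger a share.
```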
In CFS, in order to avoid this problem, a storage node of large capacity is made to correspond to a plurality of virtual nodes. Certainly, if the capacity of a large storage node is several times the capacity of a small storage node, the large storage node may be divided into several virtual nodes. However, if capacity differs greatly between storage nodes, for example if the capacity of a large storage node is several thousand times that of a small storage node, the large storage node must be divided into several thousand virtual nodes. In this case, the overhead of managing each virtual node becomes a problem. Furthermore, if the average capacity of storage nodes changes with improvements in disk technology, how to adjust the unit of the virtual node is a problem. Accordingly, virtual nodes are insufficient to cope with such variety of capacity.
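A minimal sketch of the virtual-node idea (the addresses, the "#k" suffixing scheme, and the per-unit granularity are assumptions, not the CFS implementation): each storage node registers one ID per unit of capacity, so a node with k times the capacity appears k times on the ring and attracts roughly k times the files.

```python
import hashlib

def virtual_node_ids(address: str, capacity_units: int) -> list[int]:
    # One virtual node per unit of capacity; each virtual node gets its
    # own ID by hashing the address together with a virtual-node index.
    ids = []
    for k in range(capacity_units):
        h = hashlib.sha1(f"{address}#{k}".encode()).digest()
        ids.append(int.from_bytes(h[:4], "big"))
    return ids

small = virtual_node_ids("10.0.0.1:4000", 1)
large = virtual_node_ids("10.0.0.2:4000", 8)   # 8x capacity -> 8 virtual nodes
assert len(small) == 1 and len(large) == 8
# A node several thousand times larger would need thousands of virtual
# nodes, which is exactly the management overhead described above.
```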
Above, the distributed storage system in which a whole file is located on a storage node was explained. However, the same problem occurs in a distributed storage system in which files are located on storage nodes in units of blocks. Concretely, even if a pair of a file name and a block number is managed as a block name, the same problem occurs.
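For the block-unit case, the block name can be formed from the pair of file name and block number and hashed in the same way. In this sketch (the separator, hash function, and block size are assumptions), the block ID again lands in the ID space without regard to node capacity, so the same imbalance arises:

```python
import hashlib

SPACE = 2 ** 32   # assumed size of the ID space
BLOCK_SIZE = 8192  # assumed block size in bytes

def block_id(file_name: str, block_no: int) -> int:
    # A block name is the pair (file name, block number); hashing it
    # yields a block ID in the same space as node IDs.
    name = f"{file_name}:{block_no}"
    return int.from_bytes(hashlib.sha1(name.encode()).digest()[:4], "big")

# The blocks of a single file scatter across the ID space, so each is
# assigned independently of node capacity, just as whole files are.
ids = [block_id("report.txt", n) for n in range(4)]
assert all(0 <= i < SPACE for i in ids)
```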