1. Field
The present invention relates to decentralized data control for use in a multi-node storage system.
2. Description of the Related Art
A computer system is often required to manage a large volume of data used by a large number of users. A multi-node storage system is known as one of the systems for managing such a large volume of data.
The multi-node storage system is a storage system that is constructed by using a plurality of nodes (computers). The multi-node storage system includes, for example, an access node, a control node, and a plurality of disk nodes. The access node provides users with an environment for accessing the data stored in the disk nodes. The users can access the data stored in the disk nodes by delivering data access requests to the access node. The control node performs maintenance and management of the data provided in the multi-node storage system.
The control node manages, for example, the correspondence relationship between a virtually defined logical disk and a storage area in a storage device which is managed by each of the disk nodes. For such management, each disk node includes a storage device and manages the physical area of the storage device in unit areas of a certain size, each called a “slice”. Further, the control node divides the logical disk into units called segments and assigns to each of the segments a plurality of slices that belong to different disk nodes. Data can be decentralized by assigning slices that belong to different disk nodes to each of the plural segments constituting the logical disk.
The slices assigned to a segment are classified into a primary slice, on which the access node directly executes read/write, and a secondary slice, which mirrors the data written to the primary slice. In response to an access request for read/write including designation of a segment, the access node accesses the primary slice assigned to the designated segment. If the access request is a data write request, the disk node managing the primary slice transfers a copy of the written data to the secondary slice. As a result, data redundancy based on mirroring is ensured.
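The write path described above can be sketched as follows. This is a minimal illustration only, assuming in-memory byte buffers in place of storage devices and direct method calls in place of network transfers; the class names `DiskNode` and `AccessNode` and all method names are hypothetical and do not appear in the related art.

```python
class DiskNode:
    """Hypothetical disk node holding slices as in-memory byte buffers."""

    def __init__(self, name):
        self.name = name
        self.slices = {}  # slice_id -> bytearray standing in for a physical slice

    def add_slice(self, slice_id, size):
        self.slices[slice_id] = bytearray(size)

    def read(self, slice_id, offset, length):
        return bytes(self.slices[slice_id][offset:offset + length])

    def write(self, slice_id, offset, data, mirror=None):
        buf = self.slices[slice_id]
        if offset < 0 or offset + len(data) > len(buf):
            raise ValueError("write falls outside the slice")
        buf[offset:offset + len(data)] = data
        # For a primary slice, forward a copy of the written data
        # to the secondary slice on the other disk node.
        if mirror is not None:
            peer_node, peer_slice_id = mirror
            peer_node.write(peer_slice_id, offset, data)


class AccessNode:
    """Hypothetical access node: routes segment I/O to the primary slice only."""

    def __init__(self):
        # segment -> ((primary_node, slice_id), (secondary_node, slice_id))
        self.segments = {}

    def write(self, segment, offset, data):
        (p_node, p_slice), secondary = self.segments[segment]
        p_node.write(p_slice, offset, data, mirror=secondary)

    def read(self, segment, offset, length):
        (p_node, p_slice), _ = self.segments[segment]
        return p_node.read(p_slice, offset, length)
```

In this sketch the access node never contacts the secondary slice; the copy is made by the disk node that manages the primary slice, which matches the division of roles described above.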
In the multi-node storage system described above, selection of the slice to be assigned to the segment of the logical disk is performed by the control node. Therefore, the control node is required to recognize the state of each slice in the storage device of each disk node. Slice states are broadly divided into a state in which the slice is assigned to a segment, and a state in which the slice is not assigned to any segment. A slice not assigned to any segment is called a “free slice”.
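By way of illustration, slice selection by the control node may be sketched as below. The sketch assumes a simple first-fit choice among free slices and exactly two slices (one primary, one secondary) per segment; the `Slice` record, its fields, and the function `assign_segment` are hypothetical names introduced for this example, and the selection policy is an assumption rather than the policy of any particular system.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Slice:
    disk_node: str                 # disk node that manages this slice
    slice_id: int
    segment: Optional[int] = None  # None marks a free slice
    role: Optional[str] = None     # "primary" or "secondary" once assigned


def assign_segment(slices: List[Slice], segment: int) -> Tuple[Slice, Slice]:
    """Assign two free slices on different disk nodes to one segment."""
    primary = next((s for s in slices if s.segment is None), None)
    if primary is None:
        raise RuntimeError("no free slice available")
    # The secondary slice must belong to a different disk node.
    secondary = next(
        (s for s in slices
         if s.segment is None and s.disk_node != primary.disk_node),
        None,
    )
    if secondary is None:
        raise RuntimeError("no free slice on a different disk node")
    primary.segment, primary.role = segment, "primary"
    secondary.segment, secondary.role = segment, "secondary"
    return primary, secondary
```

The states are updated only after both slices have been found, so a failed selection leaves every slice in its previous state.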
Information regarding the slice state is held as metadata in the disk node. The control node collects the metadata from each disk node and recognizes the slice states in the entire system. Further, when the control node assigns a slice to a segment or cancels the assignment of a slice, the update is reported to the disk node that manages the relevant slice. Metadata is thus frequently transmitted and received between the control node and the disk nodes so that the control node can manage the slice states as described above.
However, the known multi-node storage system has the problem that, as the system scale increases, the control node has to handle a larger volume of metadata and takes a longer time to collect the metadata. In particular, at startup of the multi-node storage system, the control node is required to collect the metadata from all the disk nodes. Therefore, the startup time of the multi-node storage system becomes longer as the volume of metadata increases.
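The collection step and its cost may be illustrated by the following sketch. It assumes that each disk node reports one metadata record per slice, modeled here as a `(slice_id, segment)` pair in which `None` denotes a free slice; the function names and the record layout are hypothetical. Under that assumption, the number of records transferred equals the total number of slices in the system.

```python
def collect_metadata(disk_nodes):
    """Gather per-slice metadata from every disk node.

    disk_nodes maps a disk node name to its list of (slice_id, segment)
    records, where segment is None for a free slice.
    Returns the system-wide table and the number of records transferred.
    """
    table = {}
    transferred = 0
    for node_name, records in disk_nodes.items():
        for slice_id, segment in records:
            table[(node_name, slice_id)] = segment
            transferred += 1  # one record per slice, free or assigned
    return table, transferred


def free_slices(table):
    """List the (node, slice_id) keys of all free slices, in sorted order."""
    return sorted(key for key, segment in table.items() if segment is None)
```

Because every slice contributes a record regardless of whether it is assigned, the volume collected in this sketch grows with the number of disk nodes multiplied by the number of slices per node.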
The control node is often configured as a cluster to improve reliability. In that case, if the control node in operation fails, the function of the control node is taken over by another node within the cluster (called “failover”). The node taking over the function of the previous control node collects the metadata from all the disk nodes to confirm the states of all the slices, and then starts operating as the control node. Accordingly, if the volume of metadata increases greatly, the time required for the failover is prolonged and the service stop period is also prolonged.
The technique disclosed herein has been conceived in view of the above-described state of the art and is intended to reduce the amount of information transferred regarding the slice assignment relation.