A multi-node storage system is an example of a system for administering data. In such a multi-node storage system, data is distributed and stored in a plurality of computers connected via a network.
In a multi-node storage system, for example, a plurality of disk nodes and access nodes, a control node, and an administration node are connected to each other via a network. Each disk node is a computer having a storage unit. Each disk node administers the storage area of the connected storage unit by dividing the storage area into unit areas of fixed size. Each unit area serving as an administration unit of the disk node is referred to as a slice. Each slice is allocated to a logical volume defined as an access unit from the outside.
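The division of a storage area into fixed-size unit areas can be illustrated with a minimal sketch. The slice size of 1 GiB, the `Slice` record, and the function name `divide_into_slices` are assumptions made for illustration only; the text above states only that the unit areas have a fixed size.

```python
from dataclasses import dataclass
from typing import List

# Assumed value: the source only says slices have a fixed size.
SLICE_SIZE = 1 << 30  # 1 GiB


@dataclass(frozen=True)
class Slice:
    """Hypothetical record for one unit area administered by a disk node."""
    disk_node: str  # name of the disk node that owns the storage unit
    slice_id: int   # index of the slice within that storage unit
    offset: int     # byte offset of the slice within the storage unit


def divide_into_slices(disk_node: str, capacity: int,
                       slice_size: int = SLICE_SIZE) -> List[Slice]:
    """Divide a storage area of `capacity` bytes into fixed-size slices.

    Any trailing area smaller than one slice is left unused in this sketch.
    """
    count = capacity // slice_size
    return [Slice(disk_node, i, i * slice_size) for i in range(count)]
```

In this sketch each slice is identified by its disk node and an index, which is sufficient for it to be allocated to a logical volume later.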
Each logical volume is constructed from one or more segments. To each segment, two slices belonging to mutually different disk nodes are allocated. Then, data is duplicated between the two slices allocated to the same segment. An access instruction from a user is issued with respect to a segment in a logical volume.
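The relation between a segment and its two slices may be sketched as follows. All names here (`Segment`, `InMemoryDisks`, `write_segment`) are hypothetical, and the in-memory dictionary merely stands in for the storage units of real disk nodes.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Hypothetical slice reference: (disk node name, slice id).
SliceRef = Tuple[str, int]


@dataclass
class Segment:
    """A segment to which two slices on different disk nodes are allocated."""
    slice_a: SliceRef
    slice_b: SliceRef

    def __post_init__(self) -> None:
        # The two slices must belong to mutually different disk nodes.
        if self.slice_a[0] == self.slice_b[0]:
            raise ValueError("slices must belong to different disk nodes")


class InMemoryDisks:
    """Stand-in for the disk nodes; stores slice contents in a dictionary."""

    def __init__(self) -> None:
        self.blocks: Dict[SliceRef, bytes] = {}

    def write(self, ref: SliceRef, data: bytes) -> None:
        self.blocks[ref] = data

    def read(self, ref: SliceRef) -> bytes:
        return self.blocks[ref]


def write_segment(disks: InMemoryDisks, segment: Segment, data: bytes) -> None:
    """Duplicate the data between the two slices allocated to the segment."""
    disks.write(segment.slice_a, data)
    disks.write(segment.slice_b, data)
```

Because both slices hold the same data, the segment remains readable if either of the two disk nodes becomes inaccessible.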
Each access node is a computer for accessing data in a disk node in response to a data access request from a user. For example, for the purpose of data access to a logical volume, an access node acquires, from the control node, the configuration information of logical volumes. Then, with reference to the configuration information of logical volumes, the access node distributes, to a suitable disk node, the data access from the user to the logical volume. Here, when the access to the disk node has failed, the access node acquires, from the control node, the configuration information that indicates the newest allocation relation of the access target segment in the logical volume. Then, on the basis of the newly acquired configuration information, the access node retries the access to the disk node.
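The access procedure described above, including the retry after a failed access, can be sketched as follows. The classes `ControlNode`, `DiskNode`, and `AccessNode`, their method names, and the exception `AccessError` are all hypothetical stand-ins; an actual system would communicate over a network and would detect failure by a time-out rather than by an immediate exception.

```python
from typing import Dict, Tuple

SliceRef = Tuple[str, int]           # (disk node name, slice id), hypothetical
VolumeConfig = Dict[int, SliceRef]   # segment number -> slice to access


class AccessError(Exception):
    """Raised by this sketch when an access to a disk node fails."""


class ControlNode:
    """Stub holding the newest allocation relation for each logical volume."""

    def __init__(self, table: Dict[str, VolumeConfig]) -> None:
        self.table = table

    def get_configuration(self, volume: str) -> VolumeConfig:
        return dict(self.table[volume])  # copy, as if sent over the network


class DiskNode:
    def __init__(self, name: str) -> None:
        self.name = name
        self.alive = True

    def read(self, slice_id: int) -> str:
        if not self.alive:
            raise AccessError(self.name)
        return f"{self.name}:{slice_id}"


class AccessNode:
    def __init__(self, control: ControlNode, disks: Dict[str, DiskNode]) -> None:
        self.control = control
        self.disks = disks
        self.config: Dict[str, VolumeConfig] = {}  # locally held configuration

    def read(self, volume: str, segment: int) -> str:
        if volume not in self.config:
            self.config[volume] = self.control.get_configuration(volume)
        node, slice_id = self.config[volume][segment]
        try:
            return self.disks[node].read(slice_id)
        except AccessError:
            # Access failed: acquire the newest configuration and retry once.
            self.config[volume] = self.control.get_configuration(volume)
            node, slice_id = self.config[volume][segment]
            return self.disks[node].read(slice_id)
```

The sketch retries only once; how many retries a real access node performs is not stated in the text.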
The administration node is a computer operated by a system administrator for administering the multi-node storage system. For example, when an administrator inputs a logical volume generation instruction, a corresponding command is transmitted to the control node.
The control node is a computer for administering allocation or the like of a slice to a logical volume. For example, the control node collects slice information from each disk node. The slice information collected from each disk node indicates the allocation relation between slices administered by the disk node and segments of logical volumes. Then, based on the collected slice information, the control node generates configuration information of logical volumes. Then, in response to a request from an access node, the control node notifies the access node of the configuration information of a logical volume (e.g., Japanese Laid-Open Patent Publication No. 2003-108420 and International Publication No. 2004/104845 pamphlet).
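The generation of configuration information from collected slice information can be sketched as a simple grouping step. The `SliceInfo` record, its field names, and the nested-dictionary layout of the result are assumptions made for illustration; the cited publications may use a different representation.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple

SliceRef = Tuple[str, int]  # (disk node name, slice id), hypothetical


class SliceInfo(NamedTuple):
    """Hypothetical slice information reported by a disk node."""
    disk_node: str
    slice_id: int
    volume: Optional[str]   # None when the slice is not allocated
    segment: Optional[int]  # segment number within the logical volume


def build_configuration(
    collected: List[SliceInfo],
) -> Dict[str, Dict[int, List[SliceRef]]]:
    """Group allocated slices by logical volume and then by segment."""
    config: Dict[str, Dict[int, List[SliceRef]]] = {}
    for info in collected:
        if info.volume is None or info.segment is None:
            continue  # unallocated slices do not appear in the configuration
        segments = config.setdefault(info.volume, {})
        segments.setdefault(info.segment, []).append(
            (info.disk_node, info.slice_id)
        )
    return config
```

In a healthy system each segment in the result would list two slices on different disk nodes; a segment listing only one slice would indicate that its data duplex state has broken down.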
Meanwhile, in the multi-node storage system, the configuration of logical volumes varies depending on various situations. One such situation is the occurrence of a fault in a disk node. Here, it is assumed that a fault has occurred in one disk node during the operation of the multi-node storage system. At that time, the data duplex state breaks down in each segment to which a slice belonging to the faulty disk node has been allocated. Thus, a slice is re-allocated to the corresponding segment. This re-allocation is completed by copying, to the newly allocated slice, the data in the accessible slice belonging to the segment whose data duplex state has broken down.
Such processing of restoring the data duplex state by newly allocating a slice to the segment whose data duplex state has broken down and then copying the data is referred to as recovery processing. The start of execution of such recovery processing is administered by the control node. When recovery processing is executed, the configuration information of logical volumes changes to a large extent.
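Recovery processing as defined above can be sketched as follows. The function name `recover`, the data structures, and the rule for choosing the new slice (the first free slice on a disk node other than the faulty one and the one holding the surviving copy) are assumptions made for illustration; the text does not specify how the new slice is selected.

```python
from typing import Dict, List, Tuple

SliceRef = Tuple[str, int]  # (disk node name, slice id), hypothetical


def recover(
    segments: Dict[int, List[SliceRef]],
    free_slices: List[SliceRef],
    data: Dict[SliceRef, bytes],
    failed_node: str,
) -> List[int]:
    """Restore the data duplex state of segments hit by a disk node fault.

    Returns the numbers of the segments that were recovered. `segments`,
    `free_slices`, and `data` are updated in place.
    """
    recovered: List[int] = []
    for seg_id, pair in list(segments.items()):
        survivors = [ref for ref in pair if ref[0] != failed_node]
        if len(survivors) == len(pair) or not survivors:
            continue  # segment unaffected, or no accessible copy remains
        source = survivors[0]
        # Assumed selection rule: first free slice on a different, healthy node.
        target = next(
            (ref for ref in free_slices
             if ref[0] not in (failed_node, source[0])),
            None,
        )
        if target is None:
            continue  # no suitable slice available in this sketch
        free_slices.remove(target)
        data[target] = data[source]          # copy from the accessible slice
        segments[seg_id] = [source, target]  # new allocation relation
        recovered.append(seg_id)
    return recovered
```

Every recovered segment ends up with a different pair of slices, which is why the configuration information changes for all segments that had a slice on the faulty disk node.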
Meanwhile, in the course of recovery processing and immediately after the completion of recovery processing, the configuration information of logical volumes held by the access node deviates temporarily from the actual situation. In this case, after the completion of recovery processing, when an access node actually accesses a disk node, the access fails, and the access node thereby recognizes that recovery processing has occurred. Thus, for example, the access node whose access has failed acquires, from the control node, information concerning the slice newly allocated to the access target segment and then updates the logical volume configuration information held by the access node itself.
Nevertheless, when the access node updates the logical volume configuration information only after an access error has occurred, a problem arises in that the response to a data request from a user is temporarily delayed. That is, at the time of access, the access node awaits a response from the disk node until a time-out is detected. After that, the access node acquires, from the control node, the information concerning the newly allocated slice. Then, on the basis of the acquired information, the access is executed again. As a result, the response to the request from the user is lengthened by the sum of the waiting time for the response, the acquisition time for the logical volume configuration information, and the time of re-accessing.
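The additional delay described above is simply the sum of three components, which a short sketch makes explicit. The function name and the example figures are purely illustrative assumptions and are not measurements from any actual system.

```python
def extra_delay(timeout_wait: float, config_fetch: float,
                retry_access: float) -> float:
    """Additional response time, in seconds, caused by a stale configuration.

    timeout_wait: time spent awaiting the disk node until time-out is detected
    config_fetch: time to acquire the configuration from the control node
    retry_access: time taken by the re-executed access
    """
    return timeout_wait + config_fetch + retry_access


# Illustrative figures only: a 5 s time-out dominates the other two terms.
delay = extra_delay(timeout_wait=5.0, config_fetch=0.5, retry_access=0.25)
```

With figures of this kind the time-out wait is the dominant term, which suggests why updating the configuration only after an access error is costly.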
Such a problem arises not only in recovery processing but also whenever the correspondence relation between a virtual storage area of a logical volume and a physical storage area of a storage unit is changed.