A distributed storage system offers enhanced performance and reliability through the coordinated operation of a plurality of storage nodes distributed over a network. This distributed storage system provides a virtual volume (or logical volume) so that the user can access the storage spaces of the storage nodes in a unified manner, regardless of their physical locations. For management purposes, the logical storage area of a logical volume is divided into a plurality of smaller areas, called “logical segments.” The logical volume is thus managed on a logical segment basis. Similarly, the storage space of a storage node is divided into a plurality of real data storage areas with the same size as the logical segments. Those storage areas are called “physical segments.” Each logical segment of a logical volume is associated with at least one physical segment in a storage node. When data mirroring is implemented, a pair of physical segments in different storage nodes is associated with a single logical segment. To manage individual physical segments, the storage system has a management data area to store the identifiers of associated physical and logical segments, together with other management information.
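The association between logical and physical segments described above can be illustrated with a minimal sketch. The segment size, the node names, and the names `segment_map` and `locate` are hypothetical and chosen only for illustration; they are not taken from any particular storage system.

```python
from __future__ import annotations

from dataclasses import dataclass

# Assumed segment size (1 GiB); the text does not specify a size.
SEGMENT_SIZE = 1 << 30


@dataclass(frozen=True)
class PhysicalSegment:
    node_id: str  # storage node that holds the segment (hypothetical names)
    index: int    # position of the segment within that node


# Management data: logical segment number -> associated physical segments.
# Two entries per logical segment model mirroring across different nodes.
segment_map = {
    0: [PhysicalSegment("node-a", 0), PhysicalSegment("node-b", 0)],
    1: [PhysicalSegment("node-b", 1), PhysicalSegment("node-c", 0)],
}


def locate(logical_address: int) -> tuple[list[PhysicalSegment], int]:
    """Return the physical segments backing an address and the offset in them."""
    segment_no, offset = divmod(logical_address, SEGMENT_SIZE)
    return segment_map[segment_no], offset
```

In this sketch, an address that falls in the second logical segment resolves to two physical segments on different nodes, which corresponds to the mirrored case.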
Most storage systems use a static configuration for their operation. For example, the association between logical segments and physical segments is defined at the time of system installation, and that configuration, like the segment size, is used thereafter without change. In such a static configuration, a group of logical segments including those with a high access frequency may happen to be assigned to physical segments located in a few particular storage nodes. If that is the case, the concentrated access to such logical segments could overload the corresponding storage nodes, thus slowing down their response to file read or write operations.
To address the above-discussed problem of uneven access to storage resources in a storage system, there is proposed a method for controlling a computer network configured as a storage system. Specifically, the proposed method dynamically changes the arrangement of physical segments as necessary to level out differences in the usage of storage resources. The method improves the response of the storage system and also increases the availability of storage resources. See, for example, Japanese Laid-open Patent Publication No. 9-223047, FIG. 11.
Another proposed method allocates storage space for distributed storage devices, such as a disk array system formed from many drives. The proposed allocation method manages the allocation of storage capacity of those drives in such a way that as many drives as possible can operate simultaneously in a sequential write access. See, for example, Japanese Laid-open Patent Publication No. 8-185275, FIG. 1.
As described above, various techniques have been proposed to achieve faster access to storage nodes or to increase the throughput of distributed storage systems. Conventional distributed storage systems, however, are unable to control the concentration of load on a few particular storage nodes, as will be discussed below by way of example.
An operating system (OS) is the fundamental software that acts as a host for applications. An application produces various I/O commands requesting access to a logical volume. Upon receipt of such I/O commands, the OS sorts them and then issues them to the storage nodes. FIG. 10 illustrates how the sorting of I/O commands at the OS affects the load imposed on the storage nodes in a conventional storage system.
A logical volume 910 is formed from four logical segments 911, 912, 913, and 914 arranged in ascending order of address. These logical segments are distributed across several storage nodes 920 to 940. The storage space of a storage node 920 is divided into three physical segments 921 to 923. Likewise, the storage space of another storage node 930 is divided into three physical segments 931 to 933. The storage space of yet another storage node 940 is divided into three physical segments 941 to 943.
In the example of FIG. 10, each segment bears a label composed of the letter “S” and a numeral. A logical segment shares the same label with its corresponding physical segment. For example, logical segment S1 911 is mapped to physical segment S1 921, meaning that the substantive data of logical segment S1 911 actually resides in physical segment S1 921.
Applications, when executed, make random access to the logical volume by issuing I/O commands 901 (hereafter “application-issued commands”). FIG. 10 depicts such I/O commands, together with an indication of their destination logical segments in parentheses. For example, “I/O command (S1)” represents an access request to logical segment S1 911 in the logical volume 910. The OS saves application-issued commands 901 in a temporary storage area, or buffer, when they are received from applications. The OS sorts those I/O commands according to their destination addresses before they are issued sequentially to the storage system. The resulting I/O commands are referred to as “OS-issued commands,” in contrast to the application-issued commands.
As a result of the sorting operation, the OS-issued commands 902 are now aligned in the order of destinations, ready to be sent out in that order. More specifically, the rearranged series of I/O commands begins with a group of I/O commands (S1) addressed to logical segment S1 911. This is followed by a group of I/O commands (S2) addressed to logical segment S2 912, and then by a group of I/O commands (S3) addressed to logical segment S3 913. Take the group of I/O commands (S2), for example. This group consists of four I/O commands (S2) 903, which are issued consecutively to their destination physical segment 931 in the storage node 930. That is, four consecutive access requests concentrate in the physical segment 931, which may slow down the response from the corresponding storage node 930.
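The effect of the sorting operation on load concentration can be reproduced with a short sketch. The mapping of S1 to storage node 920 and of S2 to storage node 930 follows the description above; the placement of S3 on storage node 940 and the order of the application-issued commands are assumptions made only for illustration, and the function name `longest_run_per_node` is hypothetical.

```python
from itertools import groupby

# S1 -> node 920 and S2 -> node 930 follow the description of FIG. 10.
# S3 -> node 940 is an assumption; the text does not state where S3 resides.
SEGMENT_TO_NODE = {"S1": 920, "S2": 930, "S3": 940}


def longest_run_per_node(commands):
    """Return, for each node, the longest run of consecutive commands sent to it."""
    longest = {}
    for node, run in groupby(SEGMENT_TO_NODE[c] for c in commands):
        length = sum(1 for _ in run)
        longest[node] = max(longest.get(node, 0), length)
    return longest


# Hypothetical random-order commands from applications (destination segments).
application_issued = ["S2", "S1", "S3", "S2", "S1", "S2", "S3", "S2"]

# The OS sorts buffered commands by destination before issuing them.
# Sorting the labels stands in for sorting by destination address here.
os_issued = sorted(application_issued)

before = longest_run_per_node(application_issued)
after = longest_run_per_node(os_issued)
```

With these inputs, no node receives more than one command in a row before sorting, whereas after sorting node 930 receives four consecutive commands, matching the group of four I/O commands (S2) described above.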
As mentioned earlier, performance improvement of storage systems has been desired, including faster response to access requests directed to physical volumes of storage nodes. Conventional approaches for achieving those goals are, however, all directed to segment-based management of storage volumes. As discussed above with reference to FIG. 10, conventional segment-based storage management is unable to solve the issue of longer response times and system slowdown due to the concentration of access requests on a small portion of a particular segment.
Reducing the segment size might alleviate the problem of access concentration to some extent. However, this approach also means an increased number of segments and, as a consequence, an increased amount of segment management data. Accordingly, the storage system would need to spend more time managing such reduced-size segments.
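The trade-off between segment size and the amount of management data can be made concrete with simple arithmetic. The volume size and the two segment sizes below are assumed values chosen for illustration, and the sketch assumes one management entry per segment.

```python
def segment_count(volume_bytes: int, segment_bytes: int) -> int:
    """Number of segments needed to cover a volume (ceiling division)."""
    return -(-volume_bytes // segment_bytes)


VOLUME_BYTES = 1 << 40   # assumed 1 TiB logical volume
LARGE_SEGMENT = 1 << 30  # assumed 1 GiB segment
SMALL_SEGMENT = 1 << 20  # assumed 1 MiB segment

large_count = segment_count(VOLUME_BYTES, LARGE_SEGMENT)  # 1024 segments
small_count = segment_count(VOLUME_BYTES, SMALL_SEGMENT)  # 1048576 segments

# With one management entry per segment, the management data grows
# by the same factor as the segment size shrinks.
growth_factor = small_count // large_count  # 1024
```

Under these assumptions, shrinking the segment size by a factor of 1024 multiplies the number of segments to be managed by the same factor.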