1. Field of the Invention
The present invention relates to distributed storage systems. More particularly, the present invention relates to a system and a method for adaptively determining whether to process a network-RAID operation locally at a client node of a distributed storage system or centrally at a storage, or coordination, server of the system.
2. Description of the Related Art
It is often necessary in a distributed storage system to read or write redundant data that has been striped across more than one storage server (or target). Such a system configuration is referred to as a “network-RAID” because the function of a RAID controller is performed by the network protocol of the distributed storage system, which coordinates IO operations that are processed at multiple places concurrently in order to ensure correct system behavior, that is, atomic and serialized execution of the operations. Distributed storage systems using a network-RAID protocol can process, or coordinate, a network-RAID-protocol IO request (IO request) locally at a client node, or the request can be forwarded to a storage server or a coordination server for processing. For example, one client node may locally write data to a particular data location, while another client node may choose to forward a read or a write request for the same data location to a shared, or coordination, server.
FIG. 1 depicts an exemplary distributed storage system 100 in which a plurality of storage servers 101a–101c and a plurality of client nodes 102a–102c process read and write requests relating to redundant data using a network-RAID protocol. Storage servers 101a–101c are communicatively coupled to client nodes 102a–102c through a network 103. While only three storage servers 101a–101c and three client nodes 102a–102c are shown in FIG. 1, it should be understood that exemplary distributed storage system 100 can have any number of storage servers and client nodes.
Exemplary distributed storage systems are disclosed by, for example, K. Amiri et al., “Highly concurrent shared storage,” 20th Intl. Conf. on Distributed Computing Systems, April 2000; K. Amiri et al., “Dynamic function placement for data-intensive cluster computing,” in Proceedings of the Usenix Annual Technical Conference, June 2000; S. Frolund et al., “FAB: enterprise storage systems on a shoestring,” in Proceedings of the 9th Workshop on Hot Topics in Operating Systems, May 2003; E. Lee et al., “Petal: distributed virtual disks,” in Proceedings of the 7th International Conference on Architectural Support for Programming Languages and Operating Systems, 1996; and D. Long et al., “Swift/RAID: a distributed RAID system,” Computing Systems, 7(3), 1994.
Often the best choice of whether a network-RAID-protocol IO request should be processed locally at a client node or centrally by a storage server or a coordination server varies on a request-by-request basis, both with the type of IO request and as network and system conditions change. The choice depends on several factors, such as the amount of contention in the workload, that is, the extent to which multiple clients are trying to read or write the same data; the performance of the client node; and the capacity of the network that connects the client node to storage.
A high level of contention in the workload of a client node can cause more than a 20% increase in response time to an IO request. In some cases, a high level of contention can cause a response time that is more than 200% greater than the response time for non-contention conditions. Thus, when the level of contention is high and/or when a client node is heavily loaded, it is often better for the client node to forward the request and a copy of the data associated with the request to a storage server having more resources and let the storage server coordinate the IO request. Similarly, when a client node has a low-bandwidth connection to storage while a storage server has a faster connection to storage, an IO request is best forwarded to the storage server, thereby minimizing the amount of data sent over the slow link of the client node. Further, during periods of high contention when multiple clients are trying to read or write the same data, it may be faster for a client node to forward all requests to a storage server rather than have client nodes contend with each other on a local basis.
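By way of illustration only, the factors described above (contention, client load, and relative link speed) can be combined into a simple placement rule. The following is a minimal sketch and not a method disclosed by any of the cited systems; the names `Conditions` and `choose_coordinator`, the metrics, and every threshold value are hypothetical assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Conditions:
    """Hypothetical snapshot of the conditions seen by one client node."""
    contention: float        # assumed metric: fraction of recent requests that conflicted (0.0-1.0)
    client_load: float       # assumed metric: client utilization (0.0-1.0)
    client_link_mbps: float  # bandwidth of the client's connection to storage
    server_link_mbps: float  # bandwidth of the storage server's connection to storage


def choose_coordinator(c, contention_limit=0.5, load_limit=0.8, slow_link_ratio=2.0):
    """Return "central" to forward the IO request, or "local" to coordinate it at the client.

    All three limits are illustrative assumptions, not values taken from the text.
    """
    if c.contention > contention_limit:
        return "central"   # high contention: let a server coordinate
    if c.client_load > load_limit:
        return "central"   # heavily loaded client: offload coordination
    if c.server_link_mbps >= slow_link_ratio * c.client_link_mbps:
        return "central"   # slow client link: send the data over it only once
    return "local"         # common case: low contention, reasonably fast link
```

For example, `choose_coordinator(Conditions(0.1, 0.2, 1000, 1000))` selects local coordination, while raising `contention` to 0.9 selects central coordination.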
Many conventional network-RAID protocols provide a choice of whether coordination of a network-RAID operation should be performed separately at a client node or centralized in a shared server. In the common situation of a low level of contention and a reasonably fast network connection, having a client node coordinate IO requests provides better performance than sending the IO request to a storage server or a coordination server because less work is performed: the data goes directly between the client node and the storage servers, as depicted in FIG. 2A, in which a client node 201 is depicted as coordinating a network-RAID operation with storage servers 202a and 202b. In contrast, FIG. 2B depicts a client node 210 as forwarding an IO request plus any data that is associated with the IO request to a coordination server 211. Coordination server 211 then coordinates the network-RAID operation with storage servers 212a and 212b. Additionally, processing an IO request at a client node avoids the possibility that a shared storage server becomes overloaded.
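The trade-off between the two data paths of FIGS. 2A and 2B can be made concrete with a small traffic count. The following sketch assumes, purely for illustration, that a write is mirrored in full to each storage server and that protocol overhead is ignored; the function name `write_traffic` and its parameters are hypothetical.

```python
def write_traffic(payload_bytes, replicas, forwarded):
    """Return (bytes on the client's link, total bytes on the network) for one write.

    Assumes full mirroring to `replicas` storage servers and no protocol overhead.
    """
    if forwarded:
        # FIG. 2B path: the client sends the data once to the coordination server,
        # which then sends a copy to every storage server.
        client_link = payload_bytes
        total = payload_bytes + replicas * payload_bytes
    else:
        # FIG. 2A path: the client sends a copy directly to every storage server.
        client_link = replicas * payload_bytes
        total = client_link
    return client_link, total


local = write_traffic(4096, replicas=2, forwarded=False)
central = write_traffic(4096, replicas=2, forwarded=True)
print(local, central)
```

Under these assumptions, local coordination moves fewer bytes overall (less work is performed), whereas forwarding places fewer bytes on the client's own link, which is why forwarding may be preferable when that link is slow.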
An exemplary distributed storage system using a network-RAID protocol that determines whether to process an IO request locally or centrally is disclosed by K. Amiri et al., “Dynamic function placement for data-intensive cluster computing,” Usenix Annual Technical Conference, June 2000. The Amiri et al. system makes periodic determinations regarding adaptively moving execution of IO processing steps from a client node to a storage server. After each determination, all subsequent IO operations are performed either locally or centrally based on the determination until the next periodic determination.
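The periodic behavior described above can be sketched as follows. This is a simplified illustration and not the algorithm of Amiri et al.: the class name `PeriodicPlacement` is hypothetical, the `decide` callable stands in for whatever determination a real system makes, and the period is counted in IO operations rather than elapsed time so that the example is deterministic.

```python
class PeriodicPlacement:
    """Holds one placement decision for a fixed number of IO operations."""

    def __init__(self, decide, period):
        if period < 1:
            raise ValueError("period must be at least 1")
        self.decide = decide   # hypothetical callable returning "local" or "central"
        self.period = period   # number of IO operations between determinations
        self.count = 0
        self.mode = None

    def placement_for_next_io(self):
        # Re-evaluate only at the start of each period; otherwise reuse the
        # previous determination, regardless of how conditions have changed.
        if self.count % self.period == 0:
            self.mode = self.decide()
        self.count += 1
        return self.mode


decisions = iter(["local", "central"])
policy = PeriodicPlacement(lambda: next(decisions), period=3)
trace = [policy.placement_for_next_io() for _ in range(6)]
print(trace)
```

In this sketch every IO operation within a period receives the same placement, even if conditions change between two operations of that period.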
Nevertheless, what is needed is a way to adaptively determine on an operation-by-operation basis whether a network-RAID IO request is best processed locally at a client node of a distributed storage system or centrally at a coordination server or a storage server of the system.