The present invention relates generally to methods and apparatus for processing I/O requests in computer storage systems and, more particularly, to methods and apparatus that improve the file access performance of a distributed storage system that provides a single name space to clients.
Some of today's computer storage systems are distributed systems having a number of storage nodes. Each node has a processing unit to process requests from clients or other nodes, a communication unit to send information to and receive information from clients or other nodes, and a storage unit to store data or management information. The nodes communicate with each other via a network and work as a single system. A distributed storage system has several advantages. For example, it scales well because nodes can be added. It also provides good processing parallelism by distributing the workload across a plurality of nodes. One example of a distributed storage system is disclosed in U.S. Pat. No. 7,155,466.
When the distributed storage system provides a file system to clients, it often organizes its storage capacity in a single name space, which means that a client gets an identical view regardless of the node or network port to which it is connected. To achieve a single name space, nodes send and receive files among themselves. For example, when a node (“receiver node”) receives a request from a client to read a file that is not stored in that node, the receiver node identifies another node (“owner node”) that stores the file, requests the owner node to send the file to the receiver node, and then sends the file to the client. The single name space capability allows a client to access files in any node even if the client communicates with only one node, and it simplifies the configuration management of the clients.
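By way of illustration only, the receiver-node behavior described above can be sketched in Python. The classes `Node` and `Cluster`, the method `handle_read`, and the linear owner search in `find_owner` are hypothetical names and simplifications that are not part of any described system. An actual system would typically locate the owner node through a metadata service or a placement rule, and the inter-node transfer is reduced here to a counter.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical storage node holding some files in its storage unit."""
    name: str
    files: dict[str, bytes] = field(default_factory=dict)
    transfers_received: int = 0  # files this node fetched from owner nodes


class Cluster:
    """Hypothetical cluster presenting one name space over several nodes."""

    def __init__(self, nodes: list[Node]) -> None:
        self.nodes = {n.name: n for n in nodes}

    def find_owner(self, path: str) -> Node:
        # Assumption: the owner is found by scanning every node; a real
        # system would consult a metadata service or placement function.
        for node in self.nodes.values():
            if path in node.files:
                return node
        raise FileNotFoundError(path)

    def handle_read(self, receiver_name: str, path: str) -> bytes:
        """Serve a client read request that arrived at the receiver node."""
        receiver = self.nodes[receiver_name]
        if path in receiver.files:
            # Local file: the receiver sends it straight to the client.
            return receiver.files[path]
        # Remote file: the owner sends it to the receiver (counted here as
        # one inter-node transfer), and the receiver forwards it to the client.
        owner = self.find_owner(path)
        data = owner.files[path]
        receiver.transfers_received += 1
        return data


if __name__ == "__main__":
    a = Node("A", {"/x": b"hello"})
    b = Node("B", {"/y": b"world"})
    cluster = Cluster([a, b])
    print(cluster.handle_read("A", "/x"), a.transfers_received)
    print(cluster.handle_read("A", "/y"), a.transfers_received)
```

In this sketch the client connected to node A receives the same bytes whether the file is stored on A or on B. Only the receiver's inter-node transfer count differs, which is the cost the single name space hides from the client.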
From the viewpoint of performance, particularly throughput, however, the inter-node communication that provides the single name space capability adds overhead to I/O processing. If a file is stored in the node that receives the access request from a client, there is no such overhead because the receiver node retrieves the file and sends it directly to the client. If, on the other hand, the file is stored in another node, it must be read by the owner node, transferred to the receiver node, and then sent to the client. In this case, the file passes through four NICs (network interface cards): the owner node's NIC, the receiver node's NIC that receives the file, the receiver node's NIC that sends the file to the client, and the client's NIC. Each time the file passes through an NIC, it goes through the communication protocol stack, which incurs memory copies, encapsulation of outgoing data, and reorganization of incoming data. Particularly when the file is large (e.g., gigabytes), this additional overhead makes the transfer time considerably longer than when the client receives the file directly from its owner node. By contrast, when the file is small (e.g., kilobytes), the overhead has little effect because the transfer time is short.
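The dependence of this overhead on file size can be illustrated with a deliberately simplified, non-pipelined cost model, sketched below in Python. The functions `nic_traversals` and `transfer_time` and the parameters `per_request_s` and `per_byte_per_nic_s` are hypothetical. The model assumes that each NIC traversal processes the whole file at a fixed per-byte protocol-stack cost and that each request also carries one fixed cost. The default values are assumptions chosen for illustration, not measurements.

```python
def nic_traversals(stored_on_receiver: bool) -> int:
    """Number of NICs the file passes through on its way to the client."""
    # Local file: receiver's client-facing NIC and the client's NIC.
    # Remote file: owner's NIC, receiver's inbound NIC, receiver's
    # client-facing NIC, and the client's NIC.
    return 2 if stored_on_receiver else 4


def transfer_time(size_bytes: int, stored_on_receiver: bool,
                  per_request_s: float = 0.001,
                  per_byte_per_nic_s: float = 1e-9) -> float:
    """Estimated transfer time in seconds under a simplified model.

    Assumption: every NIC traversal costs size_bytes * per_byte_per_nic_s
    (no pipelining), plus one fixed per_request_s for the request.
    Both default parameter values are hypothetical.
    """
    nics = nic_traversals(stored_on_receiver)
    return per_request_s + nics * size_bytes * per_byte_per_nic_s


if __name__ == "__main__":
    for size in (4096, 10**9):  # roughly "kilobytes" and "gigabytes"
        local = transfer_time(size, stored_on_receiver=True)
        remote = transfer_time(size, stored_on_receiver=False)
        print(f"{size} bytes: local {local:.6f} s, remote {remote:.6f} s")
```

With these assumed parameters, a 1 GB file takes about 4.0 s when fetched from another node versus about 2.0 s when stored on the receiver node. A 4 KB file differs by only about 8 µs on a total of roughly 1 ms, so the fixed per-request cost dominates. Real protocol stacks pipeline the NIC stages, so the model overstates absolute times and is meant only to show why the extra traversals matter more as file size grows.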