Distributed storage systems include one or more computation nodes that read data from and write data to storage nodes. Typically, data is replicated, or mirrored, across two or more storage nodes to improve data transfer performance and provide redundancy. For example, a data file could be stored on three storage nodes located in three geographically diverse locations. A computation node accessing the data file would first attempt to locate the file on the storage node that is geographically closest to the computation node. Generally, access time is improved if the computation node and the storage node are nearer to each other geographically. If, however, the data file on this storage node is missing or corrupt, the computation node would next attempt to locate the data file on one of the other two storage nodes that contain a copy of the data file.
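The read path described above can be illustrated with a minimal sketch. It is not taken from any particular system: the names StorageNode, distance_km, put, and read_replicated are hypothetical, and the use of a CRC32 checksum to detect a corrupt copy is an assumption made only for illustration.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class StorageNode:
    """Hypothetical storage node holding replicas as filename -> (data, checksum)."""
    name: str
    distance_km: float  # assumed proximity measure relative to the computation node
    files: dict = field(default_factory=dict)

    def put(self, filename: str, data: bytes) -> None:
        # Store the data together with a CRC32 checksum (illustrative assumption).
        self.files[filename] = (data, zlib.crc32(data))

    def read(self, filename: str):
        """Return the file contents, or None if the copy is missing or corrupt."""
        entry = self.files.get(filename)
        if entry is None:
            return None  # copy is missing on this node
        data, checksum = entry
        if zlib.crc32(data) != checksum:
            return None  # copy is corrupt on this node
        return data


def read_replicated(filename: str, replicas: list):
    """Try the geographically closest replica first, then fall back to the others.

    Returns (node_name, data) for the first good copy, or None if no node
    holds a usable copy.
    """
    for node in sorted(replicas, key=lambda n: n.distance_km):
        data = node.read(filename)
        if data is not None:
            return node.name, data
    return None
```

In this sketch, a computation node holding a list of three StorageNode objects would call read_replicated with the file name; a missing or corrupt copy on the nearest node simply causes the next-nearest node to be tried.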
A computation node that stores a new data file or modifies an existing data file sends point-to-point command frames, along with the new or modified data, to each storage node that is designated to store a copy of the data file. Each storage node that receives such a command frame stores or modifies the data file, as directed by the command frame. For example, if a data file is designated to be stored on three separate storage nodes, the computation node would send a separate command frame to each of the three storage nodes. Each storage node would receive the respective command frame, and then store or modify the data as directed by the command frame.
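The point-to-point write path described above can be sketched as follows. This is an illustrative model only: CommandFrame, StorageNode, handle, and replicate_write are hypothetical names, the frame layout is assumed, and network transmission is replaced by a direct method call.

```python
from dataclasses import dataclass, field


@dataclass
class CommandFrame:
    """Hypothetical point-to-point command frame addressed to one storage node."""
    target: str     # name of the storage node that should act on the frame
    op: str         # "store" for a new file, "modify" for an existing file
    filename: str
    payload: bytes  # the new or modified data carried with the command


@dataclass
class StorageNode:
    name: str
    files: dict = field(default_factory=dict)

    def handle(self, frame: CommandFrame) -> None:
        # Act on the frame as directed. In this sketch both operations
        # simply write the payload under the given file name.
        if frame.op not in ("store", "modify"):
            raise ValueError(f"unknown operation: {frame.op}")
        self.files[frame.filename] = frame.payload


def replicate_write(filename: str, payload: bytes, replicas: list, op: str = "store") -> list:
    """Build and deliver one separate command frame per designated storage node."""
    frames_sent = []
    for node in replicas:
        frame = CommandFrame(target=node.name, op=op, filename=filename, payload=payload)
        node.handle(frame)  # stands in for sending the frame over the network
        frames_sent.append(frame)
    return frames_sent
```

In this sketch the computation node constructs and delivers as many frames, each carrying its own copy of the payload, as there are designated storage nodes, so writing a file designated for three storage nodes produces three frames.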
One drawback of this approach is that the performance burden on the computation node grows with the quantity of storage nodes that contain a replica of the data file, because the computation node must generate and send a separate command frame, along with the data, for each replica. A possible solution to address this drawback is to add one or more computation nodes. The computation nodes then divide among themselves the task of generating the command frames that create or modify replicas of the data file and sending those command frames to the storage nodes. However, adding computation nodes for the purpose of replicating data files is costly in terms of price, power requirements, and physical space.