Contemporary cloud computing and other centralized data storage scenarios (e.g., datacenters) require large amounts of storage. Contemporary storage systems based on hard disks attach the disks directly to server nodes, typically with a single server configured as a file server. During normal operation, when a request for a file arrives at the file server, the file server consults the file system indexes to identify the physical blocks on the hard drives that contain the file data, reads those blocks into memory, sets up a transfer from memory to the network interface, and completes the file transfer over the network interface.
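The single-server read path described above can be sketched as follows. This is a minimal illustration, not a real file system. The `FileServer` class, its `disks` and `index` fields, and the tiny in-memory "blocks" are all hypothetical stand-ins for physical drives and file system indexes. The network transfer is modeled simply as returning the assembled bytes.

```python
from dataclasses import dataclass


@dataclass
class FileServer:
    """Hypothetical single-node file server with directly attached disks."""

    # disk id -> list of blocks (stands in for a physical hard drive)
    disks: dict[int, list[bytes]]
    # file name -> ordered list of (disk id, block number) locations
    index: dict[str, list[tuple[int, int]]]

    def serve(self, name: str) -> bytes:
        # 1. Consult the file system index to find the physical blocks.
        locations = self.index[name]
        # 2. Read each block from its disk into an in-memory buffer.
        buffer = bytearray()
        for disk_id, block_no in locations:
            buffer += self.disks[disk_id][block_no]
        # 3. Hand the buffer to the "network interface"
        #    (modeled here as returning the bytes).
        return bytes(buffer)


# Usage: a file whose blocks are spread over two attached disks.
server = FileServer(
    disks={0: [b"hell", b"orld"], 1: [b"o, w"]},
    index={"greeting.txt": [(0, 0), (1, 0), (0, 1)]},
)
print(server.serve("greeting.txt"))  # b'hello, world'
```

Every block passes through the server's own memory, which is why capacity in this design is bounded by how many drives one machine can host.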
One limitation of such a system is that the file server's storage capacity is limited by the number of hard drives that can be attached to the physical machine. Large-scale distributed systems overcome this limitation by setting up several such machines as peers and distributing the storage across them. Each machine can then have either a global view of the file index, or a local view combined with peer communication to obtain a global view. In general, when a request for a file arrives at one of the servers, e.g., a main server machine, the main server machine identifies the peer machines across which the file blocks are distributed, either by looking up its local index or by asking its peer machines, and requests the file blocks from each identified machine. Each peer machine then looks up the blocks on its local hard drives, reads them into memory, and transfers them over the network to the main server machine. The main server machine then assembles the file in memory from the blocks received from the various peer machines and transfers the assembled file over the network to complete the file transfer.
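The distributed read path can likewise be sketched under simplifying assumptions. The `Peer` and `MainServer` classes, the `read_blocks` method, and the `bytes_from_peers` counter are hypothetical names for illustration only. The sketch assumes the main server holds a global view of the index and that all blocks live on peers; real systems differ in how the index is distributed and how blocks are requested.

```python
from collections import defaultdict


class Peer:
    """Hypothetical peer machine holding blocks on its local drives."""

    def __init__(self, blocks: dict[str, bytes]):
        self.blocks = blocks  # block id -> block contents

    def read_blocks(self, block_ids: list[str]) -> dict[str, bytes]:
        # Look up each block locally and read it into memory; returning it
        # stands in for the transfer back over the network.
        return {bid: self.blocks[bid] for bid in block_ids}


class MainServer:
    """Hypothetical main server with a global view of the file index."""

    def __init__(self, peers: dict[str, Peer],
                 index: dict[str, list[tuple[str, str]]]):
        self.peers = peers
        self.index = index  # file name -> ordered list of (peer, block id)
        self.bytes_from_peers = 0  # intra-cluster traffic, for illustration

    def serve(self, name: str) -> bytes:
        layout = self.index[name]
        # Identify which peers hold which blocks.
        wanted = defaultdict(list)
        for peer_name, block_id in layout:
            wanted[peer_name].append(block_id)
        # Request the blocks from each identified peer.
        received = {}
        for peer_name, block_ids in wanted.items():
            for block_id, data in self.peers[peer_name].read_blocks(block_ids).items():
                received[(peer_name, block_id)] = data
                self.bytes_from_peers += len(data)
        # Assemble the file in memory, in index order, before sending it out.
        return b"".join(received[loc] for loc in layout)


# Usage: one file striped across two peers.
peers = {
    "a": Peer({"b0": b"dist", "b2": b"ted "}),
    "b": Peer({"b1": b"ribu", "b3": b"file"}),
}
server = MainServer(peers, {"doc": [("a", "b0"), ("b", "b1"), ("a", "b2"), ("b", "b3")]})
print(server.serve("doc"))        # b'distributed file'
print(server.bytes_from_peers)    # 16
```

The `bytes_from_peers` counter makes visible that every byte of the file crosses the network once from a peer to the main server before being sent to the requester.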
However, such a distributed system increases the size and cost of the storage by adding more compute nodes, even though the additional computing power of these nodes often goes largely unused. These extra nodes also waste power and cooling capacity in the datacenter. Further, additional network bandwidth is consumed in transferring the file blocks from the machines holding the data to the machine that completes the file transfer. Moreover, because a main server handles reads and writes by assembling blocks, there is usually a limit on how many hard disks can be attached to a single server node.