A critical component of computer systems is data storage. Data storage can be divided conceptually into an individual user's data storage, which is attached directly to the individual's computer, and network-based data storage, which is typically intended for multiple users.
One type of network-based storage device is a disk array. The disk array includes a controller coupled to an array of disks. Typically, components of the disk array (e.g., the controller and the disks) are hot-swappable, which allows components to be replaced without turning off the disk array.
As an alternative to the disk array, researchers have been exploring data storage within a distributed storage system that includes an array of independent storage devices coupled together by a network. Each of the independent storage devices includes a processor, memory, and one or more disks. One advantage of the array of independent storage devices is lower cost. The lower cost can result from mass production of the independent storage devices as commodity devices and from elimination of the hot-swappable features of the disk array. Another advantage is better scalability: the user can buy a few devices initially and add more devices as demand grows.
Replication and erasure coding have been explored as techniques for enhancing reliability for an array of independent storage devices. A replication technique employed by the array of independent storage devices replicates data blocks across a set of storage devices (e.g., three storage devices). This set is called the replica set for the data blocks. Erasure coding stores m data blocks and p parity blocks across a set of n storage devices, where n=m+p. For each set of m data blocks that is striped across a set of m storage devices, a set of p parity blocks is stored on a set of p storage devices.
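By way of illustration only, the erasure coding arrangement described above can be sketched for the simplest case of a single parity block (p=1), in which the parity block is the byte-wise XOR of the m data blocks. The function names and the choice of XOR parity are assumptions made for this example and are not taken from any particular storage system; codes with p greater than one require a different encoding.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of a list of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def encode_stripe(data_blocks):
    """Return n = m + 1 blocks: the m data blocks followed by one parity block.

    Each returned block would be stored on a different storage device.
    """
    return list(data_blocks) + [xor_blocks(data_blocks)]

def recover_block(stripe, lost_index):
    """Rebuild the block at lost_index from the n - 1 surviving blocks."""
    survivors = [blk for i, blk in enumerate(stripe) if i != lost_index]
    return xor_blocks(survivors)

# Example stripe with m = 3 data blocks and p = 1 parity block (n = 4).
data = [b"AAAA", b"BBBB", b"CCCC"]
stripe = encode_stripe(data)
rebuilt = recover_block(stripe, 1)  # rebuilds the second data block
```

In this sketch, the loss of any one of the n storage devices holding the stripe can be tolerated, because XOR-ing the surviving blocks reproduces the missing one.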
If a high-speed network couples the independent storage devices of a distributed storage system together, disk access latency can cause a significant delay when a client reads data. Memory provides lower access latency than disk storage. If a replicated or erasure coded address space could be hashed to the storage devices, each memory could cache its own portion of the replicated or erasure coded address space, eliminating duplicates among the caches of different storage devices. More generally, it would be desirable to be able to hash an address space to a plurality of storage servers.
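As a toy illustration of the caching idea only, and not a method drawn from this document, the following sketch assigns each block address to exactly one member of its replica set, so that no two devices would cache the same block. The placement rule in `replica_set`, the use of SHA-256, and all names are hypothetical choices made for the example; the sketch assumes at least as many devices as replicas and ignores device failure, load balance, and changes in the number of devices.

```python
import hashlib

def replica_set(block_addr, num_devices, replicas=3):
    """Hypothetical placement: `replicas` consecutive devices, wrapping around."""
    start = block_addr % num_devices
    return [(start + i) % num_devices for i in range(replicas)]

def caching_device(block_addr, num_devices, replicas=3):
    """Pick exactly one device from the block's replica set to cache the block."""
    digest = hashlib.sha256(block_addr.to_bytes(8, "big")).digest()
    choice = int.from_bytes(digest[:8], "big") % replicas
    return replica_set(block_addr, num_devices, replicas)[choice]

# Every block address maps to a single caching device within its replica set.
assignments = {addr: caching_device(addr, num_devices=8) for addr in range(16)}
```

Because the mapping is a deterministic function of the block address, any client or device can compute which memory is responsible for a given block without consulting a directory.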
What is needed is a method of hashing an address space to a plurality of storage servers.