Computing systems are capable of generating vast amounts of data. Often, it is desirable to store this data in a data store for later access. The characteristics of the data store may vary based on the type of data that is stored. For example, in some situations the data is to be stored permanently, while in other situations the data expires and is removed from the data store after certain conditions are met. Further, the storage space used to store the data may remain constant or vary over time.
To facilitate the management of data, some administrators design data stores modularly, using a number of physical or logical storage containers. Then, if the size of the data increases, the administrator may add one or more containers to the data store to increase the available storage space. In that case, the administrator may optionally redistribute the existing data across the old and new containers. On the other hand, if the size of the data decreases, the administrator might move data off one or more containers, which gives the administrator the option to remove the now-unused containers from the data store.
Modular data stores have made significant contributions to easing the administrative burden of storing large amounts of data. However, they do have limitations. For example, when data is distributed across multiple storage containers, it may be difficult to quickly locate a given item of data for access or removal. To solve this problem, existing data stores may use a map that divides the full range of hash values among the storage containers, so that each hash value corresponds to one container.
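As an illustration only, such a map might be sketched as follows. The sketch assumes a 32-bit hash space that is split into contiguous, near-equal ranges, with SHA-256 standing in for the hashing function; the names `HASH_SPACE`, `build_range_map`, `hash_key`, and `container_for_hash` are hypothetical and are not taken from any particular data store.

```python
import bisect
import hashlib

HASH_SPACE = 2**32  # assumed size of the full range of hash values


def build_range_map(containers):
    """Split [0, HASH_SPACE) into one contiguous range per container.

    Returns (upper_bounds, containers), where upper_bounds[i] is the
    exclusive upper bound of the range assigned to containers[i].
    """
    n = len(containers)
    upper_bounds = [(i + 1) * HASH_SPACE // n for i in range(n)]
    return upper_bounds, list(containers)


def hash_key(key):
    """Hash a known portion of the data (here, a string key) into the hash space."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")  # value in [0, HASH_SPACE)


def container_for_hash(range_map, h):
    """Return the container whose range contains hash value h."""
    upper_bounds, containers = range_map
    # bisect_right finds the first range whose exclusive upper bound exceeds h.
    return containers[bisect.bisect_right(upper_bounds, h)]


range_map = build_range_map(["container-0", "container-1", "container-2"])
print(container_for_hash(range_map, hash_key("record-17")))
```

Because every hash value falls into exactly one range, the map alone is enough to decide which container to consult, without searching the containers themselves.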
To store data in the data store, a hashing function is used to hash a known portion of the data, and the map is then used to determine which container corresponds to the resulting hash. That container stores the data that was used to generate the hash. To look up the data, the hashing function is used to hash the known portion again, and the map is used to locate the container in which the data was stored. Because the hashing function and the map are both deterministic, the lookup arrives at the same container that was selected when the data was stored. To add a container to, or remove a container from, a hash-based data store, an administrator may generate a new map corresponding to the new set of containers. The administrator may then copy data among the containers wherever the new map causes a hash to map to a different container than it did previously.
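The store, lookup, and remapping steps described above can be sketched as follows. This is a minimal in-memory sketch under stated assumptions, not a description of any specific product: containers are modeled as dictionaries, the known portion of the data is a string key, SHA-256 stands in for the hashing function, and the hash space is split into equal ranges. The class `HashedDataStore` and its methods are hypothetical names.

```python
import bisect
import hashlib

HASH_SPACE = 2**32  # assumed size of the full range of hash values


class HashedDataStore:
    def __init__(self, container_names):
        # Each container is modeled as an in-memory dict (an assumption).
        self.containers = {name: {} for name in container_names}
        self._set_map(container_names)

    def _set_map(self, names):
        # Generate a map that splits the hash space evenly over the names.
        n = len(names)
        self.names = list(names)
        self.upper_bounds = [(i + 1) * HASH_SPACE // n for i in range(n)]

    def _locate(self, key):
        # Hash the known portion, then use the map to pick the container.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h = int.from_bytes(digest[:4], "big")
        return self.names[bisect.bisect_right(self.upper_bounds, h)]

    def put(self, key, value):
        self.containers[self._locate(key)][key] = value

    def get(self, key):
        return self.containers[self._locate(key)].get(key)

    def remap(self, new_names):
        """Install a map for new_names and copy records whose container changed.

        Returns the number of records that had to be moved.
        """
        old = self.containers
        # Containers that survive keep their records; new ones start empty.
        self.containers = {name: old.get(name, {}) for name in new_names}
        self._set_map(new_names)
        moved = 0
        for name, records in old.items():
            for key in list(records):
                target = self._locate(key)
                if target != name:
                    self.containers[target][key] = records.pop(key)
                    moved += 1
        return moved


store = HashedDataStore(["container-0", "container-1"])
for i in range(100):
    store.put(f"record-{i}", i)

moved = store.remap(["container-0", "container-1", "container-2"])
print(f"records moved after adding a container: {moved}")
print(store.get("record-42"))
```

In this sketch, every record whose hash falls into a range that changed owners must be copied during `remap`, which reflects the copying step the administrator performs after generating a new map.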