Hash structures are used in computer systems to map identifying values, or keys, to their associated values or to the storage locations holding those values. A hash function transforms the key into the index of an array element where the associated value, or a pointer to that value, is stored. When slots are added to or removed from the hash structure, the structure usually undergoes a rehash, whereby existing items are mapped to new locations. Hash structures can be used in cloud-based networks to arrange distributed storage schemes, in which mappings in the hash structure can contain or point to stored data objects, such as files used by applications running in the cloud.
“Consistent hashing” can be implemented such that the addition or removal of one slot does not significantly change the mapping of keys to locations. In particular, consistent hashing involves associating a real angle with each stored item, effectively mapping the item to, for example, a point on the circumference of a circle. In addition, available machines or servers are mapped to locations around the circle. The machine or server on which the item is to be stored is chosen by selecting the machine at the next highest angle along the circle after the item. If a storage location on a machine becomes unavailable, the angles mapping to that location are removed, and requests for files or other data objects that would have mapped to the unavailable location are mapped to the next available storage location.
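The selection rule described above can be sketched in a few lines of Python. This is a minimal illustration, not an implementation from the source; the class and server names are hypothetical, and the "angle" is simply a hash value reduced to a 32-bit ring.

```python
import bisect
import hashlib

def _angle(key: str) -> int:
    """Map a key to a point on the ring (a 32-bit 'angle')."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % (2**32)

class ConsistentHashRing:
    """Minimal consistent-hash ring: servers and items share one key space."""

    def __init__(self, servers):
        self._points = sorted((_angle(s), s) for s in servers)

    def lookup(self, item_key: str) -> str:
        """Return the server at the next highest angle after the item."""
        angles = [p for p, _ in self._points]
        i = bisect.bisect_right(angles, _angle(item_key))
        # Wrap around the circle if the item hashes past the last server.
        return self._points[i % len(self._points)][1]

    def remove(self, server: str):
        """Drop a failed server; its items fall to the next server on the ring."""
        self._points = [(p, s) for p, s in self._points if s != server]
```

Removing one server only remaps the items that fell between that server and its predecessor on the ring; all other key-to-server assignments are unchanged, which is the defining property of consistent hashing.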
In large-scale cloud-based networks or other distributed networks, various choices are available to the systems designer in terms of storage architecture, including where to place data objects, the directory or other logical storage structure for those objects, the format to be used for those objects, and the number of copies or other replication policy to use for those objects. One possible choice for storage implementation is to copy or “stripe” copies of all data objects to all possible servers or other storage resources, or to a substantial portion of them. Wide-scale striping, however, can incur performance and reliability penalties, including when a significant number of users attempt to access those files or other data objects at the same or different times. At the other end of the architectural spectrum, a systems designer could choose to place each data object in just one location. This choice, while eliminating processing overhead needed to seek and extract a given data object, also eliminates helpful data redundancy and can lead to contention among users requesting the same file or other object.
In the case of cloud-based networks, it may at times therefore be desirable to store more than one copy of a data object in the data storage resources of the cloud, while at the same time avoiding wide-scale striping across the cloud. It may be useful to maintain a relatively small or discrete number of copies of a given data object for more than one reason. Data redundancy, or backup in the face of possible storage failures, is one. In addition, in a cloud-based network, more than one user or application may wish to access the same file or other data object at the same or different times, and serving files to requesting users may incur fewer bottlenecks when a discrete or intermediate number of sources of the data object is available.
When implementing a storage scenario where a file or other data object is stored on a comparatively small scale, for example, two to five copies of the file or other object, it is possible to establish and encode that comparatively small set of copies using a hash structure. In some cases, the hash structure nodes storing those objects, or links to their locations, can be spread around the hash structure, for instance based on attributes of the data objects and/or randomized offsets that separate the different copies. When data objects are hashed and stored in this manner, each individual data object can be distributed randomly or variously across the hash structure and/or underlying storage resources. This may contribute to better data redundancy, among other things.
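The per-copy spreading described above can be sketched as follows. This is an illustrative assumption, not the source's scheme: each copy's ring position is derived by hashing the object key together with a per-copy salt, so the small set of copies lands at effectively independent points rather than clustering near one location.

```python
import hashlib

RING_SIZE = 2**32  # size of the hash ring's key space

def replica_points(object_key: str, n_copies: int = 3) -> list[int]:
    """Spread n copies of one object around the hash ring.

    Each copy hashes the object key together with a per-copy salt
    (a hypothetical convention for this sketch), yielding scattered,
    deterministic ring positions for the two-to-five copies kept.
    """
    points = []
    for i in range(n_copies):
        salted = f"{object_key}#copy-{i}".encode()
        points.append(int(hashlib.sha256(salted).hexdigest(), 16) % RING_SIZE)
    return points
```

Because the positions are derived deterministically from the key and copy index, any node can recompute where all copies of an object live without consulting a central directory.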
However, in some cases two data objects, such as files, bear a relationship to each other that is lost when those objects are distributed to scattered hash locations. For instance, files contained in or descending from the same parent directory in a directory file structure can be encoded and stored in nodes or locations entirely separate from those of the parent directory itself. When a user wishes to perform common file processing tasks such as reading all files within a common parent directory, writing to those files, or searching those files, the hash management logic or platform is forced to locate and extract those files from a series of unrelated locations or sources. Those sources can be or include separate or remote storage servers or databases, each of which must be looked up, navigated to, and accessed to scan the files in the parent directory. This can impose significant performance penalties on these common file operations.
It may therefore be desirable to provide systems and methods for a cloud-based directory system based on hashed values of parent and child storage locations. In such a system, each file or other data object is stored in its normally hashed location and is also consistently inserted and stored at the hash node of its parent directory or other parent location, establishing common visibility for read, write, and other operations.
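A minimal sketch of this dual placement follows. The class, method names, and the ".index" convention are hypothetical illustrations of the idea, not details from the source: each file is written to its own hashed node, and a reference to it is also inserted at the node to which its parent directory hashes, so a directory-wide read touches one node instead of many.

```python
import hashlib
from collections import defaultdict

def node_for(key: str, n_nodes: int) -> int:
    """Hash a path (or any key) to one of n_nodes storage nodes."""
    return int(hashlib.sha256(key.encode()).hexdigest(), 16) % n_nodes

class DirectoryAwareStore:
    """Sketch: a file lives at its own hashed node, and a reference is
    also kept at the hash node of its parent directory."""

    def __init__(self, n_nodes: int = 8):
        self.n_nodes = n_nodes
        self.nodes = defaultdict(dict)  # node index -> {key: data or index list}

    def put(self, path: str, data: bytes):
        # Normal placement: the file's own hashed location.
        self.nodes[node_for(path, self.n_nodes)][path] = data
        # Co-placement: record the child under the parent directory's node.
        parent = path.rsplit("/", 1)[0]
        index = self.nodes[node_for(parent, self.n_nodes)]
        index.setdefault(f"{parent}/.index", []).append(path)

    def list_dir(self, parent: str) -> list:
        """Read all file names in one directory from a single hash node."""
        idx_node = self.nodes[node_for(parent, self.n_nodes)]
        return idx_node.get(f"{parent}/.index", [])
```

With this layout, a "read all files in a directory" operation resolves the child list from one node, then fetches each file from its own hashed location, rather than scanning unrelated servers to discover the directory's contents.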