Hash structures are used in computer systems to map identifying values, or keys, to their associated values or to the storage locations holding those values. A hash function transforms the key into the index of an array element where the associated value, or a pointer to that value, is stored. When items in the hash structure are removed or deleted, the hash structure usually undergoes a rehash, whereby existing items in the hash structure are mapped to new locations. Hash structures can be used in cloud-based networks to arrange distributed storage schemes, in which mappings in the hash structure can contain or point to stored data objects, such as files that can be used by applications running in the cloud.
“Consistent hashing” can be implemented such that the addition or removal of one slot does not significantly change the mapping of keys to locations. In particular, consistent hashing involves associating a real-valued angle with each stored item, effectively mapping the item to, for example, a point on the circumference of a circle. In addition, available machines or servers are mapped to locations around the circle. The machine or server on which the item is to be stored is chosen by selecting the machine at the next highest angle along the circle after the item. If a storage location on the machine becomes unavailable, then the angles mapping to the location are removed, and requests for files that would have mapped to the unavailable location are instead mapped to the next available storage location.
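The ring mapping described above can be sketched as follows. This is a minimal illustration only: the `angle_of` helper, the `HashRing` class, the use of SHA-256 to derive angles, and the server and file names are all assumptions made for the example, not details taken from any particular system.

```python
import bisect
import hashlib
import math


def angle_of(name: str) -> float:
    """Map a string to an angle in [0, 2*pi) from a hash digest (assumed scheme)."""
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    # 48 bits keep the fraction exactly representable and strictly below 1.0.
    fraction = int.from_bytes(digest[:6], "big") / 2**48
    return fraction * 2 * math.pi


class HashRing:
    """Hypothetical ring: servers and items share one circle of angles."""

    def __init__(self, servers=()):
        self._angles = []    # sorted angles of the available servers
        self._servers = {}   # angle -> server name
        for server in servers:
            self.add_server(server)

    def add_server(self, server: str) -> None:
        angle = angle_of(server)
        bisect.insort(self._angles, angle)
        self._servers[angle] = server

    def remove_server(self, server: str) -> None:
        angle = angle_of(server)
        self._angles.remove(angle)
        del self._servers[angle]

    def locate(self, key: str) -> str:
        """Return the server at the next highest angle after the key, wrapping around."""
        if not self._angles:
            raise LookupError("no servers on the ring")
        i = bisect.bisect_right(self._angles, angle_of(key))
        return self._servers[self._angles[i % len(self._angles)]]


ring = HashRing(["server-a", "server-b", "server-c", "server-d"])
keys = [f"file-{i}" for i in range(200)]
before = {k: ring.locate(k) for k in keys}
ring.remove_server("server-b")
after = {k: ring.locate(k) for k in keys}
# Only the keys that lived on the removed server change location.
moved = [k for k in keys if before[k] != after[k]]
```

In this sketch, removing one server remaps only the keys it held; every other key keeps its original server, which is the property that distinguishes consistent hashing from a full rehash.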
However, consistent hashing can be problematic in situations in which a user or computer program requests to access a file that has been moved, renamed, or deleted without the user or computer program having the necessary data to determine the change. As such, finding and accessing the file can take more time and can lead to system hardware or software problems, faults, or other errors. Further, hashing in cloud-based networks can be problematic when multiple entities have access to move, rename, or delete files stored on devices of the cloud-based network.
In cases where a file has been altered in a distributed storage system using a consistent hash structure, attempts can be made to locate the file by traversing the last known links pointing to that file in the hash structure. When those attempts fail to locate the requested file, it becomes necessary to initiate a search for that file. In some cases, the removal or insertion of storage servers may have altered or offset the position of links to the file in the hash structure. In the simplest case, the removal or insertion of a single storage server or other node in the hash structure can shift the location of the hash node corresponding to the file by one position. In that case, a check of the closest adjacent node (in the clockwise or other direction) can sometimes locate the missing file with little additional processing burden.
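The single-position check might look like the following sketch. The representation is assumed for illustration: `ring` is a hypothetical list of node names in clockwise order, `contents` is a hypothetical mapping from node name to the set of files it holds, and `find_with_neighbor_check` is an invented name.

```python
def find_with_neighbor_check(ring, contents, expected_index, filename):
    """Check the expected node, then its closest clockwise neighbor.

    Returns the name of the node holding the file, or None if neither
    node has it and a wider search is needed.
    """
    n = len(ring)
    for offset in (0, 1):  # 0 = last known node, 1 = adjacent node
        node = ring[(expected_index + offset) % n]
        if filename in contents.get(node, set()):
            return node
    return None


ring = ["node-0", "node-1", "node-2", "node-3"]
contents = {"node-0": {"report.txt"}, "node-2": {"data.bin"}}
# The last known location of report.txt was node-3; a one-position shift
# (with wrap-around) placed it on node-0 instead.
found = find_with_neighbor_check(ring, contents, 3, "report.txt")
missing = find_with_neighbor_check(ring, contents, 0, "data.bin")
```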
While unexpected or unrecorded node shifts of one position may sometimes be the most likely cause of a missing or misplaced file, in some cases checking the next-closest node may not locate the requested file. In those cases, it becomes necessary to search the remaining (unchecked) nodes of the hash structure, such as the remaining nodes in a circular hash ring. In such instances, it is possible to search or probe all remaining nodes of the ring at one time, to be sure of locating the file (as long as it remains present in the hash structure and corresponding cloud storage resources). However, transmitting a large number of simultaneous or near-simultaneous probes or search requests to all remaining nodes in a hash structure can impose significant communications and processing overhead.
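The all-at-once approach can be illustrated with a short sketch, in which threads stand in for network requests. The `probe_all` function and the `has_file` callback are hypothetical names; a real system would issue remote calls rather than local function calls.

```python
from concurrent.futures import ThreadPoolExecutor


def probe_all(nodes, has_file):
    """Probe every remaining node concurrently.

    Returns (nodes that reported the file, number of probes sent).
    The probe count always equals the number of remaining nodes,
    which is the overhead this approach incurs.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(nodes))) as pool:
        results = list(pool.map(has_file, nodes))
    hits = [node for node, hit in zip(nodes, results) if hit]
    return hits, len(nodes)


remaining = [f"node-{i}" for i in range(12)]
hits, probes_sent = probe_all(remaining, lambda node: node == "node-7")
```

Here the file is found in one round, but only after sending a probe to all twelve remaining nodes.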
In other cases, a user can choose to search the hash ring or other hash structure by traversing the ring one node at a time, proceeding for instance from the last node that was checked in a clockwise or counter-clockwise direction until the file is found (as long as it remains in the hash structure and corresponding cloud storage resources). However, traversing a hash ring or other hash structure one node at a time can involve performance penalties in terms of search lag, and in general cannot be expected to locate the requested file without checking, on average, at least half of the remaining nodes.
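A one-node-at-a-time traversal, and the reason it averages about half the remaining nodes, can be sketched as below. The function name `sequential_search` and the `has_file` callback are hypothetical, and the average assumes the file is equally likely to sit on any remaining node.

```python
def sequential_search(nodes, start, has_file):
    """Walk the ring clockwise from `start`, one node per step.

    Returns (node holding the file or None, number of nodes checked).
    """
    n = len(nodes)
    for step in range(n):
        node = nodes[(start + step) % n]
        if has_file(node):
            return node, step + 1
    return None, n


nodes = [f"node-{i}" for i in range(9)]
# Place the file on each node in turn and record how many checks it takes.
checks = [
    sequential_search(nodes, 0, lambda node, t=target: node == t)[1]
    for target in nodes
]
average_checks = sum(checks) / len(checks)  # (1 + 2 + ... + 9) / 9
```

With nine remaining nodes the average is five checks, that is (n + 1) / 2, and each check adds one step of delay because the probes are sent serially.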
Therefore, it may be desirable to provide systems and methods for searching cloud-based distributed storage resources using a set of expandable probes, in which file searches or probes can proceed in a graduated manner: as each span checked by the current generation or set of probes is determined not to contain the desired file, the next iteration of probes increases in size or span, for instance exponentially.
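One way such a graduated search could proceed is sketched below. The details are assumptions made for illustration: the span starts at one node and is multiplied by a hypothetical `growth` factor after each unsuccessful generation, and `expanding_probe_search` and `has_file` are invented names. The nodes within a span are checked in a simple loop here, whereas a real system could probe them in parallel.

```python
def expanding_probe_search(nodes, start, has_file, growth=2):
    """Search the ring in generations whose span grows by `growth` each time.

    Returns (node holding the file or None, generations used, probes sent).
    """
    n = len(nodes)
    checked = 0      # nodes covered by earlier generations
    span = 1         # size of the current generation's span
    generations = 0
    while checked < n:
        size = min(span, n - checked)  # do not run past the unchecked nodes
        batch = [nodes[(start + checked + i) % n] for i in range(size)]
        generations += 1
        checked += size
        for node in batch:
            if has_file(node):
                return node, generations, checked
        span *= growth  # widen the next generation of probes
    return None, generations, checked


nodes = [f"node-{i}" for i in range(16)]
result = expanding_probe_search(nodes, 0, lambda node: node == "node-10")
# Spans of 1, 2, 4 and 8 nodes cover node-0 through node-14.
```

In this example the file on `node-10` is found in the fourth generation after 15 probes, compared with 11 serial steps for a one-at-a-time traversal or 16 probes in a single round for an all-at-once search; the growth factor trades search lag against the number of probes sent.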