This invention relates generally to networked data storage and more particularly to garbage collection following a crash of a cloud gateway or cloud storage.
Processing systems are increasingly using cloud storage for storing persistent backup copies of data as file system objects. Typically, a processing system, such as a transactional processing system of an organization, is connected to a cloud gateway comprising a server computer executing a cloud storage application, and having memory and local storage. The cloud gateway interfaces via a network, such as the Internet, to remote “cloud” storage which may comprise distributed storage or a remote data center, for example, and manages the storage and retrieval of processing system backup data onto a file system created on the cloud storage. Generally, the cloud gateway stores backup data of the organization's processing system on the cloud file system by creating data chunks of the backup data, de-duplicating the chunks against previous backups, and storing the chunks not already present on the file system. The cloud gateway creates a chunk object for each data chunk, and the object is named with a unique fingerprint of the chunk. Thus, each chunk stored in cloud storage may be accessed by its fingerprint name, and may correspond to some range (start offset, length) of a backup. A chunk can correspond to multiple ranges of multiple backups due to de-duplication. The cloud gateway also creates a manifest file that describes the various chunks that comprise a particular backup. The manifest file comprises a chunk map, i.e., a listing of all chunks that are present in a backup image in their order of appearance in the backup image, identifying the chunks by their chunk fingerprints. When recovering a backup, the chunk map is consulted and the chunks corresponding to the fingerprint names identified in the manifest file are retrieved and restored in the order of their listing.
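By way of illustration only, the chunking, de-duplication, and manifest creation described above can be sketched as follows. The sketch rests on several assumptions that the description does not state: fixed-size chunks, a SHA-256 digest as the chunk fingerprint, and an in-memory dictionary standing in for the cloud file system. The names `CHUNK_SIZE`, `fingerprint`, `store_backup`, and `restore_backup` are hypothetical.

```python
import hashlib

# Hypothetical chunk size, kept tiny so the example is easy to follow.
CHUNK_SIZE = 4


def fingerprint(chunk: bytes) -> str:
    """Name a chunk by a digest of its content (SHA-256 is an assumption)."""
    return hashlib.sha256(chunk).hexdigest()


def store_backup(data: bytes, chunk_store: dict) -> list:
    """Split a backup into chunks, store only chunks not already present,
    and return the chunk map (fingerprints in order of appearance)."""
    chunk_map = []
    for offset in range(0, len(data), CHUNK_SIZE):
        chunk = data[offset:offset + CHUNK_SIZE]
        name = fingerprint(chunk)
        if name not in chunk_store:  # de-duplicate against earlier chunks
            chunk_store[name] = chunk
        chunk_map.append(name)
    return chunk_map


def restore_backup(chunk_map: list, chunk_store: dict) -> bytes:
    """Retrieve chunks by fingerprint name in the order the manifest lists them."""
    return b"".join(chunk_store[name] for name in chunk_map)


# The dictionary below stands in for cloud storage.
cloud_chunks = {}
first = store_backup(b"AAAABBBBAAAA", cloud_chunks)
second = store_backup(b"BBBBCCCC", cloud_chunks)
```

In this sketch the first backup lists three chunks in its chunk map but stores only two chunk objects, and the second backup adds only one new chunk object, because the repeated chunks are shared by fingerprint name.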
If, while storing backups onto the cloud file system, a cloud gateway or cloud file system were to crash, there would be an inconsistency between the manifest files and the data chunks referred to by the manifest files. This inconsistency can arise because the manifest files may not be stored to cloud storage at the same time as the data chunks. As a result, following a crash, there may be data chunks in cloud storage which are not referenced by any manifest file because the cloud gateway or cloud file system crashed after the data chunk was stored but before the manifest file could be stored. It may also happen that certain manifest files are stored in cloud storage before their corresponding data chunks are stored. In this event, the cloud storage would have manifest files following a crash that refer to data chunks which may or may not be present in cloud storage. In order to return the cloud storage to a state of consistency, a garbage collection process must be performed after a crash.
A conventional garbage collection process iterates through all manifest files that are in cloud storage (there may be thousands of such files) and identifies and lists all data chunks that are referred to in each of the manifest files. Next, all data chunks that exist in the cloud storage must be identified and listed. Frequently, these may number in the millions. The data chunks listed from cloud storage must then be correlated against the data chunks referenced in the manifest files. Any manifest file that refers to a data chunk that is not present in cloud storage must be removed. Additionally, data chunks that are in cloud storage but are not referenced by any manifest file must also be removed. This garbage collection process can be very time-consuming and resource-intensive. It may take hours, days, or even weeks to clean up the cloud storage, depending upon the number of manifest files and data chunks that must be scanned, as well as the cloud storage latency.
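The conventional process described above can be sketched, in simplified form, as a full scan of both manifests and chunks. The sketch is illustrative only: dictionaries stand in for cloud storage, the function name `conventional_gc` is hypothetical, and the choice to treat chunks referenced only by a removed manifest as unreferenced is an assumption that the description does not address.

```python
def conventional_gc(manifests: dict, chunk_store: dict) -> tuple:
    """Full-scan cleanup of manifests and chunks.

    manifests maps a manifest name to its list of chunk fingerprints;
    chunk_store maps a chunk fingerprint to chunk data.
    Returns (removed manifest names, removed chunk names), each sorted.
    """
    # List every chunk that exists in storage (potentially millions).
    present = set(chunk_store)

    # Scan every manifest and remove any that refers to a missing chunk.
    broken = sorted(
        name for name, chunk_map in manifests.items()
        if any(fp not in present for fp in chunk_map)
    )
    for name in broken:
        del manifests[name]

    # Assumption: only surviving manifests count as references.
    referenced = set()
    for chunk_map in manifests.values():
        referenced.update(chunk_map)

    # Remove chunks that no remaining manifest references.
    orphans = sorted(present - referenced)
    for fp in orphans:
        del chunk_store[fp]

    return broken, orphans


# Simulated post-crash state: "m2" refers to a chunk that was never stored,
# and "orphan" was stored before any manifest that references it.
manifests = {"m1": ["a", "b"], "m2": ["b", "x"]}
chunks = {"a": b"1", "b": b"2", "orphan": b"3"}
removed_manifests, removed_chunks = conventional_gc(manifests, chunks)
```

Even in this toy form, the cost is proportional to the total number of manifest entries plus the total number of stored chunks, which is why the process scales poorly when each listing or read incurs cloud storage latency.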
It is desirable to provide systems and methods that address the foregoing and other known problems of managing cloud storage by enabling the fast, efficient identification and restoration of backup data objects in cloud storage following a crash without the necessity of standard garbage collection. It is to these ends that the present invention is directed.