Entities often generate and use data that is important in some way to their operations. This data can include, for example, business data, financial data, and personnel data. If this data were lost or compromised, the entity could suffer significant adverse financial and other consequences. Accordingly, many entities have chosen to back up some or all of their data so that in the event of a natural disaster, unauthorized access, or another event, the entity can recover any data that was compromised or lost, and then restore that data to one or more locations, machines, and/or environments.
Increasingly, entities have chosen to back up their important data using cloud-based storage. The cloud-based approach to backup has proven attractive because it can reduce, or eliminate, the need for the entity to purchase and maintain its own backup hardware. Cloud-based storage is also flexible in that it can enable users anywhere in the world to access the data stored in the cloud datacenter. As well, the user data is protected from a disaster at the user location because the user data is stored in the cloud datacenter, rather than on backup hardware at the user location.
While advantageous in certain regards, the use of cloud-based storage has introduced some new problems. For example, some cloud-based storage systems and services require that a user download an entire file from the datacenter to the local user machine before the user can fully access that file. Depending upon the size of the file and the capacity of the communication line connecting the user with the datacenter, this process can be unacceptably long.
To illustrate with an example, in order for a user to locally start up a 100 GB virtual machine (VM), and assuming a T3 line (having a capacity of about 44 Mbps) connecting the user with the datacenter where the VM is stored, the user would have to wait approximately 5 hours for the VM to be transferred from the datacenter to the local user machine over the T3 line. In many circumstances, such a transfer time is unacceptably long.
A related consideration is that while the entire file is downloaded in circumstances such as those noted above, it may be the case that the user does not need the entire file. To continue with the previous example, it can be the case that only about 500 MB of data is needed to boot a Windows® host. Thus, over 99 percent of the data that was transferred to the user in this example was not actually needed by the user to boot the VM. If only the data actually required to boot the VM were downloaded, the download time over the same T3 line could be reduced to about 90 seconds. This example accordingly demonstrates that bandwidth is unnecessarily consumed by transferring data that is not needed by the user. Thus, not only is user access to datacenter files significantly slowed, but the data transfer bandwidth of the system is inefficiently utilized.
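The figures in the foregoing example can be checked with a short calculation. The following is a minimal sketch only, and rests on several assumptions not stated in the example: the VM size is taken as 100 gigabytes and the boot data as 500 megabytes, decimal units are used (1 GB = 10^9 bytes), and the T3 line is treated as an idealized 44 Mbps link with no protocol overhead, latency, or contention. The function and variable names are illustrative.

```python
def transfer_time_seconds(size_bytes: float, link_mbps: float) -> float:
    """Idealized time to move size_bytes over a link rated at link_mbps
    (megabits per second), ignoring overhead and latency."""
    bits = size_bytes * 8
    return bits / (link_mbps * 1_000_000)


GB = 1_000_000_000  # decimal gigabyte (assumption)
MB = 1_000_000      # decimal megabyte (assumption)
T3_MBPS = 44        # approximate T3 capacity given in the example

vm_size = 100 * GB    # full VM image
boot_size = 500 * MB  # data assumed sufficient to boot the host

full_vm_seconds = transfer_time_seconds(vm_size, T3_MBPS)
boot_only_seconds = transfer_time_seconds(boot_size, T3_MBPS)
unneeded_fraction = 1 - boot_size / vm_size

print(f"Full VM: {full_vm_seconds / 3600:.2f} hours")             # 5.05 hours
print(f"Boot data only: {boot_only_seconds:.0f} seconds")         # 91 seconds
print(f"Data not needed to boot: {unneeded_fraction:.1%}")        # 99.5%
```

Under these assumptions the results agree with the approximate values given above: roughly 5 hours for the full VM, roughly 90 seconds for the boot data alone, and over 99 percent of the transferred data not needed to boot. Real transfers would generally take longer because of protocol overhead and link sharing.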
In light of problems and shortcomings such as those noted above, it would be useful to enable a user to directly access data stored at a datacenter, rather than having to first download the data locally in order to be able to access that data. More particularly, it would be useful to be able to identify and download only the data actually needed by a user who has requested access to data stored at a datacenter. It would also be useful to be able to represent datacenter data to a user in such a way that the datacenter data appears to the user as a local file system, for example, on the user machine. Further, it would be useful to be able to reassemble content out of a plurality of incremental backups and/or full backups to form a virtual synthetic that can be made available to a user at a client system. Finally, it would be useful to be able to provide these functions, among others, in a variety of scenarios and use cases, examples of which include disaster recovery, and live access to databases, email repositories, and other data sources.