Currently, the standard way to stage in data to a cloud virtual machine is to store the data in the cloud provider's storage system and then have code running on the virtual machine download the data to the virtual machine using the cloud provider's software development kit. Access to the cloud storage follows a client/server model, in which each virtual machine acts as a client that retrieves the data from the storage service.
In many data-center processing scenarios, the same data needs to be present on more than one virtual machine. Typical use cases are embarrassingly parallel workloads, since they can be scaled horizontally across multiple virtual machines. Examples of this type of workload include simulations, in particular Monte Carlo simulations, which are common in various scientific disciplines. Such simulations often have to be performed over a large parameter space, which results in many independent simulation runs. Each independent run can be executed on a separate virtual machine to achieve near-linear or linear speedup. The main overhead of this approach is that common data items needed by every simulation must be present on all virtual machines. Since the virtual machines act independently of each other, each virtual machine downloads the same data items separately. This results in multiple redundant transmissions of potentially very large data files, on the order of terabytes, e.g., biological or meteorological data sets. For N virtual machines that each require a shared data set of size S, the total volume transferred from the storage service is therefore N·S instead of S. From an efficiency perspective, eliminating these redundant transmissions leads to higher performance and lower costs for staging in data.
From a privacy perspective, even though cloud storage services are usually protected by strong security measures, many potential cloud users have concerns about storing their data permanently in cloud storage. They would prefer to stage in their data directly from their on-premises infrastructure to their virtual machines in the cloud, without storing the data in the cloud storage even temporarily.
The embodiments described below are not limited to implementations which solve any or all of the problems mentioned above.