Known methods for network-based storage of data are typically block based and rely on a Storage Area Network or SAN, i.e. a dedicated network that provides access to storage devices such as disks, tapes, optical jukeboxes, etc., and enables block-level operations on the stored data. Alternatively, Network-Attached Storage or NAS may be deployed to store data remotely using file-based protocols instead of block-based protocols. The data are usually stored in a Redundant Array of Independent Disks (RAID), i.e. multiple disk drives that form part of a single logical unit and amongst which the data are distributed depending on the desired redundancy level. Such a logical unit is identified by a logical unit number or LUN. In existing systems, the SAN/NAS system provides several LUNs to the hypervisor, i.e. a piece of software, firmware or hardware that serves, i.e. creates, runs, monitors and manages, the different virtual machines on a host machine, i.e. a server or computer. A virtual machine or guest machine is a software implementation of a machine or computer, typically comprising a single operating system and application programs running on that operating system. Usually, plural virtual machines share the hardware resources of a single host machine. At present, physical servers or computers easily support 10 or more virtual machines. Each computer or server runs a hypervisor to serve the virtual machines it hosts. The computer-implemented method according to the present invention works in close cooperation with such a hypervisor.
At present, redundant storage of large volumes in cloud storage systems, typically over the internet, is slow. As a consequence, databases and other large volumes are still stored locally, i.e. close to the clients. On the one hand, the internet as a medium for connecting to remote storage systems is slow. On the other hand, known redundancy mechanisms such as erasure coding are object driven or file driven, and consequently slow down remote storage even further when applied to block-based storage.
The problem of internet latency for cloud storage has been addressed in several prior art documents.
United States Patent Application US 2012/0047339 entitled “Redundant Array of Independent Clouds” describes a mechanism for reliable block-based storage in remote cloud storage facilities. In paragraphs [0004]-[0006], US 2012/0047339 recognizes the problem of slow network-based storage via the internet and the need for redundancy. US 2012/0047339 consequently proposes to divide data into a number N of data blocks and to store the data blocks with different cloud providers. The data can be reconstructed using a translation map. In order to be able to reconstruct erroneous blocks, a parity block may be generated from the N data blocks. The parity block is stored with yet another cloud provider.
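To illustrate how a single parity block allows one unavailable data block to be rebuilt, the following is a minimal sketch assuming a RAID-5 style byte-wise XOR parity; the parity scheme actually used in US 2012/0047339 is not detailed here, and the helper names (`split_into_blocks`, `make_parity`, `reconstruct`) are hypothetical. The cloud providers themselves are not modelled: each list element simply stands for the block held by one provider.

```python
from functools import reduce


def split_into_blocks(data: bytes, n: int) -> list:
    """Split data into n equal-sized blocks, zero-padding the tail (assumption)."""
    size = -(-len(data) // n)  # ceiling division
    padded = data.ljust(size * n, b"\x00")
    return [padded[i * size:(i + 1) * size] for i in range(n)]


def xor_blocks(blocks: list) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_parity(blocks: list) -> bytes:
    """Parity block, to be stored with a further (hypothetical) provider."""
    return xor_blocks(blocks)


def reconstruct(blocks: list, parity: bytes) -> list:
    """Rebuild the single unavailable block (marked None) from the rest plus parity."""
    missing = [i for i, b in enumerate(blocks) if b is None]
    if len(missing) != 1:
        raise ValueError("XOR parity can rebuild exactly one missing block")
    survivors = [b for b in blocks if b is not None] + [parity]
    restored = list(blocks)
    restored[missing[0]] = xor_blocks(survivors)
    return restored


# Usage: three data blocks, the provider holding block 1 does not respond.
data = b"block based storage"
blocks = split_into_blocks(data, 3)
parity = make_parity(blocks)
restored = reconstruct([blocks[0], None, blocks[2]], parity)
assert b"".join(restored)[:len(data)] == data
```

A single XOR parity block tolerates only one unavailable provider; tolerating more requires an erasure code such as Reed-Solomon. Note also that, in the normal case, all N data blocks must still be fetched before the data can be reassembled.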
Although US 2012/0047339 no longer relies on a single cloud storage provider, storage of data remains slow, being limited by the internet speed. In addition, retrieval of data remains dependent on the slowest responding cloud storage provider, since the data must be reconstructed from data blocks retrieved from the different cloud storage providers. Only when one of the cloud storage providers does not respond at all is the parity block retrieved from a third cloud storage provider where it is stored.
United States Patent Application US 2011/0296440 entitled “Accelerator System for Use with Secure Data Storage” describes a system and method for accelerating the processing and secure cloud storage of data. From paragraphs [0003]-[0007] of US 2011/0296440, it can be learned that offloading certain processing, i.e. the secure parsing of data, from the motherboard in order to accelerate the storage and processing of data is key to the system described in that patent application.
In addition to internet latency, it is recognized that computers or servers at present easily host 10 or more virtual machines, each of which generates its own input/output (I/O) pattern. The resulting randomized I/O patterns further degrade storage efficiency. A straightforward solution consists of adding storage resources to the backend storage systems, but this increases the cost of storage.
Another disadvantage of existing network-based storage techniques that rely on SAN (block based) or NAS (file or object based) is that they are LUN specific. A logical unit is a single storage volume that is identified and addressed through its LUN or Logical Unit Number. In a virtualized environment where multiple virtual disks reside on a single logical unit, features like rolling back, snapshotting or replicating a single virtual machine are difficult to implement. The only way to roll back a virtual machine is to retrieve an older snapshot of that virtual machine, mount the file system and copy the requested virtual disks back to the primary storage logical unit. This is a complex and slow process.
Furthermore, existing network-based storage techniques are tied to specific storage hardware. As a consequence, replication between different storage providers remains difficult. Gateways have been developed for installation between the storage systems and the hypervisors. These gateways, however, solve only a small part of the problem: they are difficult to manage, require the storage to reside near the hypervisor and remain expensive.
United States Patent Application US 2010/0332401, entitled “Performing Data Storage Operations with a Cloud Storage Environment, Including Automatically Selecting Among Multiple Cloud Storage Sites”, for instance describes a method for data storage and migration in a cloud environment. To tackle the problem of internet latency and packet loss, a cloud storage gateway introduces local caching and de-duplication. As part of a block-based data migration process, data stored in cache (local, primary copies) are moved to cloud storage systems (secondary copies). As illustrated by FIG. 17 and described in paragraphs [0278]-[0286], containerized de-duplication is provided to avoid creating unnecessary additional instances of the data within secondary storage, i.e. within the cloud.
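The general idea behind block-level de-duplication, storing each unique block only once and referring to it by a fingerprint, can be sketched as follows. This is not the containerized de-duplication of US 2010/0332401, whose details are not given here; fixed-size chunking, SHA-256 fingerprints and the class name `DedupStore` are assumptions chosen for illustration only.

```python
import hashlib


class DedupStore:
    """Hypothetical block store that keeps a single copy of each unique block."""

    def __init__(self, block_size: int = 4096):
        self.block_size = block_size
        self.blocks: dict = {}  # SHA-256 fingerprint -> block contents

    def write(self, data: bytes) -> list:
        """Chunk data into fixed-size blocks and return their fingerprints."""
        refs = []
        for offset in range(0, len(data), self.block_size):
            block = data[offset:offset + self.block_size]
            fingerprint = hashlib.sha256(block).hexdigest()
            self.blocks.setdefault(fingerprint, block)  # store only unseen blocks
            refs.append(fingerprint)
        return refs

    def read(self, refs: list) -> bytes:
        """Reassemble the original data from its block fingerprints."""
        return b"".join(self.blocks[fp] for fp in refs)


# Usage: the repeated "AAAA" block is kept only once.
store = DedupStore(block_size=4)
refs = store.write(b"AAAABBBBAAAA")
assert len(refs) == 3 and len(store.blocks) == 2
assert store.read(refs) == b"AAAABBBBAAAA"
```

Fixed-size chunking keeps the sketch short; practical systems often use content-defined chunking so that an insertion near the start of a volume does not shift every subsequent block boundary.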
It is an objective of the present invention to provide a solution to the above-identified problems inherent to existing SAN/NAS-based network storage techniques. More particularly, it is an objective of the present invention to disclose a method for layered storage of enterprise data that reduces the effect of internet latency, reduces the dependency on particular storage hardware and reduces storage resource requirements in general, while enabling features such as zero-copying, snapshotting, cloning, thin provisioning, replication, rollback, etc. of data at virtual machine level.