Computer data is vital to today's organizations, and a significant part of protection against disasters is focused on data protection. As solid-state memory has advanced to the point where cost of memory has become a relatively insignificant factor, organizations can afford to operate with systems that store and process terabytes of data.
Computer systems may include different resources used by one or more host processors. Resources and host processors in a computer system may be interconnected by one or more communication connections. These resources may include, for example, data storage devices such as those included in the data storage systems manufactured by EMC Corporation. These data storage systems may be coupled to one or more servers or host processors and provide storage services to each host processor. Multiple data storage systems from one or more different vendors may be connected and may provide common data storage for one or more host processors in a computer system.
A host processor may perform a variety of data processing tasks and operations using the data storage system. For example, a host processor may perform basic system I/O operations in connection with data requests, such as data read and write operations.
Host processor systems may store and retrieve data using a storage device containing a plurality of host interface units, disk drives, and disk interface units. Such storage devices are provided, for example, by EMC Corporation of Hopkinton, Mass. and disclosed in U.S. Pat. No. 5,206,939 to Yanai et al., U.S. Pat. No. 5,778,394 to Galtzur et al., U.S. Pat. No. 5,845,147 to Vishlitzky et al., and U.S. Pat. No. 5,857,208 to Ofek. The host systems access the storage device through a plurality of channels provided therewith. Host systems provide data and access control information through the channels to the storage device and storage device provides data to the host systems also through the channels. The host systems do not address the disk drives of the storage device directly, but rather, access what appears to the host systems as a plurality of logical disk units, logical devices or logical volumes. The logical disk units may or may not correspond to the actual physical disk drives. Allowing multiple host systems to access the single storage device unit allows the host systems to share data stored therein. In a common implementation, a Storage Area Network (SAN) is used to connect computing devices with a large number of storage devices. Management and modeling programs may be used to manage these complex computing environments.
Two components having connectivity to one another, such as a host and a data storage system, may communicate using a communication connection. In one arrangement, the data storage system and the host may reside at the same physical site or location. Techniques exist for providing a remote mirror or copy of a device of the local data storage system so that a copy of data from one or more devices of the local data storage system may be stored on a second remote data storage system. Such remote copies of data may be desired so that, in the event of a disaster or other event causing the local data storage system to be unavailable, operations may continue using the remote mirror or copy.
In another arrangement, the host may communicate with a virtualized storage pool of one or more data storage systems. In this arrangement, the host may issue a command, for example, to write to a device of the virtualized storage pool. In some existing systems, processing may be performed by a front end component of a first data storage system of the pool to further forward or direct the command to another data storage system of the pool. Such processing may be performed when the receiving first data storage system does not include the device to which the command is directed. The first data storage system may direct the command to another data storage system of the pool which includes the device. The front end component may be a host adapter of the first receiving data storage system which receives commands from the host.
Often cloud computing may be performed with a data storage system. As it is generally known, “cloud computing” typically refers to the use of remotely hosted resources to provide services to customers over one or more networks such as the Internet. Resources made available to customers are typically virtualized and dynamically scalable. Cloud computing services may include any specific type of application. Some cloud computing services are, for example, provided to customers through client software such as a Web browser. The software and data used to support cloud computing services are located on remote servers owned by a cloud computing service provider. Customers consuming services offered through a cloud computing platform need not own the physical infrastructure hosting the actual service, and may accordingly avoid capital expenditure on hardware systems by paying only for the service resources they use, and/or a subscription fee. From a service provider's standpoint, the sharing of computing resources across multiple customers (aka “tenants”) improves resource utilization. Use of the cloud computing service model has been growing due to the increasing availability of high bandwidth communication, making it possible to obtain response times from remotely hosted cloud-based services similar to those of services that are locally hosted.
Cloud computing infrastructures often use virtual machines to provide services to customers. A virtual machine is a completely software-based implementation of a computer system that executes programs like an actual computer system. One or more virtual machines may be used to provide a service to a given customer, with additional virtual machines being dynamically instantiated and/or allocated as customers are added and/or existing customer requirements change. Each virtual machine may represent all the components of a complete system to the program code running on it, including virtualized representations of processors, memory, networking, storage and/or BIOS (Basic Input/Output System). Virtual machines can accordingly run unmodified application processes and/or operating systems. Program code running on a given virtual machine executes using only virtual resources and abstractions dedicated to that virtual machine. As a result of such “encapsulation,” a program running in one virtual machine is completely isolated from programs running on other virtual machines, even though the other virtual machines may be running on the same underlying hardware. In the context of cloud computing, customer-specific virtual machines can therefore be employed to provide secure and reliable separation of code and data used to deliver services to different customers.
Conventional data protection systems include tape backup drives, for storing organizational production site data on a periodic basis. Such systems suffer from several drawbacks. First, they require a system shutdown or cause a system state degradation during backup, since the data being backed up cannot be used during the backup operation. Second, they limit the points in time to which the production site can recover. For example, if data is backed up on a daily basis, there may be several hours of lost data in the event of a disaster. Third, the data recovery process itself takes a long time.
Another conventional data protection system uses data replication, by creating a copy of the organization's production site data on a secondary backup storage system, and updating the backup with changes. The backup storage system may be situated in the same physical location as the production storage system, or in a physically remote location. Data replication systems generally operate either at the application level, at the file system level, at the hypervisor level or at the data block level.
Current data protection systems try to provide continuous data protection, which enable the organization to roll back to any specified point in time within a recent history. Continuous data protection systems aim to satisfy two conflicting objectives, as best as possible; namely, (i) minimize the down time, in which the organization production site data is unavailable, during a recovery, and (ii) enable recovery as close as possible to any specified point in time within a recent history.
Continuous data protection typically uses a technology referred to as “journaling,” whereby a log is kept of changes made to the backup storage. During a recovery, the journal entries serve as successive “undo” information, enabling rollback of the backup storage to previous points in time. Journaling was first implemented in database systems, and was later extended to broader data protection.
One challenge to continuous data protection is the ability of a backup site to keep pace with the data transactions of a production site, without slowing down the production site. The overhead of journaling inherently requires several data transactions at the backup site for each data transaction at the production site. As such, when data transactions occur at a high rate at the production site, the backup site may not be able to finish backing up one data transaction before the next production site data transaction occurs. If the production site is not forced to slow down, then necessarily a backlog of un-logged data transactions may build up at the backup site. Without being able to satisfactorily adapt dynamically to changing data transaction rates, a continuous data protection system chokes and eventually forces the production site to run in an unprotected state until such a time where it can recover from the congestion.