The business requirements for managing the lifecycle of application data have been traditionally met by deploying multiple point solutions, each of which addresses a part of the lifecycle. This has resulted in a complex and expensive infrastructure where multiple copies of data are created and moved multiple times to individual storage repositories. The adoption of server virtualization has become a catalyst for simple, agile and low-cost compute infrastructure. This has led to larger deployments of virtual hosts and storage, further exacerbating the gap between the emerging compute models and the current data management implementations.
Applications that provide business services depend on storage of their data at various stages of its lifecycle. FIG. 1 shows a typical set of data management operations that would be applied to the data of an application such as a database underlying a business service such as payroll management. In order to provide a business service, application 102 requires primary data storage 122 with some contracted level of reliability and availability.
Backups 104 are made to guard against corruption or the primary data storage through hardware or software failure or human error. Typically backups may be made daily or weekly to local disk or tape 124, and moved less frequently (weekly or monthly) to a remote physically secure location 125.
Concurrent development and test 106 of new applications based on the same database requires a development team to have access to another copy of the data 126. Such a snapshot might be made weekly, depending on development schedules.
Compliance with legal or voluntary policies 108 may require that some data be retained for safely future access for some number of years; usually data is copied regularly (say, monthly) to a long-term archiving system 128.
Disaster Recovery services 110 guard against catastrophic loss of data if systems providing primary business services fail due to some physical disaster. Primary data is copied 130 to a physically distinct location as frequently as is feasible given other constraints (such as cost). In the event of a disaster the primary site can be reconstructed and data moved back from the safe copy.
Business Continuity services 112 provide a facility for ensuring continued business services should the primary site become compromised. Usually this requires a hot copy 132 of the primary data that is in near-lockstep with the primary data, as well as duplicate systems and applications and mechanisms for switching incoming requests to the Business Continuity servers.
Thus, data management is currently a collection of point applications managing the different parts of the lifecycle. This has been an artifact of evolution of data management solutions over the last two decades.
Current Data Management architecture and implementations such as described above involve multiple applications addressing different parts of data lifecycle management, all of them performing certain common functions: (a) make a copy of application data (the frequency of this action is commonly termed the Recovery Point Objective (RPO)), (b) store the copy of data in an exclusive storage repository, typically in a proprietary format, and (c) retain the copy for certain duration, measured as Retention Time. A primary difference in each of the point solutions is in the frequency of the RPO, the Retention Time, and the characteristics of the individual storage repositories used, including capacity, cost and geographic location.
In a series of prior patent applications, e.g., U.S. Ser. No. 12/947,375, a system and method for managing data has been presented that uses Data Management Virtualization. Data Management activities, such as Backup, Replication and Archiving are virtualized in that they do not have to be configured and run individually and separately. Instead, the user defines their business requirement with regard to the lifecycle of the data, and the Data Management Virtualization System performs these operations automatically. A snapshot is taken from primary storage to secondary storage; this snapshot is then used for a backup operation to other secondary storage. Essentially an arbitrary number of these backups may be made, providing a level of data protection specified by a Service Level Agreement.