This invention relates generally to backup storage systems and more particularly to a system and method for efficiently mapping information from a primary processing system to a backup storage system.
As is known in the art, computer systems which process and store large amounts of data typically include a primary processor coupled to a shared storage system in which the data is stored. The primary processor performs its operations using the storage system. To minimize the chance of data loss, the computer systems also can include a backup storage system coupled to the primary processor and the storage system. Often the connection between the primary processor and the backup storage system is through a network, in which case the primary processor is sometimes referred to as a "client processor" or more simply a "client."
The backup storage system can include a backup storage device (which may include disk drives, tape storage or any other storage mechanism), together with a system for placing data into the storage device and recovering the data from that storage device. To perform a backup, the client copies data from the shared storage system across the network to the backup storage system. Thus, an actual data file may be communicated over the network to the backup storage device.
The shared storage system corresponds to the actual physical storage. For the client to write the backup data over the network to the backup storage system, the client first converts the backup data into file data, i.e., the client retrieves the data from the physical storage system level, and converts the data into application level format (e.g. a file) through a logical volume manager level, a file system level and the application level. When the backup storage device receives the data file, the backup storage system can take the application level data file, and convert it to its appropriate file system level format for the backup storage system. The data can then be converted through the logical volume manager level and into physical storage.
This form of backing up data may be referred to as "logical-logical" backup. That is, the logical data is backed up on the backup storage device. The data to be backed up is presented independent of the manner in which it is physically stored on the shared storage system at the physical storage system level, independent of the file system level mechanisms on the client, and independent of how data is stored on the backup storage device.
Logical-logical backup can be a particularly convenient form of backup. The backup storage system, however, may need to be capable of interaction with a variety of different clients. For example, it may be desirable to have a backup storage system that can back up data from both a Solaris operating system and an HP-UX operating system. By providing logical level data across the network to the backup storage system, the backup storage system can take that data and convert the data location as appropriate for storage in its own physical storage system (i.e., take the data at the application level and convert it to the file system level, logical volume manager and physical storage system, as each of those levels is possibly uniquely implemented in the backup storage system). Thus, if the client is running the HP-UX operating system and the backup storage system is using a Solaris operating system, the backup storage system can save and retrieve data according to the formats used in the Solaris operating system. If data were copied directly from the storage system to the backup storage system, the stored data would depend on the file system level formats for the client (here, HP-UX). The backup storage system would then be unable to read or examine that data because it uses different file system level formats (here, Solaris).
One problem with such a backup system, however, is that the backup process requires significant traffic on the network. Moreover, if the network medium is crowded with traffic or cannot support high speed data transfer, the backup process can be slow. In addition, this architecture may require significant resources of the client or the storage system. The client and the storage system must fully parse the data from the physical storage level to the application level. Thus, this approach requires a relatively large amount of system resources and time to complete transfers. This is particularly true when there is a large amount of data to back up.
The EMC Data Manager (EDM) line of products is capable of logical-logical backup and restore over a network, as described in numerous publications available from EMC, including the EDM User Guide (Network) "Basic EDM Product Manual".
To overcome the above problems, a backup storage architecture in which a direct connection is established between the shared storage system and the backup storage system was conceived. Such a system is described in U.S. Pat. No. 6,047,294, assigned to the assignee of the present invention, and entitled Logical Restore from a Physical Backup in Computer Storage System. In this approach, the backup storage system may be a system as generally described in EMC Data Manager: Symmetrix Connect User Guide, P/N 200-113-591, Rev. C, December 1997, available from EMC Corporation of Hopkinton, Mass. The direct connection between the shared storage system and the backup storage system may be provided as a high speed data channel, such as a SCSI cable or one or more fiber-channel cables. In this system, a user may be permitted to back up data over the network or the direct connection.
When the shared storage system is a Symmetrix product, the direct connection may be connected from the backup storage system to the storage system through a host adaptor. For high speed data transfer using the direct connection approach, data may be copied directly from physical storage devices to the backup storage system.
The shared storage system can be provided from a plurality of different physical storage devices. Each of the respective physical devices can include contiguous segments of storage which must be backed up. These contiguous segments of storage may, but need not, be of the same size. The segments of storage are sometimes referred to as "hyper-volumes." Thus, hyper-volumes correspond to segments of physical storage that can be used as components when constructing a virtual volume for use by the file system.
With respect to a back-up scenario, however, the task remains of determining which hyper-volumes (or portions of hyper-volumes) should be backed up and communicating this information to the backup storage device. For example, consider backup for database files in an Oracle database application (such as Oracle versions 7.2.x, 7.3.x and 8.0.4, available from Oracle Corp. of Redwood Shores, Calif.). Files may be stored on any one, or across any number, of hyper-volumes located in any particular set of locations across one or more physical storage devices.
To provide an approach which allows mapping of individual files, a technique was conceived in which a structure representing each contiguous piece of disk storage was passed between the client and the backup storage system. Thus, in the prior art approach to backing up data, each 128 kilobytes (KB) of storage memory in the shared storage system is represented by a block of memory (or more simply "a block"), a beginning offset value and a length. The beginning offset value and length define a so-called extent.
In any particular system, each block is typically of the same predetermined size. Different systems, however, can utilize different size blocks. The particular size of a block used in any particular system is selected in accordance with the requirements and capabilities of that system. In one system, for example, it may be preferable to utilize a block size of one-hundred bytes while in a different system, it may be preferable to utilize a block size of two-hundred bytes. One problem with this approach is that as the amount of data to be backed up grows in size, there is a concomitant increase in the size of the mapping information. Thus, the mapping information can itself grow to a size which makes it difficult to deal with.
For example, assume that 64 gigabytes (GB) of data in a striped file system must be backed up and that a single 100 byte (B) block of memory represents (or "maps") each 128 kilobytes (KB) of disk space. The amount of memory needed to represent the 64 GB of data can then be computed as (64 GB/128 KB)*100 B, or 50 megabytes (MB). Thus, 50 MB of mapping data is required to represent the data to be backed up, and this 50 MB must be transferred from the client to the back-up storage system.
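The arithmetic in this example can be checked with a short sketch. It assumes binary units (1 KB = 1024 bytes), under which the result is exactly 50 MB. The function name `mapping_size_bytes` is illustrative only and is not part of the described system.

```python
KB = 1024
MB = 1024 * KB
GB = 1024 * MB


def mapping_size_bytes(data_bytes, segment_bytes=128 * KB, block_bytes=100):
    """Size of a mapping that uses one block per storage segment.

    Assumption: a partial final segment still needs a whole mapping block.
    """
    num_blocks = -(-data_bytes // segment_bytes)  # ceiling division
    return num_blocks * block_bytes


size = mapping_size_bytes(64 * GB)
print(size // MB, "MB")  # 524288 blocks * 100 B -> prints "50 MB"
```

Doubling the data to 128 GB doubles the mapping to 100 MB, which is the linear growth the passage describes.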
Part of the reason so much memory is needed to represent the data to be backed-up is due to the properties of a striped file system. In particular, one property of a striped file system is that the data is logically distributed across a number of disk hyper-volumes. Thus, striped file systems tend to not have large contiguous physical segments of memory. Rather there are a relatively large number of smaller memory segments over which the data to be backed up is distributed. Since the prior art approach utilizes a single block to represent each contiguous memory segment, regardless of the size of the contiguous memory segment, a relatively large amount of memory is required to represent data stored in a striped file system.
Thus, one problem with the conventional approach is that it is expensive in terms of memory (i.e. a relatively large amount of data is needed for the file mapping). Furthermore, the prior art approach does not scale well since an increase in the amount of data to back up results in a concomitant increase in the amount of data required to map the file. Moreover, the prior art "one block per extent" approach results in a relatively large amount of processing resources required to communicate the file mapping data between the client and the back-up storage device. Further still, since the file mapping data is relatively large, a relatively large amount of time is required to transmit the data from the client to the back-up storage system.
It would, therefore, be desirable to provide a technique for efficiently representing or "mapping" data so that it can be rapidly communicated between a primary processor and a back-up storage system while at the same time allowing a backup system to back up files rather than devices.
In accordance with the present invention, a technique for representing files in a striped file system comprises generating one or more extent blocks, each of which defines a logical volume, an offset value and a length, and generating a repetition block which defines a number of extent blocks and a number of repetitions. With this particular arrangement, a technique for efficiently representing data so that it can be communicated from a client to a back-up storage system is provided. The extent and repetition blocks define storage segments in which data to be backed up is stored. By utilizing extent and repetition blocks to represent the storage segments requiring backup, a compact representation is provided. Since the storage segments to be backed up are represented compactly, a relatively small amount of data is needed for the file mapping. Also, this approach scales well since increasing the amount of data to be backed up does not necessarily result in a concomitant increase in the amount of data needed for the file mapping. Furthermore, since the data needed for the file mapping is compact, fewer processing resources are required to communicate the file mapping data between the client and the back-up storage device. Further still, since the file mapping data is compact, a relatively small amount of time is required to transmit the data from the client to the back-up storage system. To perform the backup file mapping, it is first necessary to obtain a physical device mapping and a logical file mapping. Using these two inputs, it is then possible to perform the backup file mapping.
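A minimal sketch of how extent and repetition blocks might be modeled is given below. The field names follow the description above (logical volume, offset, length; number of extent blocks, number of repetitions). Everything else is an assumption made for illustration: the class names, the convention that a repetition block applies to the extent blocks immediately following it, and the rule that each repetition advances an extent's offset by its own length.

```python
from dataclasses import dataclass
from typing import Iterator, List, Union


@dataclass(frozen=True)
class ExtentBlock:
    volume: str   # logical volume identifier
    offset: int   # beginning offset within the volume, in bytes
    length: int   # extent length, in bytes


@dataclass(frozen=True)
class RepetitionBlock:
    num_extents: int      # how many following extent blocks form the pattern
    num_repetitions: int  # how many times that pattern occurs


Block = Union[ExtentBlock, RepetitionBlock]


def expand(blocks: List[Block]) -> Iterator[ExtentBlock]:
    """Yield every individual extent described by a compact block list.

    ASSUMPTION: on repetition r, each extent in the pattern is shifted by
    r * length on its own volume. The text does not specify this rule.
    """
    i = 0
    while i < len(blocks):
        block = blocks[i]
        if isinstance(block, RepetitionBlock):
            pattern = blocks[i + 1 : i + 1 + block.num_extents]
            for r in range(block.num_repetitions):
                for ext in pattern:
                    yield ExtentBlock(ext.volume, ext.offset + r * ext.length, ext.length)
            i += 1 + block.num_extents
        else:
            yield block
            i += 1


# Hypothetical layout: 128 KB stripes across four volumes, 1000 stripe rows.
STRIPE = 128 * 1024
compact = [
    RepetitionBlock(num_extents=4, num_repetitions=1000),
    *[ExtentBlock(f"vol{v}", 0, STRIPE) for v in range(4)],
]
extents = list(expand(compact))
print(len(compact), "blocks describe", len(extents), "extents")
```

Under these assumptions, five blocks stand in for 4000 separate extents, and adding more stripe rows changes only the repetition count, not the number of blocks transmitted.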
A method of backing up data in a system comprising a primary processor, a shared storage device and a backup storage device comprises performing a discovery process which includes the step of identifying files that are required to be backed up and determining a file type for each of the identified files. In response to a file corresponding to one of a physical, concatenated or striped file type, a corresponding one of physical, concatenated or striped storage processing is performed, with each of the processes utilizing an extent block representation. With this particular arrangement, a technique which facilitates the mapping of a logical object to hyper-volumes and extents is provided. The technique provides the mapping by obtaining the logical/physical volume information for logical volumes or for physical devices, as appropriate, and then obtaining the logical file information. Given this data, the mapping system can construct its own mappings between the logical file extents and the LVM mappings to hyper-volumes. By first determining the storage and logical information for the object which is going to be mapped and determining the type of object for which extent information is being found, an appropriate process dependent upon the file type can be used to generate extent and, if needed, repetition blocks. This results in a backup technique which requires relatively little memory, since a compact representation of the data to be backed up is used. Also, this results in a backup technique which is relatively rapid compared with prior art approaches.
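The selection of processing by file type can be sketched as a simple dispatch. This is an illustrative sketch only: the `layout` dictionary, the handler names, the tuple shapes used for blocks, and the simplification that concatenated and striped files begin at offset 0 of each volume are all assumptions, not details taken from the described system.

```python
from enum import Enum
from typing import Dict, List, Tuple


class FileType(Enum):
    PHYSICAL = "physical"
    CONCATENATED = "concatenated"
    STRIPED = "striped"


# Hypothetical block shapes:
#   ("extent", volume, offset, length) or ("repeat", n_extents, n_repetitions)
Block = Tuple


def map_physical(layout: Dict, length: int) -> List[Block]:
    # The file occupies one contiguous run on a single device.
    return [("extent", layout["volumes"][0], layout.get("start", 0), length)]


def map_concatenated(layout: Dict, length: int) -> List[Block]:
    # Volumes are filled in order; emit one extent per volume the file touches.
    blocks, remaining = [], length
    for volume, size in zip(layout["volumes"], layout["sizes"]):
        if remaining <= 0:
            break
        take = min(size, remaining)
        blocks.append(("extent", volume, 0, take))
        remaining -= take
    return blocks


def map_striped(layout: Dict, length: int) -> List[Block]:
    volumes, stripe = layout["volumes"], layout["stripe_size"]
    full_rows, leftover = divmod(length, stripe * len(volumes))
    blocks: List[Block] = []
    if full_rows:
        # All complete stripe rows collapse into one repetition block
        # followed by one extent block per volume.
        blocks.append(("repeat", len(volumes), full_rows))
        blocks.extend(("extent", v, 0, stripe) for v in volumes)
    # A partial final row is listed explicitly, one extent per volume touched.
    for v in volumes:
        if leftover <= 0:
            break
        take = min(stripe, leftover)
        blocks.append(("extent", v, full_rows * stripe, take))
        leftover -= take
    return blocks


HANDLERS = {
    FileType.PHYSICAL: map_physical,
    FileType.CONCATENATED: map_concatenated,
    FileType.STRIPED: map_striped,
}


def map_file(file_type: FileType, layout: Dict, length: int) -> List[Block]:
    """Select the processing that matches the discovered file type."""
    return HANDLERS[file_type](layout, length)


striped = map_file(FileType.STRIPED, {"volumes": ["a", "b"], "stripe_size": 4}, 20)
print(striped)
```

Only the striped handler emits a repetition block here, which matches the statement that repetition blocks are generated "if needed"; the physical and concatenated cases are already compact with plain extent blocks.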