As is known in the art, there exists a variety of systems for backing up computer data that can be subsequently restored. In general, such backup and restore systems are used to back up data from a plurality of computers or clients connected to one or more networks. A user, such as a system administrator, can restore selected portions of the previously backed up data to a desired client computer. In this manner, the loss of data, which can be contained in large databases, can be reduced and, in some instances, even prevented.
In some systems, data used by one or more clients is first stored on a primary storage system such as a Symmetrix storage system provided by EMC Corporation of Hopkinton, Mass. In such systems, the purpose of a backup and restore system, such as a Fastrax storage system also provided by EMC Corporation of Hopkinton, Mass., is to back up to long term storage devices the data that belongs to the client but which is stored on the primary storage system. For example, data stored on the Symmetrix system (i.e. the primary storage system) is backed up through the Fastrax system (i.e. the backup storage system) to long term storage coupled to or provided as part of the Fastrax system. The long term storage may be provided, for example, as disk drives, tape storage or any other storage mechanism.
The data must be backed up in a manner which allows the data to be subsequently restored from the long term storage (e.g. the tape drives) to the primary storage system (e.g. the Symmetrix system) and the client. The backup and restore system is thus sometimes considered to include long term storage together with a system for placing data into the long term storage device and recovering the data from the long term storage device.
To perform a backup, the client copies data from the primary storage system to the backup and restore system. Similarly, to perform a restore, the backup and restore system copies data back to the primary storage system. Thus, during backup and restore operations, actual data files are communicated between a host (e.g. the client and/or primary storage system) and the backup and restore system.
Primary storage systems such as the Symmetrix system typically comprise a plurality of disks (i.e. an array of disks) and the data is stored on sections of the disks. The sections of the disks are referred to as “extents” (i.e. an extent corresponds to a small portion or piece of a disk). The data in the primary storage system can thus be typically specified in the form of extents.
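As an illustration only, an extent of the kind described above might be represented as a disk identifier, a starting block and a block count. The field names and the block-based addressing below are assumptions made for the sketch and are not taken from the Symmetrix or Fastrax systems.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Extent:
    """One contiguous piece of one disk (hypothetical layout)."""
    disk: int    # index of the disk within the array (assumed field)
    start: int   # first block of the extent on that disk (assumed field)
    length: int  # number of contiguous blocks (assumed field)


def total_blocks(extents):
    """Amount of data, in blocks, described by a list of extents."""
    return sum(e.length for e in extents)


# Data to be backed up is then specified as a list of extents,
# which may be scattered over several disks.
backup_list = [Extent(0, 128, 16), Extent(3, 4096, 8), Extent(1, 0, 32)]
print(total_blocks(backup_list))
```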
To implement a backup operation through the backup and restore system, the client or the primary storage system must specify to the backup and restore system the particular data which should be backed up. Typically, only a portion of the data on the primary storage system (rather than all of the data stored on the primary storage system) must be backed up at any one time. Thus, to specify the data which should be backed up, the client or primary storage system provides a list of extents to the backup and restore system.
The extents can lie across many disks of the disk array (i.e. the extents are typically scattered around the disks) and the number of extents which must be specified can get very large. The number of extents which backup and restore systems can receive, however, is limited. That is, the backup and restore systems (e.g. the Fastrax system) limit the number of extents which a client or primary storage system (e.g. the Symmetrix system) can specify during any single backup operation. In some cases, however, the number of extents which must be specified by a host (e.g. the client or the primary storage system) to the backup and restore system can be very large and sometimes can even exceed the number of extents which the backup and restore system can handle.
For example, assume the backup and restore system can handle only 1000 extents at one time. If the number of extents which the backup and restore system can handle is exceeded, the system fails to complete, or in some cases even to perform, the backup operation. The prior art approach to solving this problem is to send no more than 1000 extents at a time to the backup and restore system. When the first 1000 extents have been processed, the next 1000 are sent, and so on until all extents have been processed. This solution is relatively time consuming and also consumes a relatively large amount of system resources, since extents are continuously transferred between the host and the backup and restore system.
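The batching approach described above can be sketched as follows. This is a minimal illustration that assumes the 1000-extent limit of the example; the function names are hypothetical and the sketch does not model any actual host or backup system interface.

```python
def batches(extents, limit=1000):
    """Yield successive groups of at most `limit` extents."""
    for i in range(0, len(extents), limit):
        yield extents[i:i + limit]


def round_trips(n_extents, limit=1000):
    """Number of separate transfers needed (ceiling division)."""
    return -(-n_extents // limit)


# 2500 extents with a limit of 1000 require three separate transfers,
# each of which must be processed before the next group is sent.
extent_ids = list(range(2500))
print([len(group) for group in batches(extent_ids)])
print(round_trips(len(extent_ids)))
```

The number of transfers grows linearly with the number of extents, which is the source of the time and resource cost noted above.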
This problem is exacerbated somewhat when the primary storage system stores data with a so-called striping technique. In a striping technique, data from a single file is stored such that the data is spread across multiple disks in a predetermined pattern. This results in a storage pattern in which data is stored on disks using a relatively large number of extents, each of which is relatively small in size.
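To illustrate why striping increases the extent count, the sketch below assumes a simple round-robin pattern in which a file is divided into fixed-size stripe units placed on successive disks. The pattern, the parameter names and the assumption that the file begins at block 0 of each disk are hypothetical; actual striping layouts may differ.

```python
def stripe_extents(file_blocks, n_disks, stripe_unit):
    """Describe a striped file, in file order, as (disk, start, length) tuples.

    Assumes round-robin placement: stripe unit k goes to disk k % n_disks.
    """
    extents = []
    k = 0
    remaining = file_blocks
    while remaining > 0:
        length = min(stripe_unit, remaining)
        disk = k % n_disks
        start = (k // n_disks) * stripe_unit
        extents.append((disk, start, length))
        remaining -= length
        k += 1
    return extents


# A 1024-block file striped over 4 disks in 8-block units is described,
# in file order, by 128 small extents rather than one large extent.
layout = stripe_extents(1024, 4, 8)
print(len(layout))
```

Because consecutive stripe units lie on different disks, a description of the file in file order needs one extent per stripe unit.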
Another problem is that the host must collect and transmit (or otherwise provide) the information to the backup and restore system. In a worst case scenario, the host transmits information on each extent separately. If a relatively large number of extents is specified (but within the limits of the backup and restore system), this process is relatively time consuming and also consumes resources which would otherwise be available for data processing tasks. Also, once all of the data to be backed up has been specified to the backup and restore system, a relatively large amount of time is required to transfer the data from the primary storage system to the backup and restore system.
A further problem is that when the number of extents is large, a large amount of data must be used to represent the extents being backed up. Thus, a relatively large amount of memory resources is required.
Still another problem arises when it is time for the system to perform a restore operation. It should be appreciated that during a backup it is only necessary to instruct the backup and restore system to take the specified extents as a backup. In a restore operation, however, it is necessary to specify how the data was backed up and how it should be restored. Thus, the host must specify the mapping used during the backup process (i.e. the mapping of the backup data which is now on tape) and must also specify how to restore the data using a new mapping.
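The need for two mappings during a restore can be illustrated with the following sketch, which pairs each extent of the backup mapping with an extent of the new mapping. The tuple layout (disk, start, length), the function name and the requirement that paired extents have equal lengths are assumptions made for illustration only.

```python
def restore_plan(backup_map, restore_map):
    """Pair each backed-up extent with its restore target.

    Both arguments are lists of (disk, start, length) tuples. The
    one-to-one, equal-length pairing is an assumption of this sketch.
    """
    if len(backup_map) != len(restore_map):
        raise ValueError("mappings must describe the same number of extents")
    plan = []
    for src, dst in zip(backup_map, restore_map):
        if src[2] != dst[2]:
            raise ValueError("paired extents must have the same length")
        plan.append((src, dst))
    return plan


# The host must supply both lists, so a restore carries roughly twice
# the extent description that the corresponding backup required.
backed_up = [(0, 128, 16), (3, 4096, 8)]
targets = [(2, 0, 16), (2, 16, 8)]
print(restore_plan(backed_up, targets))
```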
It would, therefore, be desirable to provide a technique for efficiently representing data so that it can be efficiently and rapidly communicated between a host and a backup and restore system during backup and restore operations.