1. Field of the Invention
The present invention generally relates to backup operations on data stored in a data recording device, and more particularly to backup operations on data stored in a data storage device such as a disk device.
2. Description of the Related Art
FIG. 1 shows an exemplary structure of a conventional backup system 100 to perform data backup operations, for example, to copy data from a disk apparatus to a backup apparatus such as a tape apparatus, in a virtual storage system environment.
Referring to FIG. 1, the backup system 100 includes a PC (Personal Computer) server 101, a physical disk apparatus 103, a server 105 and a backup apparatus 106. The PC server 101 includes a virtual disk control part 102 and an inner bus 104. The virtual disk control part 102 controls a virtual storage system in such a way that virtual disks can be associated with the physical disk apparatus 103 and be externally used as disk devices by the PC server 101. The server 105 also includes an inner bus 107 and is coupled to the backup apparatus 106. The PC server 101 is connected to the server 105 via a high-speed bus 110 such as a Fibre Channel (FC) bus.
In such a system configuration, data recorded in disks in the physical disk apparatus 103 are backed up to the backup apparatus 106.
In accordance with a conventional backup method of backing up data recorded in a disk apparatus in a virtual storage system environment, first, a copy of a disk to be backed up is created in a virtual storage system layer. Then, data in the created copy disk are backed up, for example, to the backup apparatus 106 such as a tape apparatus. According to the conventional backup method, it is possible to shorten operation halt time of the disk during the data backup operation on the disk.
In the conventional backup method, in addition, the copy disk is fictitiously created to shorten time to create the copy disk. As a result, it is possible to further shorten the operation halt time during the data backup operation. In this case, however, it is required that the capacity of the copy disk be greater than or equal to that of a currently operating source disk.
On the other hand, there are typically two manners of backing up data from the copy disk to the backup apparatus such as a tape apparatus: a file-by-file backup manner and an all-block backup (or raw device backup) manner. In the all-block backup manner, data recorded in a disk are backed up into a backup apparatus sequentially from the head block to the last block of the disk in such a way that the backup data can have a block-wise one-to-one correspondence between the disk and the backup apparatus.
A detailed description is given, with reference to FIG. 2 through FIG. 6, of a virtual storage system, a fictitious disk creation method and a backup method according to the prior art.
First, a conventional virtual storage system is described.
FIGS. 2A through 2C are diagrams to explain an exemplary operation of a conventional virtual storage system.
In FIG. 2A, a virtual disk 210 of 500 GB (gigabytes) is illustrated. In FIG. 2C, a physical disk 220 of 10 GB is illustrated. In FIG. 2B, an address conversion table and an allocation information table, which are collectively designated by the reference numeral 230 and indicate a correspondence between the virtual disk 210 and the physical disk 220, are illustrated.
In such a virtual storage system, a disk (virtual disk) is virtually created from a physical storage (physical disk), and the created virtual disk is provided to a server. The server can access the virtual disk as though the server accessed the physical disk.
In such a virtual storage system, a virtual disk is created so that the virtual disk can have a capacity greater than that of a real physical storage (physical disk). Then, when a write request to the virtual disk is issued, a required recording area in the physical disk is allocated. This allocation is managed by using an allocation information table in the virtual storage system.
Thus, if there is a recording area in the virtual disk to which no write request has been issued, the recording area is not allocated to any recording area in the physical disk.
Next, an exemplary operation of a conventional virtual storage system to allocate a required recording area in a physical disk is described.
First, the virtual storage system creates a virtual disk 210 illustrated in FIG. 2A. It is noted that no data is written in the virtual disk 210 at this time.
Then, write accesses 211 and 212 to the virtual disk 210 are provided from a server. Required storage areas 221 and 222 in the physical disk 220 are not allocated to the write accesses 211 and 212, respectively, until such write requests to the virtual disk 210 are issued. This allocation is described in portions 231 and 232 of the allocation information table 230 so as to be manageable under the address conversion table and the allocation information table 230.
Subsequently, once the server accesses a logical address in the virtual disk 210 corresponding to an allocated area in the physical disk 220, the virtual storage system uses the address conversion table and the allocation information table 230 to perform an address conversion to access the physical disk 220.
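For illustration only, the allocate-on-write behavior described above can be sketched as follows. This is a minimal sketch, not the conventional system itself: the class name ThinVirtualDisk, the block-granular addressing and the single dictionary standing in for both the address conversion table and the allocation information table 230 are assumptions introduced here, and the sizes are counted in blocks rather than GB.

```python
class ThinVirtualDisk:
    """Toy virtual disk whose physical blocks are allocated on first write.

    Hypothetical model: `table` stands in for both the address conversion
    table and the allocation information table of the description.
    """

    def __init__(self, virtual_blocks, physical_blocks):
        self.virtual_blocks = virtual_blocks
        self.physical = [None] * physical_blocks  # backing physical disk
        self.table = {}                           # virtual block -> physical block
        self.next_free = 0                        # next unallocated physical block

    def _check(self, vblock):
        if not 0 <= vblock < self.virtual_blocks:
            raise IndexError("virtual block out of range")

    def write(self, vblock, data):
        self._check(vblock)
        if vblock not in self.table:
            # First write to this virtual block: allocate a physical block now.
            if self.next_free >= len(self.physical):
                raise RuntimeError("physical disk is full")
            self.table[vblock] = self.next_free
            self.next_free += 1
        self.physical[self.table[vblock]] = data

    def read(self, vblock):
        self._check(vblock)
        pblock = self.table.get(vblock)  # address conversion
        return None if pblock is None else self.physical[pblock]


# A virtual disk far larger than its physical backing, as in FIGS. 2A-2C.
disk = ThinVirtualDisk(virtual_blocks=500, physical_blocks=10)
disk.write(42, b"a")   # plays the role of write access 211
disk.write(317, b"b")  # plays the role of write access 212
```

In this sketch, only the two written virtual blocks consume physical blocks; a read of any other virtual block finds no entry in the table and returns nothing.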
Next, a fictitious disk copying method is described with reference to FIG. 3 and FIG. 4.
FIG. 3 is a diagram to explain a conventional fictitious disk copying method. FIGS. 4A and 4B are sequence diagrams of the fictitious disk copying method.
The fictitious disk copying method is defined as a method of fictitiously creating a copy (copy disk) of a source disk in a short time. In response to an instruction to create a copy disk, a data block table is created for the source disk. In accordance with the fictitious disk copying method, it is considered that when the data block table is created, the copying operation has been completed. Such a data block table is defined as a table having information to indicate whether a block address of a copy disk and a corresponding block address of a source disk have been updated. Thus, at creation time of a data block table, the data block table includes no information to indicate that an update operation has been performed.
In FIG. 3, a source disk 301 is illustrated as a currently working disk. In response to receipt of an instruction 310 to duplicate the source disk 301 from a server, the virtual storage system creates a data block table 302 for the source disk 301.
As shown in FIG. 4A, when such a copying instruction 310 is issued from the server during an operation period 401 of the source disk 301, the operation of the source disk 301 is temporarily halted in a time interval 402, and the data block table 302 is created for the source disk 301. Immediately after the creation of the data block table 302, the operation of the source disk 301 is restarted in an operation period 403. It is noted that a copy disk 303 as illustrated in FIG. 3 is not created at this time.
At the next step, the copying operation from the source disk 301 to the copy disk 303 is started. At the same time, a backup operation 404 from the copy disk 303 to a backup apparatus 304 is started. Fundamentally, individual data items in the source disk 301 are sequentially copied to the copy disk 303.
However, if an update request to update data in the source disk 301 is issued from a server, a block to be updated in the source disk 301 is copied to the copy disk 303 prior to updating of the source disk 301, and then update information to indicate that the update operation has been performed on the block to be updated is added in the data block table 302 of the source disk 301.
On the other hand, when a read request for backup data is issued, there may be a case where the requested data have not yet been copied from the source disk 301 to the copy disk 303. In this case, the copying operation of the requested data from the source disk 301 to the copy disk 303 is started.
According to the above-mentioned method, it is possible to fictitiously create a copy of a disk in a short time.
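For illustration only, the fictitious disk copying method can be sketched as a copy-on-write scheme. This is a minimal sketch under stated assumptions: the class name FictitiousCopy and its method names are hypothetical, disks are modeled as Python lists of blocks, and the data block table 302 is reduced to one flag per block that records whether the block has already been preserved in the copy disk.

```python
class FictitiousCopy:
    """Toy model of a fictitious copy of a source disk (hypothetical names)."""

    def __init__(self, source):
        self.source = source                  # live source disk (list of blocks)
        self.copy = [None] * len(source)      # copy disk, initially empty
        self.copied = [False] * len(source)   # stand-in for the data block table

    def _preserve(self, block):
        # Copy the original block to the copy disk once, then record it.
        if not self.copied[block]:
            self.copy[block] = self.source[block]
            self.copied[block] = True

    def write_source(self, block, data):
        # An update request: preserve the old block before updating the source.
        self._preserve(block)
        self.source[block] = data

    def read_copy(self, block):
        # A backup read: copy on demand if the block has not been copied yet.
        self._preserve(block)
        return self.copy[block]

    def background_step(self):
        # Sequential background copy; returns False once every block is copied.
        for block, done in enumerate(self.copied):
            if not done:
                self._preserve(block)
                return True
        return False


source = ["a0", "b0", "c0"]
snap = FictitiousCopy(source)   # creating the table is all that "copying" costs
snap.write_source(1, "b1")      # the source keeps operating and is updated
backup = [snap.read_copy(i) for i in range(len(source))]
```

In this sketch, the backup sees the contents of the source disk as of the time the table was created, while the source disk itself already holds the updated block.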
Finally, a conventional backup method is described with reference to FIG. 5 and FIG. 6.
As typical backup methods of backing up data in the above-mentioned copy disk to a backup apparatus such as a tape apparatus, there are two approaches: a file-by-file backup method and an all-data backup method (raw device level backup method). In accordance with the file-by-file backup method, only files stored in the copy disk are backed up to the backup apparatus. On the other hand, in accordance with the all-data backup method, all data of the copy disk are backed up to the backup apparatus regardless of file configuration and other factors.
FIG. 5 is a diagram to explain the file-by-file backup method and the all-data backup method.
In the file-by-file backup method, only actual data 502 stored in recording areas in a virtual disk 501 corresponding to a physical disk are backed up. As a result, it is possible to back up the actual data 502 into a backup medium 504 having a minimum capacity required for backup data 505 corresponding to the actual data 502. However, if the actual data 502 contain a large number of files, random accesses may occur frequently. Thus, in this case, a longer processing time would be required for the backup operation.
On the other hand, in the all-data backup method, since individual blocks in the copy disk are sequentially accessed from the head block thereof, it is possible to access the blocks at a high speed. However, an unused area 503 in the virtual disk is also backed up. As a result, even if the size of the actual data 502 is smaller than the total capacity of the virtual disk 501, it is necessary to prepare a large-sized backup medium 506 to store the backup data 507 having the same size as the total capacity of the virtual disk 501.
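For illustration only, the difference between the two backup methods can be sketched on a toy disk. This is a minimal sketch under stated assumptions: the function names, the ten-block disk, the "." marker for unused blocks and the file-to-block map are all hypothetical and are not taken from FIG. 5.

```python
def all_data_backup(disk):
    """Read every block in order from the head block to the last block."""
    return list(disk)


def file_by_file_backup(disk, files):
    """Read only the blocks that belong to files, following each file's block list."""
    return {name: [disk[b] for b in blocks] for name, blocks in files.items()}


# Hypothetical 10-block disk; "." marks an unused block.
disk = ["."] * 10
disk[7], disk[2], disk[5] = "A1", "A2", "B1"
files = {"a.txt": [7, 2], "b.txt": [5]}   # block order within a file is non-sequential

raw = all_data_backup(disk)
by_file = file_by_file_backup(disk, files)
raw_size = len(raw)
file_size = sum(len(blocks) for blocks in by_file.values())
```

In this sketch, the all-data backup stores ten blocks through one sequential pass, whereas the file-by-file backup stores three blocks but visits them in an order dictated by the files rather than by the disk layout.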
FIG. 6 shows an exemplary table for comparing the two methods with respect to the processing time and the backup capacity required to back up a 100 GB disk. In the illustrated table, the column “DATA AMOUNT: SMALL” represents a case where the size of the actual data is 20 GB, and on the other hand, the column “DATA AMOUNT: LARGE” represents a case where the size of the actual data is 80 GB. Also, it is supposed that the block access time of the file-by-file backup method is twice as long as that of the all-data backup method. Under such conditions, the two backup methods are compared with respect to the processing time and the required backup capacity. In the illustrated table, the processing time and the backup capacity for the all-data backup method are each taken as 100. Except for the processing time in the case of the large data amount, the all-data backup method has a longer processing time and a larger required backup capacity than the file-by-file backup method does, as illustrated in FIG. 6.
On the other hand, the file-by-file backup method has longer processing time in the case of the large data amount.
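For illustration only, the comparison can be reproduced arithmetically from the conditions stated above. This is a minimal sketch under an additional assumption that is not stated in the description: the processing time is taken to be proportional to the amount of data read multiplied by the relative block access time. The function name compare and its parameters are hypothetical, and the resulting figures are derived from these assumptions rather than read from FIG. 6.

```python
def compare(disk_gb, data_gb, file_cost_ratio=2.0):
    """Relative cost of a file-by-file backup, with the all-data backup taken as 100.

    Assumption: time is proportional to (data read) x (relative block access time).
    """
    all_time = disk_gb * 1.0               # all-data reads the whole disk
    file_time = data_gb * file_cost_ratio  # file-by-file reads only the actual data
    return {
        "file_time": 100 * file_time / all_time,
        "file_capacity": 100 * data_gb / disk_gb,
        "all_time": 100.0,
        "all_capacity": 100.0,
    }


small = compare(disk_gb=100, data_gb=20)  # "DATA AMOUNT: SMALL"
large = compare(disk_gb=100, data_gb=80)  # "DATA AMOUNT: LARGE"
```

Under these assumptions, the file-by-file backup method scores 40 for the processing time and 20 for the backup capacity in the small case, and 160 and 80, respectively, in the large case, which is consistent with the tendency described above.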
Japanese Laid-Open Patent Applications No. 62-026550 and No. 04-284549 disclose techniques related to the present invention.
FIG. 7 is a diagram to explain a problem to be solved by the present invention.
As mentioned above, it is possible to shorten the operation halt time of a source disk during a backup operation by fictitiously creating a copy of the disk. However, conventional methods have a defect in that the capacity of a copy disk 702 has to be larger than or equal to the total capacity of a source disk 701, for example, 1 TB (terabyte), even if only 10 GB of data are recorded in the source disk 701.
In addition, in accordance with the all-data backup method, the entire disk of 1 TB is backed up in the illustrated case regardless of amounts of actual data recorded in the source disk 701. Thus, unnecessary data are backed up, resulting in increases in processing time.
Furthermore, even if only 10 GB of data are stored in the source disk 701, it is necessary to prepare a backup medium 703 having a capacity larger than or equal to, for example, 1 TB, in accordance with conventional backup methods.
Although the all-data backup method and the file-by-file backup method can be selectively used depending on amounts of actual data recorded in the source disk 701, such selection may increase workloads in practice.