1. Field of the Invention
The present invention relates to a method and system for backing up data from a plurality of disks, and particularly to restoring the data for individual files and/or folders from a backup tape using minimal map information.
2. Related Art
Backing up data from one or more computer disks is typically performed to recover from inadvertent user deletions/overwrites or from disk hardware failure. In the case of inadvertent user deletions/overwrites, only the data corresponding to the destroyed data is copied to the original disk from the backup. In the case of disk hardware failure, the user can restore all files to the original disk from the most recent backup. In most computer systems, the backup device is a tape drive, which can accommodate large amounts of data at a relatively low cost per byte of storage.
Generally, conventional backup methods provide for either file-by-file backup or image backup. In a file-by-file backup, the backup program copies one file at a time from the disk to the tape. Specifically, the program places all pieces of data for each file, irrespective of actual locations on the disk, into a single sequential block that is stored on the tape. Thus, a file-by-file backup can easily provide an incremental backup (wherein only those files that have changed since the last backup are written to tape).
In an image backup, the data image is read sequentially from the disk and written to the tape. As the tape is being written, a detailed file map is created to facilitate a subsequent restore operation. This map is stored on the tape as part of the backup. This detailed map includes the identification, size, and permissions of each file (or portion thereof) as well as its location(s) on the tape. Creating this detailed map is typically very time consuming. For example, assuming a million files, it could take hours to create the corresponding file map.
Moreover, even though the file map can allow accessing and restoring a particular file after image backup, the tape itself can undesirably slow file restoration. Specifically, the tape drive has heretofore been fundamentally a sequential backup device, wherein random access or adjusting backward/forward takes significant time. Therefore, in light of the time to create the file map and the time to then access a particular file, many users have historically chosen a file-by-file backup rather than an image backup.
However, technology improvements in tape drives have dramatically increased the speed at which files can be accessed, even if tape adjusting is necessary. Therefore, a need arises for backup and restore operations that can take advantage of these improvements in tape drive speed.
A method for backing up data in a computer system from a plurality of primary data sources to a secondary data source is provided. The method comprises copying data sections from the plurality of primary data sources to the secondary data source and providing a data pointer on the secondary data source. The data pointer indicates a starting point of each data section from the plurality of primary data sources and where that starting point is on the secondary data source. This data pointer information provides the minimum information necessary to map a location from the primary data source(s) to its location on the secondary data source. Creating this map is much less time consuming than creating the detailed map described above. For example, creating this data pointer information typically takes only a few seconds.
Advantageously, the data sections can be copied from the plurality of primary data sources in the order provided on the plurality of primary data sources, not by file order. In this manner, the method provides a quick and efficient backup of data from the plurality of primary data sources to the secondary data source.
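By way of a non-limiting illustration, the backup described above can be sketched in Python as follows. The sketch assumes that each primary data source is available as an in-memory byte string and that sections have a fixed size. The names `PointerEntry`, `backup`, and `section_size` are hypothetical and are introduced only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PointerEntry:
    """One data pointer record (hypothetical layout, for illustration only)."""
    source_id: int      # which primary data source the section came from
    source_offset: int  # where the section starts on that primary source
    tape_offset: int    # where the section starts on the secondary source


def backup(sources, section_size):
    """Copy sections in on-disk order and record where each one starts.

    Returns the bytes written to the secondary source and the data pointer.
    No per-file map is built; only one small record per section.
    """
    tape = bytearray()
    pointer = []
    for source_id, data in enumerate(sources):
        # Walk each source front to back, ignoring file boundaries.
        for offset in range(0, len(data), section_size):
            pointer.append(PointerEntry(source_id, offset, len(tape)))
            tape += data[offset:offset + section_size]
    return bytes(tape), pointer
```

For two sources holding `b"abcdefgh"` and `b"ijkl"` and a section size of 4, the sketch writes `b"abcdefghijkl"` and records three pointer entries, one per section.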
In one embodiment, each transfer includes at least one data section and information regarding the at least one data section. The information can include, for example, the size of the data section. The data sections can be limited to used bits or a combination of used and unused bits. In the case where the data sections are limited to the used bits, the data pointer information is needed to determine the location on the secondary data source since the amount of data written to the secondary data source varies based on the amount of used bits. In one embodiment, the secondary data source includes a tape drive and the at least one primary data source includes a disk drive.
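As one possible illustration of a transfer that carries size information with its data sections, the following Python sketch packs a section count and the section sizes ahead of the section data. The byte layout (a 4-byte count followed by 8-byte sizes, little-endian) and the names `pack_transfer` and `unpack_transfer` are assumptions made for this example and are not taken from the method itself.

```python
import struct


def pack_transfer(sections):
    """Build one transfer: section count, section sizes, then section data."""
    header = struct.pack("<I", len(sections))
    header += b"".join(struct.pack("<Q", len(s)) for s in sections)
    return header + b"".join(sections)


def unpack_transfer(blob):
    """Recover the sections of a transfer using only the recorded sizes."""
    (count,) = struct.unpack_from("<I", blob, 0)
    sizes = struct.unpack_from(f"<{count}Q", blob, 4)
    pos = 4 + 8 * count  # first byte after the header
    sections = []
    for size in sizes:
        sections.append(blob[pos:pos + size])
        pos += size
    return sections
```

Because the sizes travel with the transfer, sections of varying length, such as those limited to used data, can be separated again without any per-file map.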
A method of restoring individual files and/or folders from a secondary data source to a plurality of primary data sources is also provided. The method only requires the minimal data pointer information and directly accesses the data from the secondary data source. The method of restoring individual files and/or folders comprises reading a list of files and/or folders to restore. This list contains the name and identification node number for each file and/or folder to restore. This list is generated during a backup operation, and one or more of the files and/or folders can be selected for restore. Based on the identification node number, the method calculates the location of the identification node on the original primary data source(s). After reading the data pointer information from the secondary data source, the method then calculates the location of the identification node in the corresponding section on the secondary data source.
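The two calculations described above can be illustrated with the following Python sketch. It assumes a single primary data source, an identification node table that begins at a known offset with fixed-size entries, and a data pointer held as a list of (source start, tape start) pairs sorted by source start. The names `inode_source_offset`, `source_to_tape`, `table_start`, and `inode_size` are hypothetical, and the sketch further assumes that the requested offset lies inside a backed-up section.

```python
from bisect import bisect_right


def inode_source_offset(inode_number, table_start, inode_size):
    """Location of an identification node on the primary source.

    Assumes a contiguous table of fixed-size entries (illustrative only).
    """
    return table_start + inode_number * inode_size


def source_to_tape(pointer, source_offset):
    """Map a primary-source offset to an offset on the secondary source.

    pointer: list of (source_start, tape_start) pairs sorted by source_start.
    """
    starts = [source_start for source_start, _ in pointer]
    # Find the last section that starts at or before the requested offset.
    i = bisect_right(starts, source_offset) - 1
    if i < 0:
        raise ValueError("offset precedes the first backed-up section")
    source_start, tape_start = pointer[i]
    return tape_start + (source_offset - source_start)
```

For example, with a pointer of `[(0, 0), (4096, 1024), (16384, 5120)]`, a table starting at offset 4096, and 128-byte entries, identification node 3 lies at offset 4480 on the primary source and at offset 1408 on the secondary source.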
Of importance, the location of the identification node merely indicates a starting point on the secondary data source of a data section associated with the backup transfer. In this manner, the detailed file maps of the prior art can be eliminated, thereby saving significant time during backup of the primary data sources. Once the identification node has been read from the secondary data source, the data section(s) that contain the data for this file and/or folder can be advantageously accessed using the at least one identification node and information in the backup transfer regarding the data section. At this point, the data section(s) can be read from the secondary data source and the file and/or folder data can be restored. In one embodiment, the information in the backup transfer includes a size of the data section as well as sizes of other data sections in the backup transfer.
A tape drive for backing up and restoring data sections for at least one data source is also provided. The tape drive comprises a data pointer, which provides the starting location on the secondary data source for each data section created during a backup from the at least one data source. If the tape drive includes a plurality of transfers of data sections, then each transfer includes information regarding the data sections therein. This information can include sizes of the data sections.