1. Field of the Invention
This invention relates generally to disk drive backup systems and, more particularly, to image backup methods for backing up selected data from disk partitions of storage devices.
2. Description of the Related Art
Modern computer systems typically include one or more mass storage devices, such as hard disk drives, optical disc drives, floppy disk drives, removable disk drives, and the like, to store large amounts of information. Often, however, the storage devices fail to operate properly because of various electromechanical defects. In the event of such failures, valuable data stored on the storage devices may be lost permanently or may require costly and time-consuming repairs to recover. To guard against such failures, modern computer systems typically employ a backup system to back up data stored on a storage device. When the storage device fails or the original data becomes corrupted, the backup system uses the backed-up data to restore the original data.
Generally, storage devices contain one or more disks for storing data. For example, hard disk drives typically include one or more disks arranged to store data. The hard disk drives include a plurality of sectors and may be partitioned into one or more partitions (e.g., volumes, logical drives, etc.) as is well known in the art. In addition, each of the disk partitions is a logically self-contained volume and is typically represented by a drive letter such as “C,” “D,” “E,” or the like. Each partition contains files and directory bit maps such as a file allocation table or the like. Typically, a partition is organized as a linear sequence of clusters, each of which comprises a number of sectors.
FIG. 1A illustrates a schematic diagram of an exemplary disk 100 for storing data. The disk 100 is configured to include a plurality of tracks 102. Each of the tracks 102 is divided into sectors 104 for storing data. The disk 100 may be partitioned into one or more partitions with each partition having a file allocation data structure such as a file allocation table.
FIG. 1B shows a schematic diagram of an exemplary track 102 divided into sectors 104. In this arrangement, files are configured to be stored in the disk 100 in units of clusters 106. Each of the clusters 106 includes a pair of sectors 104. As is well known in the art, however, a cluster may include any number of contiguous sectors, typically a power of two (e.g., 1, 2, 4, 8, 16, etc.).
In general, a cluster is the smallest allocation unit for a file as implemented in many operating systems (e.g., MS-DOS™, MS WINDOWS™, etc.). A file uses one or more clusters to store its data. For example, if a file is comprised of a single byte of data, a single cluster is allocated to store the file data. On the other hand, if the size of the file exceeds that of a single cluster, a plurality of clusters may be required to store the large file data.
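The allocation rule described above amounts to a ceiling division. The following Python sketch is illustrative only: the function name `clusters_needed`, the 512-byte sector size, and the two-sector cluster (chosen to match FIG. 1B) are assumptions and are not values stated in this description.

```python
def clusters_needed(file_size_bytes, sector_size=512, sectors_per_cluster=2):
    """Return how many whole clusters are needed to hold a file.

    sector_size and sectors_per_cluster are assumed example values.
    """
    cluster_size = sector_size * sectors_per_cluster
    # Ceiling division: any partial cluster still consumes a whole cluster.
    return -(-file_size_bytes // cluster_size)


if __name__ == "__main__":
    for size in (1, 1024, 1025, 5000):
        print(size, "bytes ->", clusters_needed(size), "cluster(s)")
```

Under these assumed sizes, a one-byte file and a 1024-byte file each occupy one cluster, while a 1025-byte file occupies two.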
FIG. 1C shows a schematic diagram of a plurality of clusters 106 used to store files F1, F2, F3, and F4. The clusters 106 are arranged sequentially in the order of clusters C1, C2, C3, C4, C5, C6, C7, and C8. The files F1 and F3 are smaller than a cluster size and are stored in clusters C1 and C5, respectively. On the other hand, the files F2 and F4 are larger than a cluster size and thus each require more than one cluster. For example, the file F2 is stored in the clusters C2 and C3 while the file F4 is stored in the clusters C6 and C7. The files F1, F2, F3, and F4 do not completely fill the clusters C1, C3, C5, and C7, respectively, and thus leave spaces 108 that are not used. In the absence of compression, these spaces 108 are typically not used to store data from other files. Since the clusters C1, C2, C3, C5, C6, and C7 are used to store valid file data, they are referred to herein as used-clusters, valid data space, or the like.
In contrast, the clusters C4 and C8 are not used to store valid file data and their spaces 110 are available for storing data. The spaces 110 corresponding to the unused clusters C4 and C8 are also referred to herein as “free spaces,” “available spaces,” “holes,” or the like. The spaces 110 may be classified into two categories. On the one hand, the spaces 110 may contain zeros because the clusters have not been used to store data. On the other hand, the spaces 110 may include data from a file that has been previously deleted. In this case, the spaces may include non-zero or garbage data. In either case, the spaces represent unused spaces or clusters to which data may be written.
Operating systems in modern computer systems typically keep track of the free spaces by means of a free list, bit map, or the like. For example, conventional bit maps provide a bit for each allocation unit (e.g., cluster) indicating whether the associated allocation unit is available for writing data. In this manner, when new data are to be written to a disk, the operating system checks the bit map and determines one or more available allocation units to which the data can be written. The free list or bit map is generally provided in a file allocation data structure (e.g., file allocation table or FAT) for each partition and is well known in the art.
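By way of illustration only, a bit map of the kind described above might be sketched in Python as follows. The class name `ClusterBitmap` and its methods are hypothetical, and the usage pattern mirrors FIG. 1C with clusters C1 through C8 mapped to indices 0 through 7 (an assumption made for the example).

```python
class ClusterBitmap:
    """One bit per cluster: 1 means used, 0 means free (hypothetical sketch)."""

    def __init__(self, num_clusters):
        self.num_clusters = num_clusters
        self.bits = bytearray((num_clusters + 7) // 8)

    def mark_used(self, cluster):
        self.bits[cluster // 8] |= 1 << (cluster % 8)

    def mark_free(self, cluster):
        self.bits[cluster // 8] &= ~(1 << (cluster % 8)) & 0xFF

    def is_used(self, cluster):
        return bool(self.bits[cluster // 8] & (1 << (cluster % 8)))

    def find_free(self, count):
        """Return the indices of the first `count` free clusters."""
        free = [c for c in range(self.num_clusters) if not self.is_used(c)]
        if len(free) < count:
            raise ValueError("not enough free clusters")
        return free[:count]


if __name__ == "__main__":
    bitmap = ClusterBitmap(8)
    # Indices 0-2 and 4-6 stand for used clusters C1-C3 and C5-C7 of FIG. 1C.
    for used in (0, 1, 2, 4, 5, 6):
        bitmap.mark_used(used)
    print(bitmap.find_free(2))  # indices for the free clusters C4 and C8
```

When new data are to be written, a caller would consult `find_free` and then call `mark_used` on the clusters it allocates.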
Backup techniques typically fall into two broad categories: file-based backup and image-based backup. In file-based backup systems, the contents of individual files are copied from a source disk to a backup medium. The files are usually copied without regard for how they are arranged on the source disk. For example, a partition may have ten sectors containing two files. One file is stored in sectors two through four and sectors eight and nine while the other file is stored in sectors five through seven. The remaining sectors zero and one are unused. In this case, the file-based backup would store information in the backup in the following sequence: sectors two through four, eight and nine, five through seven, such that the unused sectors zero and one are not copied.
However, since a partition or drive often includes hundreds or even thousands of files, backing up all files of the entire partition or drive may require a substantial number of non-sequential read and write operations. For example, to back up the former file in sectors two through four and sectors eight and nine, a backup system reads sectors two through four first, and then performs a seek to sector eight for reading sectors eight and nine. Such non-sequential read and write operations entail numerous seek operations to the proper sectors of the clusters. Accordingly, the conventional file backup method may require a substantial amount of time to back up the entire partition or drive.
By comparison, the image-based backup method generally reduces the time required to back up an entire partition. Image-based backup systems operate on a partition or drive basis and are capable of backing up a disk or one or more partitions of the disk. In this method, all data on the partition, including valid data, free space, and invalid data, are copied and stored on a backup medium. For example, to perform an image backup of a partition “C,” the image-based backup method operates to read and store the data on the partition sequentially from the beginning sector to the end. By thus reading and storing the sectors linearly, seek operations are minimized. Hence, the backup time is typically reduced in comparison with the file-based backup technique.
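The difference between the two read patterns can be illustrated with the ten-sector example given above. The following Python sketch is a simplified model, not a measurement: the helper names are hypothetical, and a "seek" is assumed to be any read that does not immediately follow the previously read sector, with the initial positioning counted as one seek.

```python
def count_seeks(read_order):
    """Count reads that do not directly follow the previous sector."""
    seeks = 0
    previous = None
    for sector in read_order:
        if previous is None or sector != previous + 1:
            seeks += 1
        previous = sector
    return seeks


def file_based_order(files):
    """Read each file's extents in turn, file by file."""
    order = []
    for extents in files.values():
        for first, last in extents:
            order.extend(range(first, last + 1))
    return order


def image_based_order(num_sectors):
    """Read every sector of the partition from first to last."""
    return list(range(num_sectors))


# Extents (first sector, last sector) taken from the ten-sector example.
FILES = {"file_a": [(2, 4), (8, 9)], "file_b": [(5, 7)]}

if __name__ == "__main__":
    print("file-based: ", file_based_order(FILES), count_seeks(file_based_order(FILES)))
    print("image-based:", image_based_order(10), count_seeks(image_based_order(10)))
```

In this model the file-based order reads eight sectors with three seeks, while the image-based order reads all ten sectors, including the two unused ones, with a single seek.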
Unfortunately, the conventional image-based backup methods have several drawbacks. For example, since conventional image backup systems typically store all data including valid data, free space, and invalid data, a substantial portion of the backup medium may be used to store unnecessary data such as the invalid data and the free space data, which typically consists of zeros. To the extent that a partition has a relatively high percentage of free space, the conventional backup systems may exhibit a correspondingly low efficiency in backing up data.
In addition, a partition typically accumulates more invalid data the longer it has been in use. The increase in the accumulation of invalid data results from, over time, more data being written to the partition and then overwritten or erased by other data. Thus, the conventional image backup methods may store a substantial amount of invalid data as well as free space data, thereby unnecessarily adding to the backup time.
In view of the foregoing, what is needed is an image backup method for backing up the data of one or more partitions or drives while minimizing the backup of free space data and/or invalid data.
The present invention fills these needs by providing a method for backing up data from a disk partition using a block map. It should be appreciated that the present invention can be implemented in numerous ways, including as a process, an apparatus, a system, a device, a method, or a computer readable medium. Several inventive embodiments of the present invention are described below.
In accordance with one embodiment, the present invention provides a method for backing up data from a storage device having one or more partitions. Each partition has a plurality of clusters that include one or more sectors capable of storing data to be backed up. A number of sectors are specified for a block such that a partition to be backed up is defined in terms of a plurality of the blocks. A block map is then generated to indicate whether each of the blocks in the partition contains any data to be backed up. The block map is then traversed and the blocks that are indicated in the block map to contain the data are backed up from the partition while the blocks having no data are not backed up. Preferably, the block map is generated by determining, for each block, whether any cluster in the block includes any data to be backed up by traversing a file allocation table of the partition.
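For illustration, the steps of this embodiment might be sketched in Python as follows. This is a minimal sketch under stated assumptions rather than the claimed implementation: the function names, the per-cluster usage list `cluster_used`, the `read_block` callback, and the requirement that a block span a whole number of clusters are all hypothetical choices made for the example.

```python
def build_block_map(cluster_used, sectors_per_cluster, sectors_per_block):
    """Return one boolean per block: True if any cluster in it holds data.

    cluster_used is an assumed per-cluster list of booleans, e.g. derived
    from a file allocation table.
    """
    if sectors_per_block % sectors_per_cluster != 0:
        raise ValueError("assumed: a block spans a whole number of clusters")
    clusters_per_block = sectors_per_block // sectors_per_cluster
    return [
        any(cluster_used[start:start + clusters_per_block])
        for start in range(0, len(cluster_used), clusters_per_block)
    ]


def backup_partition(read_block, block_map):
    """Traverse the block map and copy only the blocks marked as holding data."""
    return [
        (index, read_block(index))
        for index, has_data in enumerate(block_map)
        if has_data
    ]


if __name__ == "__main__":
    # Hypothetical partition: 8 clusters of 2 sectors, blocks of 4 sectors.
    used = [True, True, False, False, True, False, False, False]
    block_map = build_block_map(used, sectors_per_cluster=2, sectors_per_block=4)
    print(block_map)
    print(backup_partition(lambda i: f"block-{i}", block_map))
```

Because the marked blocks are visited in ascending order, the reads remain sequential as in a conventional image backup, while blocks that hold no data are skipped.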
In another embodiment, a method is provided for backing up data from a storage device having one or more partitions. Each partition has a file allocation table and a plurality of clusters. The file allocation table includes a cluster entry for each of the clusters in the partition to indicate whether each cluster has any valid data for backup. Each of the clusters has one or more sectors capable of storing data to be backed up. A block is defined to include N contiguous clusters such that a partition to be backed up is specified in terms of M blocks. The N clusters in each block provide a specified backup granularity. The file allocation table is traversed to determine if any cluster entries in each of the blocks indicate that the associated block contains data to be backed up. In particular, a block is indicated to contain the data to be backed up when any cluster entry in the block is indicated as having any data to be backed up. A block map having M block entries corresponding to the M blocks is generated. The block map includes one block entry per block. Each of the M block entries indicates whether the associated block in the partition contains the data to be backed up. In response to the block map, only the blocks indicated as containing the data to be backed up are backed up.
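The effect of the granularity N can be illustrated with a simplified Python sketch. The FAT model here is an assumption made for the example: an entry of 0 denotes a free cluster and any other value denotes a cluster in use, and the reserved entries of a real file allocation table are ignored. The function names and the sample table are hypothetical.

```python
FREE = 0          # assumed marker for an unallocated cluster entry
EOC = 0xFFFF      # assumed end-of-chain marker for the last cluster of a file


def block_map_from_fat(fat_entries, clusters_per_block):
    """Build M block entries, one per block of N contiguous clusters."""
    n = clusters_per_block
    m = -(-len(fat_entries) // n)  # M = ceil(total clusters / N)
    return [
        any(entry != FREE for entry in fat_entries[b * n:(b + 1) * n])
        for b in range(m)
    ]


def clusters_backed_up(block_map, clusters_per_block, total_clusters):
    """Count the clusters copied when only marked blocks are backed up."""
    n = clusters_per_block
    return sum(
        min(n, total_clusters - b * n)
        for b, has_data in enumerate(block_map)
        if has_data
    )


if __name__ == "__main__":
    # Hypothetical 8-cluster table: files in clusters 0, 1-2, and 6.
    fat = [EOC, 2, EOC, FREE, FREE, FREE, EOC, FREE]
    for n in (1, 2, 4):
        block_map = block_map_from_fat(fat, n)
        print(n, block_map, clusters_backed_up(block_map, n, len(fat)))
```

With this sample table, N = 1 copies four clusters using an eight-entry block map, N = 2 copies six clusters using four entries, and N = 4 copies all eight clusters using two entries. A smaller N skips more free space at the cost of a larger block map.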