1. Field of the Invention
This invention generally relates to the field of data processing and more particularly to high-performance backup of computers and computer workstations, including without limitation distributed backup of such systems over a network.
2. Description of the Related Art
Prior art backup methods generally provide for an "image" backup of an entire disk volume, or a "file-by-file" backup. An image backup copies the entire disk volume without regard to directory structure, and can be performed relatively quickly, although it does require time and space to copy the entire disk. However, since an image backup generally does not take account of directory and file information, such a backup does not support selective restoration of files. In order to be able to restore files selectively, generally a file-by-file backup has been required.
Conventionally, the files to be backed up in a file-by-file backup are accessed in the normal manner provided by the operating system, in which data is read from the disk in the logical order of file contents. The actual physical blocks of data on the disk corresponding to each file are not, however, generally stored in a contiguous or linear order. In practice, there is considerable physical discontinuity of recorded data blocks, both within individual files, and from file to file in a disk file system. Indeed, even if linearly recorded at the outset, the data blocks of files in a production computer system may become highly fragmented as blocks are read, revised and written over the course of normal usage. In normal operation, the operating system takes care of this, maintaining a directory which keeps track of the correspondence between the blocks of data that comprise a file, and the physical location of each block on the storage media. Yet the physical order of blocks is generally allowed to become discontinuous and fragmented.
The result of this disorder and fragmentation of raw disk data is that the process of reading files using normal operating system calls (or any other disk access methods that operate similarly) generally results in significant disk read head repositioning during the read operation. Since this mechanical movement can be the slowest operation on the computer, sometimes by orders of magnitude, reading a disk in this manner can be highly inefficient. A file-by-file backup that is constrained to read the disk in this manner will thus necessarily suffer from this significant inefficiency. Considerable improvement in backup operations can be obtained if this inefficiency can be overcome.
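By way of illustration only, the cost of head repositioning can be approximated with a short sketch. It assumes a simplified model in which the cost of a read is the distance between consecutive physical block numbers; the file names, block numbers, and the function head_travel are hypothetical and are not drawn from any particular file system.

```python
def head_travel(block_order, start=0):
    """Sum the distances between consecutively read physical block numbers.

    This is a simplified stand-in for seek time; real drives also incur
    rotational latency and other costs not modeled here.
    """
    travel = 0
    position = start
    for block in block_order:
        travel += abs(block - position)
        position = block
    return travel


# Hypothetical directory: each file lists the physical blocks that hold
# its contents, in the logical order of the file.
files = {
    "a.txt": [90, 12, 57],
    "b.txt": [13, 91, 3],
}

# File-by-file access reads blocks in logical order, one file at a time.
logical_order = [block for blocks in files.values() for block in blocks]

# The same blocks, visited in ascending physical order.
physical_order = sorted(logical_order)

print(head_travel(logical_order))
print(head_travel(physical_order))
```

Under this model the logical-order read of the six example blocks travels 423 block positions, while the physically ordered read of the same blocks travels 91.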
Accordingly, it is an object of the present invention to provide an improved backup method which avoids the inefficiencies of normal file-by-file disk access. Among the objects of the present invention, therefore, are the following:
To reduce disk read time by reducing the number of head repositionings necessary to read the backup input;
To achieve such reduction by performing sequential rather than random reads of the input file, to the extent feasible; and
To keep track of the logical sequence of blocks as well, despite having read the file in physical sequential order, so that the file may be properly restored.
The foregoing and other objects of the invention are accomplished by reading the working directory maintained by the operating system to determine all of the blocks associated with the set of files or other data aggregations to be backed up. The data block identities so determined are sorted in accordance with their physical location on the disk, thereby providing a sequential order for reading. The data to be backed up from the random access storage device or devices is read in this sequential order, and written to the backup media. There is also stored in conjunction with the backup media a Catalog containing the names of the files in the backup set, the location of the file data blocks on the backup media, the proper ordering of the blocks in the original file, and any other desired file attribute information. The information in the Catalog makes it possible to restore in an efficient manner either individual files or entire file systems.
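The steps summarized above may be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the disk is modeled as a dictionary from physical block number to block contents, the operating system directory as a dictionary from file name to that file's block numbers in logical order, and the backup media as a list. The names backup, restore, directory, disk, media and catalog are hypothetical.

```python
def backup(directory, disk):
    """Read blocks in physical order and build a catalog for restoration.

    directory: file name -> physical block numbers in logical file order
    disk:      physical block number -> block contents (bytes)
    Returns (media, catalog), where catalog maps each file name to the
    positions of its blocks on the media, listed in logical file order.
    """
    # Gather every block to be backed up, remembering which file it
    # belongs to and where it falls within that file.
    entries = [
        (block, name, index)
        for name, blocks in directory.items()
        for index, block in enumerate(blocks)
    ]
    # Sort by physical block number to obtain a sequential read order.
    entries.sort()

    media = []
    catalog = {name: [None] * len(blocks) for name, blocks in directory.items()}
    for block, name, index in entries:
        # Record where this block lands on the media, then write it.
        catalog[name][index] = len(media)
        media.append(disk[block])
    return media, catalog


def restore(name, media, catalog):
    """Reassemble one file by following its catalog entry in logical order."""
    return b"".join(media[position] for position in catalog[name])


# Hypothetical example data.
directory = {"a.txt": [90, 12, 57], "b.txt": [13, 91, 3]}
disk = {90: b"A0", 12: b"A1", 57: b"A2", 13: b"B0", 91: b"B1", 3: b"B2"}

media, catalog = backup(directory, disk)
print(restore("a.txt", media, catalog))
```

In this sketch the media holds the blocks in ascending physical order, interleaving the two files, while each catalog entry preserves the logical order needed to restore a single file selectively. File attribute information, which the Catalog may also carry, is omitted for brevity.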
The manner in which the invention achieves these objects is more particularly shown by the drawings enumerated below, and by the detailed description that follows.