1. Field of the Invention
This invention relates generally to the field of computer systems, and more particularly, to a system and method for replicating a set of files in a manner that preserves the underlying extent structure of the files and achieves various efficiencies based on the types of the extents.
2. Description of the Related Art
Many computer systems execute file system software that operates to store files on a storage device. From the point of view of most software applications, the data for a file is logically structured as a continuous stream of bytes, e.g., starting at offset 0 and continuing to offset N−1, where N is the size of the file in bytes. However, the data may or may not be stored on the underlying storage device as a single continuous set of bytes.
In an extent-based file system, the data of each file is organized into one or more extents. An extent is information that represents a particular byte range of a file. An extent may correspond to a contiguous set of storage device locations, e.g., a contiguous set of disk blocks in the case where the storage device is a disk drive. Suppose for example that a file has two extents. One portion of the file's data that corresponds to one of the extents may be stored in one contiguous set of disk blocks, and another portion of the file's data that corresponds to the other extent may be stored in another contiguous set of disk blocks. The two sets of disk blocks may be located anywhere on the disk drive.
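By way of illustration only, the two-extent example above can be modeled as a small data structure. The following is a minimal sketch, not a description of any particular file system: the `Extent` class, its field names, the 4096-byte block size, and the block numbers are all assumptions chosen for the example.

```python
from dataclasses import dataclass

BLOCK_SIZE = 4096  # assumed disk block size in bytes (illustrative only)


@dataclass(frozen=True)
class Extent:
    """Hypothetical record describing one byte range of a file."""
    file_offset: int  # first logical byte of the file covered by this extent
    length: int       # number of bytes covered by this extent
    start_block: int  # first block of the contiguous run that stores the data

    @property
    def end_offset(self) -> int:
        """Logical offset one past the last byte covered by this extent."""
        return self.file_offset + self.length

    @property
    def block_count(self) -> int:
        """Number of contiguous disk blocks needed to hold the extent."""
        return -(-self.length // BLOCK_SIZE)  # ceiling division


# A 12288-byte file with two extents whose disk blocks are far apart.
file_extents = [
    Extent(file_offset=0, length=8192, start_block=1000),
    Extent(file_offset=8192, length=4096, start_block=52000),
]
```

In this sketch the first extent occupies blocks 1000 and 1001, while the second occupies block 52000; the logical byte ranges are adjacent even though the disk locations are not.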
The underlying extent structure of a file is transparent to most software applications. A typical software application reads the data from a file through an interface provided by the file system software, which presents the file data as a continuous stream of bytes, as described above. For example, the application may open the file and request to read the data at particular logical offsets within the file. The file system determines how the logical offsets map to the underlying extents and where the data of each extent is stored on the underlying storage device.
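The mapping from a logical offset to a storage location, which the file system performs on behalf of the application, can be sketched as follows. This is a simplified model under stated assumptions: the `Extent` tuple, the `locate` function, and the 4096-byte block size are hypothetical names and values introduced for illustration, and a real file system would use its own on-disk metadata structures.

```python
from bisect import bisect_right
from typing import List, NamedTuple, Tuple

BLOCK_SIZE = 4096  # assumed disk block size in bytes (illustrative only)


class Extent(NamedTuple):
    """Hypothetical extent record: a byte range and where it is stored."""
    file_offset: int  # first logical byte covered by the extent
    length: int       # number of bytes covered
    start_block: int  # first block of the contiguous run on disk


def locate(extents: List[Extent], offset: int) -> Tuple[int, int]:
    """Return (disk_block, byte_within_block) for a logical file offset.

    `extents` must be sorted by file_offset and must not overlap.
    """
    starts = [e.file_offset for e in extents]
    # Index of the last extent that begins at or before the offset.
    i = bisect_right(starts, offset) - 1
    if i < 0 or offset >= extents[i].file_offset + extents[i].length:
        raise ValueError(f"offset {offset} is not covered by any extent")
    delta = offset - extents[i].file_offset
    return extents[i].start_block + delta // BLOCK_SIZE, delta % BLOCK_SIZE


extents = [Extent(0, 8192, 1000), Extent(8192, 4096, 52000)]
print(locate(extents, 5000))  # falls inside the first extent
print(locate(extents, 8192))  # first byte of the second extent
```

An application calling the ordinary read interface never sees the result of such a lookup; it supplies only the logical offset and receives bytes in return.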
In some computer networks, files need to be replicated periodically from a source computer system to a target computer system. A software application may execute on the source computer system to read the file data and transmit it to the target computer system. The target computer system may receive the file data and create copies of the files. The software application that reads the file data on the source computer system may not be aware of the underlying extent structure of the files. For example, the software application may read the file data through the normal file system interface, which provides the file data but does not provide any information indicating how the data is organized into extents. The software application may then transmit the file data to the target computer system without transmitting any information indicating how the data is organized into extents. Thus, while the files may be copied to the target computer system, the underlying extent structure of the files may not be preserved on the target computer system. For example, if a particular file on the source computer system has four extents, the copy of the file created on the target computer system may have a different number of extents, e.g., only one extent, or six extents. Even if the copy does happen to have four extents, those extents may represent different byte ranges than the byte ranges represented by the extents in the original file.
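The loss of extent structure described above can be illustrated with a toy model. The sketch below is an assumption-laden simplification: `read_as_stream` and `write_on_target` are hypothetical functions, extents are reduced to (file offset, length) pairs, and the target's allocation policy is modeled as a single `max_extent` parameter, which is not how any particular file system is known to allocate space.

```python
from typing import List, Tuple

ExtentMap = List[Tuple[int, int]]  # (file_offset, length) pairs


def read_as_stream(data: bytes, extents: ExtentMap) -> bytes:
    """Model of the ordinary read interface: only bytes come back.

    The extent map is accepted but deliberately not returned, mirroring an
    interface that does not expose how the data is organized into extents.
    """
    return data


def write_on_target(stream: bytes, max_extent: int) -> ExtentMap:
    """Hypothetical target allocator that knows nothing about the source.

    It carves the incoming bytes into extents of at most max_extent bytes.
    """
    return [
        (off, min(max_extent, len(stream) - off))
        for off in range(0, len(stream), max_extent)
    ]


# Source file: 24576 bytes organized into four extents.
source_extents: ExtentMap = [(0, 4096), (4096, 12288), (16384, 4096), (20480, 4096)]
source_data = bytes(24576)

stream = read_as_stream(source_data, source_extents)

one_extent = write_on_target(stream, max_extent=1 << 20)  # a single extent
six_extents = write_on_target(stream, max_extent=4096)    # six extents
four_other = write_on_target(stream, max_extent=6144)     # four, but different ranges

print(len(one_extent), len(six_extents), len(four_other))
```

In all three cases the copied bytes are identical to the source, yet none of the resulting extent maps matches `source_extents`, including the case where the extent count happens to be the same.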