File system replication enables recovery of data that has been destroyed, whether inadvertently or otherwise. Conventional replicating systems support four replication types. Each is designed to tolerate the network interruptions that are common in wide area networks and to recover gracefully with very high data integrity and resilience, ensuring that the replicated data is in a stable state. The first type, directory replication, transfers modified deduplicated data of any file or subdirectory within a source system directory that has been configured as a replication source to a directory on a target system that has been configured as a replication target. Directory replication offers flexible replication topologies, including system mirroring, bi-directional, many-to-one, one-to-many, and cascaded, and these topologies enable efficient cross-site deduplication. The second type, managed file replication, transfers backup images directly from a source system to a target system, one at a time, upon request from an administrator. This type of replication provides the same cross-site deduplication effects and flexible network deployment topologies as directory replication. The third type, MTree replication, replicates MTrees between storage systems. MTrees are user-defined logical partitions of a storage system that enable granular management of the file system. MTree replication creates periodic snapshots at a source system and sends the differences between two consecutive snapshots to a target storage system. MTree replication supports all the topologies supported by directory replication. The fourth type, collection replication, performs whole-system mirroring in a one-to-one topology, continuously transferring changes in the underlying collection (i.e., a set of deduplicated data segments stored on disk) to the target storage system.
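The snapshot-difference step of MTree replication can be illustrated with a minimal sketch. It assumes, purely for illustration, that a snapshot can be modeled as a mapping from file path to a content fingerprint. The names `SnapshotDelta` and `diff_snapshots` are hypothetical and do not correspond to any actual product interface.

```python
from typing import Dict, NamedTuple, Set


class SnapshotDelta(NamedTuple):
    """Differences between two consecutive snapshots (hypothetical structure)."""
    created: Set[str]
    modified: Set[str]
    deleted: Set[str]


def diff_snapshots(previous: Dict[str, str], current: Dict[str, str]) -> SnapshotDelta:
    """Compare two snapshots, each modeled as {path: content fingerprint}.

    Assumption: equal fingerprints mean the file content is unchanged.
    """
    prev_paths = previous.keys()
    curr_paths = current.keys()
    # Paths present only in the newer snapshot were created.
    created = set(curr_paths - prev_paths)
    # Paths present only in the older snapshot were deleted.
    deleted = set(prev_paths - curr_paths)
    # Paths in both snapshots are modified when their fingerprints differ.
    modified = {p for p in prev_paths & curr_paths if previous[p] != current[p]}
    return SnapshotDelta(created, modified, deleted)
```

In this sketch, only the paths listed in the returned delta would need to be sent to the target storage system; unchanged paths are skipped.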
Traditionally, backup systems are optimized by replicating only the portions of files that have been modified. In such systems, data files are segmented and stored in segment trees. For example, each data file may be represented by a segment tree. Each segment tree includes one or more levels of segments, such that a segment at one level is further divided into multiple segments that are stored at a lower level. When a file is to be replicated, the source storage system traverses the segment tree representing the file and determines which segment(s) have been modified by comparing them against the segments of a segment tree at a target storage system that represents the same file.
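The traversal described above can be sketched as follows. This is a simplified illustration, not the method of any particular system. It assumes that each segment carries a fingerprint (SHA-256 here), that a parent's fingerprint is derived from its children's fingerprints, and that matching fingerprints imply identical subtrees. The names `Segment`, `build_tree`, and `modified_segments` are hypothetical.

```python
import hashlib
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Segment:
    fingerprint: str
    data: bytes = b""                      # payload; only leaf segments carry data
    children: List["Segment"] = field(default_factory=list)


def fingerprint(data: bytes) -> str:
    # Assumption: SHA-256 stands in for whatever fingerprint a real system uses.
    return hashlib.sha256(data).hexdigest()


def build_tree(chunks: List[bytes], fanout: int = 2) -> Segment:
    """Build a segment tree bottom-up from already-segmented file data."""
    if not chunks:
        raise ValueError("at least one chunk is required")
    level = [Segment(fingerprint(c), c) for c in chunks]
    while len(level) > 1:
        parents = []
        for i in range(0, len(level), fanout):
            group = level[i:i + fanout]
            # A parent's fingerprint covers the fingerprints of its children.
            combined = "".join(s.fingerprint for s in group).encode()
            parents.append(Segment(fingerprint(combined), children=group))
        level = parents
    return level[0]


def modified_segments(source: Segment, target: Optional[Segment]) -> List[Segment]:
    """Return the source leaf segments that differ from the target tree."""
    if target is not None and source.fingerprint == target.fingerprint:
        return []                          # identical subtree, nothing to send
    if not source.children:
        return [source]                    # changed or missing leaf segment
    changed: List[Segment] = []
    for i, child in enumerate(source.children):
        peer = None
        if target is not None and i < len(target.children):
            peer = target.children[i]
        changed.extend(modified_segments(child, peer))
    return changed
```

Because a matching fingerprint prunes the whole subtree beneath it, the comparison in this sketch descends only into the parts of the tree that differ. The position-by-position comparison of children is a simplification that assumes both trees share the same segment boundaries.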
In some storage systems, files may be segmented using various segmentation algorithms. Certain algorithms, however, may not always be the most efficient for a given application. For example, some algorithms may provide a flexible segmentation structure but may incur additional processing during the replication process. Accordingly, there is a need to maintain the flexibility of certain segmentation structures while minimizing the additional processing those structures may incur during replication.