A file system is software that manages files containing data, and a set of files managed by a given file system is referred to as a file set. A file system provides a “layer” of software in a computer system to manage storage space for files. This layer is between the operating system (which communicates directly with devices) on the computer system hosting the file system and an application program that uses the data in the files managed by the file system.
Primary data is “live” or production data used for business-specific purposes, and a clone is a persistent copy of the primary data. A clone is typically created for saving a copy of primary data at a given point in time. After the clone is created, typically no further write operations are made to the clone so that it continues to represent the primary data as a “frozen image” at the time the clone was created. Although the primary data continues to change, a clone serves as backup data that can be used in the event of failure of the computer systems, networks, or communication links, or for recovery from any other corruption of primary data. For mission-critical and other applications that must remain highly available, a complete copy of the primary data and the clones representing different points in time is often maintained.
Primary and clone file sets co-exist on one device and are managed by the same file system. When a clone is created, no data blocks are copied into the clone file set. Operations continue to update, add, and/or delete primary data. Whenever a file is modified, the original data in each affected block is first copied into the clone file set, and only then is the block overwritten in the primary file set. Such a write operation is referred to as a Copy on Write (COW) operation. Therefore, for a modified file, there are some shared data blocks (unmodified blocks in the primary file set) and original data blocks that are “pushed” to the clone file set. Two sets of metadata are also maintained: one for the primary file set and another for the clone file set. Examples of metadata maintained by the file system include access permissions, security data, and so on.
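The copy-on-write behavior described above can be illustrated with a minimal in-memory sketch. This is a toy model and not an actual file system implementation: the class name `CowVolume`, its methods, and the use of Python dictionaries as block maps are all assumptions made for illustration.

```python
class CowVolume:
    """Toy model (hypothetical) of a primary file set and one clone
    that share data blocks on a single device."""

    def __init__(self, blocks):
        # Primary block map: block number -> data.
        self.primary = dict(enumerate(blocks))
        self.clone = None          # pushed original blocks; None until a clone exists
        self.clone_blocks = set()  # clone metadata: blocks that existed at clone time

    def create_clone(self):
        # No data blocks are copied at creation time; only metadata is recorded.
        self.clone = {}
        self.clone_blocks = set(self.primary)

    def write(self, block_no, data):
        # Copy on write: push the original block to the clone once,
        # before the primary copy is overwritten.
        if (self.clone is not None
                and block_no in self.clone_blocks
                and block_no not in self.clone):
            self.clone[block_no] = self.primary[block_no]
        self.primary[block_no] = data

    def read_primary(self, block_no):
        return self.primary[block_no]

    def read_clone(self, block_no):
        # The clone sees only blocks that existed when it was created.
        if self.clone is None or block_no not in self.clone_blocks:
            raise KeyError(block_no)
        if block_no in self.clone:
            return self.clone[block_no]   # pushed original data
        return self.primary[block_no]     # block still shared with the primary


vol = CowVolume([b"a", b"b", b"c"])
vol.create_clone()
vol.write(1, b"B")   # pushes the original b"b" into the clone
vol.write(1, b"BB")  # no second push; the clone keeps the frozen image
```

In this sketch, reading block 1 through the clone still returns the original data, while block 0 is served from the shared primary copy because it was never modified.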
Because a clone file set shares some data blocks on the same device with the primary file set, backing up a clone file set uses processing resources on the host of the file system that could otherwise be used for maintaining the primary file set. Most backup software writes files to backup storage on a file-by-file basis, reading all of the data for each file from a storage device and then writing the data for that file to a backup storage device. Because a single file may have data in many non-contiguous locations on the storage device, the time to back up a file increases with the number of different non-contiguous locations that must be read to construct the file. A more efficient technique to write a file to backup storage is needed. Furthermore, because creating a backup copy can be resource-intensive, it is desirable to shift some of this processing load to another computer system.
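To make the cost of non-contiguous placement concrete, the following sketch groups a file's block numbers into contiguous runs. It assumes a simplified cost model in which each run requires one separate read from the storage device. The function name `contiguous_runs` and the sample block list are hypothetical.

```python
def contiguous_runs(block_numbers):
    """Group a file's block numbers, taken in file order, into
    (start_block, length) runs of physically contiguous blocks."""
    runs = []
    for block in block_numbers:
        if runs and block == runs[-1][0] + runs[-1][1]:
            # Block directly follows the current run: extend it.
            start, length = runs[-1]
            runs[-1] = (start, length + 1)
        else:
            # Block is elsewhere on the device: start a new run.
            runs.append((block, 1))
    return runs


# A six-block file scattered across three locations on the device.
file_blocks = [10, 11, 12, 40, 41, 7]
runs = contiguous_runs(file_blocks)
print(runs)       # [(10, 3), (40, 2), (7, 1)]
print(len(runs))  # 3 separate reads under the assumed cost model
```

Under this assumption, a file-by-file backup of the sample file issues three reads rather than one, and the count grows as the file becomes more fragmented.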
What is needed is a way to quickly and efficiently allow a secondary host that is not the file system host to create a backup of a clone file set that shares data with an active file set.