The present invention relates generally to the field of file systems, and more specifically to file systems with clone files.
The significant growth in the amount of data that must be stored in quickly accessible form is a challenge to system designers who are concerned with storage efficiency and data integrity protection. For example, analytics, stream computing (the analysis of high bandwidth flows of data from real-time sources), and data warehousing require fast access to large quantities of data. In large storage systems, data must be available for immediate and recurring processing, be stored efficiently, be protected against hardware and software failures, and be scalable in terms of its size and the amount of compute power that may be applied to it. Efficient storage techniques often make storage systems faster, which can give users quicker access to their data.
To utilize storage efficiently, a technique called file cloning has been developed to reduce the storage consumed by multiple versions (clones) of a file that is opened and edited by multiple users. File cloning conserves storage space by storing an original file in a read-only mode and allocating additional storage space only for the data written into the file by users. The additional space is only large enough to contain the new data that a user has written into the clone file. The additional space is both read and write enabled, so that newly written data, which is not in the original file, may be stored there and later accessed. Data that has not been changed is read from the original file. Such an arrangement is called a clone file. For example, this enables two copies of a file (two clones of the file) that are altered by two users to consume only a fraction of the space that two complete copies of the original file, one for each user, would otherwise have consumed. File cloning reduces the overall space needed for copies of files where the file copies (called child or clone files) have a significant amount of unchanged data compared to the original files (parent files).
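The read and write paths described above can be illustrated with a minimal in-memory sketch. It is not an implementation from any particular file system: the class names `ParentFile` and `CloneFile`, the fixed `BLOCK_SIZE`, and the per-block dictionary used for the additional space are all assumptions made for illustration, and writes are limited to whole blocks that already exist in the parent.

```python
BLOCK_SIZE = 4  # hypothetical, tiny block size chosen only for readability


class ParentFile:
    """Original file, kept read-only and stored as a tuple of blocks."""

    def __init__(self, data: bytes):
        self.blocks = tuple(
            data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)
        )


class CloneFile:
    """Clone that stores only the blocks a user has written."""

    def __init__(self, parent: ParentFile):
        self.parent = parent
        self.delta = {}  # block index -> bytes written into this clone

    def write_block(self, index: int, data: bytes) -> None:
        # Simplification: only whole blocks inside the parent's range.
        if not 0 <= index < len(self.parent.blocks):
            raise IndexError("block index outside the parent file")
        if len(data) != BLOCK_SIZE:
            raise ValueError("data must be exactly one block long")
        self.delta[index] = data  # the parent is never modified

    def read_block(self, index: int) -> bytes:
        # Changed data comes from the clone's own space,
        # unchanged data comes from the read-only parent.
        if index in self.delta:
            return self.delta[index]
        return self.parent.blocks[index]

    def read_all(self) -> bytes:
        return b"".join(
            self.read_block(i) for i in range(len(self.parent.blocks))
        )

    def extra_bytes(self) -> int:
        # Space consumed beyond the shared parent file.
        return sum(len(block) for block in self.delta.values())


# Two users edit two clones of the same twelve-byte parent file.
parent = ParentFile(b"AAAABBBBCCCC")
clone_a = CloneFile(parent)
clone_b = CloneFile(parent)
clone_a.write_block(1, b"XXXX")
clone_b.write_block(2, b"YYYY")
```

In this sketch each clone holds a single changed block, so the two clones together add eight bytes to the twelve-byte parent, where two full copies would have added twenty-four.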
For example, clone files can be used to provision virtual machines by creating a virtual disk for each machine by cloning a common base image file, often referred to as a “gold image”. A related usage is to clone the virtual disk image of an individual machine as part of taking a snapshot of the machine state. Cloning a file is similar to creating a copy of a file, but the creation process is faster and more space efficient because no additional disk space is consumed until the clone or the original file is modified. Multiple clones of the same file can be created with no additional storage space allocated, and clones of clones can be created. While the cloning of files decreases the storage space used by active files on a server, current techniques to back up clone files consume the storage space of a non-cloned file for each clone file that is backed up and, upon restore, restore the original clone files as non-cloned files.
Unix-style file systems record the information used to locate and manage files in data structures called inodes. In a Unix-style file system, an index node, informally referred to as an inode, is a data structure used to represent a file system object, which can be one of several things including a file or a directory. Inodes store the attributes and disk block location(s) of the file system object's data. File system object attributes may include manipulation metadata (e.g., change, access, and modify times), as well as owner and permission data (e.g., group-id, user-id, and permissions). Inodes are also instrumental in the creation and management of clone files on a Unix-style file system that supports clone files. Metadata in an inode of a clone file specifies its parent file.
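As a rough illustration of the inode contents listed above, the following sketch models an inode as a simple record and follows the parent metadata of a clone back to its original file. The field names (including `clone_parent`), the inode numbers, and the helper `clone_ancestry` are hypothetical; actual file systems use their own on-disk layouts and naming.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Inode:
    number: int                 # inode number identifying the object
    mode: int                   # permission bits
    uid: int                    # user-id of the owner
    gid: int                    # group-id of the owner
    atime: float                # last access time
    mtime: float                # last modify time
    ctime: float                # last change time
    block_locations: List[int] = field(default_factory=list)
    clone_parent: Optional[int] = None  # hypothetical: parent's inode number


def clone_ancestry(table: Dict[int, Inode], number: int) -> List[int]:
    """Return the inode numbers of a file's parents, nearest first."""
    chain = []
    current = table[number].clone_parent
    while current is not None:
        chain.append(current)
        current = table[current].clone_parent
    return chain


# A base image (10), a clone of it (11), and a clone of that clone (12).
table = {
    10: Inode(10, 0o444, 0, 0, 0.0, 0.0, 0.0, block_locations=[100, 101]),
    11: Inode(11, 0o644, 1000, 1000, 0.0, 0.0, 0.0, [200], clone_parent=10),
    12: Inode(12, 0o644, 1001, 1000, 0.0, 0.0, 0.0, [], clone_parent=11),
}
ancestry = clone_ancestry(table, 12)
```

Here `block_locations` for a clone holds only the blocks allocated for its own newly written data, which is an assumption of the sketch rather than a statement about any specific file system.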