A file is a logical unit of data in a file system. As referred to herein, a snapshot of a file is a read-only copy of a file as it existed at a certain point in time. That is, at a given point in time, a snapshot may be created of a file (hereinafter referred to as a production file) in a file system (or of the entire file system) that is being read from and written to by one or more applications. The content of the snapshot reflects the content of the production file at the given point in time. If the production file is modified after the snapshot is created, the content of the snapshot remains the same, even though the content of the production file has changed. The snapshot may be used in any desired way. For example, if the production file becomes corrupted or is modified in an undesirable way, the snapshot may be used to restore the production file to its state at the time that the snapshot was created, although other uses are possible.
FIG. 1A is a block diagram of an example of a snapshot of a file. The snapshot 100 comprises metadata 110 and file data 120, which may include one or more data blocks. In a typical file system incorporating snapshots, metadata 110 is a copy of a conventional inode for a file in the file system. Such a conventional inode is shown in FIG. 1B. An inode is a data structure for a file that is used to store information about the file (i.e., metadata). Typically, a file system includes an inode for each file in the file system. FIG. 1B illustrates some of the information stored about a file in a conventional inode 110. Inode 110 includes a mode field 181, an access time field 182, a change time field 183, one or more data block pointers 184, and one or more indirect block pointers 185. The mode field 181 specifies the access permissions for the file (e.g., which users can read, write, and/or execute the file), the access time field 182 is used to store the last time the file was accessed, the change time field 183 is used to store the last time that the file was modified, and the data block pointer(s) 184 and indirect block pointer(s) 185 are used to specify the storage locations at which the file data 120 is stored on a storage medium, such as a physical disk or logical volume of storage. As used herein, data block pointers 184 refer to pointers to blocks on disk, in a logical volume, or other storage medium that store file data, and indirect block pointers 185 refer to pointers to blocks on disk, in a logical volume, or other storage medium that store either data block pointers or other indirect block pointers.
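For illustration only, the inode fields described above may be sketched as a small data structure. The class name `Inode`, the function `read_file`, and the dictionary used to stand in for the storage medium are hypothetical and are not taken from any particular file system. The sketch also assumes, as a simplification, that an indirect block can be told apart from a data block by its in-memory type.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union

# Assumption: a block holds either raw file data (bytes) or, for an
# indirect block, a list of pointers to other blocks.
Block = Union[bytes, List[int]]


@dataclass
class Inode:
    mode: int                      # access permissions (mode field 181)
    access_time: float             # last time the file was accessed (field 182)
    change_time: float             # last time the file was modified (field 183)
    data_block_ptrs: List[int] = field(default_factory=list)      # pointers 184
    indirect_block_ptrs: List[int] = field(default_factory=list)  # pointers 185


def read_file(inode: Inode, storage: Dict[int, Block]) -> bytes:
    """Return the file data reached through direct, then indirect, pointers."""

    def walk(ptr: int) -> bytes:
        block = storage[ptr]
        if isinstance(block, bytes):
            return block                              # a data block
        return b"".join(walk(p) for p in block)       # an indirect block

    direct = b"".join(walk(p) for p in inode.data_block_ptrs)
    indirect = b"".join(walk(p) for p in inode.indirect_block_ptrs)
    return direct + indirect


# Hypothetical layout: two direct data blocks and one indirect block that
# points to a data block and to a second-level indirect block.
storage: Dict[int, Block] = {
    1: b"ab", 2: b"cd",
    3: [4, 5], 4: b"ef",
    5: [6], 6: b"gh",
}
inode = Inode(mode=0o644, access_time=0.0, change_time=0.0,
              data_block_ptrs=[1, 2], indirect_block_ptrs=[3])
```

In this sketch, reading the file follows the direct pointers first and then descends through each indirect block until data blocks are reached.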
One way to create a snapshot of a file is to create a copy of the inode for the file to form the inode for the snapshot, create a copy of each data block and indirect block referenced by the file, and modify the data block pointers and indirect block pointers in the snapshot copy of the inode to point to the newly created copies of the data blocks and indirect blocks. In one example shown in FIG. 2, production inode 201 points to two data blocks (i.e., data block 203 and data block 205). To create a snapshot of the file corresponding to inode 201, a copy 207 of production inode 201 may be created, copies 209 and 211 of data blocks 203 and 205 may be created, and the pointers in inode 207 may be modified to point to data blocks 209 and 211 instead of data blocks 203 and 205 (as in the inode for the production file).
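The full-copy approach of FIG. 2 may be sketched as follows. This is a minimal sketch under stated assumptions: the names `Inode`, `allocate`, and `full_copy_snapshot` are hypothetical, storage is modeled as a dictionary from block number to block contents, indirect blocks are omitted for brevity, and the block numbers chosen by the allocator are arbitrary and need not match the reference numerals in the figure.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Inode:
    mode: int                                   # access permissions
    data_block_ptrs: List[int] = field(default_factory=list)


def allocate(storage: Dict[int, bytes]) -> int:
    """Hypothetical allocator: return an unused block number."""
    return max(storage, default=0) + 1


def full_copy_snapshot(prod: Inode, storage: Dict[int, bytes]) -> Inode:
    """Copy the inode and every data block it references."""
    snap = Inode(prod.mode, list(prod.data_block_ptrs))   # copy of the inode
    for i, ptr in enumerate(prod.data_block_ptrs):
        new_ptr = allocate(storage)
        storage[new_ptr] = bytes(storage[ptr])            # copy of the data block
        snap.data_block_ptrs[i] = new_ptr                 # repoint the snapshot inode
    return snap


# Production inode pointing to two data blocks, as in FIG. 2.
storage: Dict[int, bytes] = {203: b"first", 205: b"second"}
production = Inode(mode=0o644, data_block_ptrs=[203, 205])
snapshot = full_copy_snapshot(production, storage)

# A later in-place modification of the production file does not affect
# the snapshot, because the snapshot has its own copies of the blocks.
storage[203] = b"changed"
```

Under this approach every data block is duplicated when the snapshot is created, whether or not the production file is ever modified afterwards.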
In another approach to creating a snapshot of a file (referred to as “write-anywhere”), a copy of the production file inode is created at the time of snapshot creation, but data blocks and/or indirect blocks are copied only if and when the corresponding blocks in the production file are modified. In an example shown in FIG. 3A, production inode 301 includes pointers to data block 303 and data block 305. When a snapshot is created, a copy of production inode 301 is stored as snapshot inode 307, but it points to the same data blocks 303 and 305. When one or more data blocks in the production file are modified, new data blocks are allocated to store the modified data and the corresponding pointer(s) in production inode 301 are updated to point to the new data block(s). For example, as shown in FIG. 3B, if a write occurs that would result in data block 303 being modified, a new data block 309 is allocated to store the modified data and the pointer to data block 303 in production inode 301 is updated to point to data block 309. In contrast, snapshot inode 307 is not updated and continues to point to the data blocks of the production file at the time of the snapshot creation (i.e., data blocks 303 and 305).
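The write-anywhere behavior of FIGS. 3A and 3B may be sketched as follows. This is an illustrative sketch only: the names `Inode`, `allocate`, `create_snapshot`, and `write_block` are hypothetical, storage is modeled as a dictionary from block number to block contents, indirect blocks are omitted, and the number assigned to a newly allocated block is arbitrary.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Inode:
    mode: int                                   # access permissions
    data_block_ptrs: List[int] = field(default_factory=list)


def allocate(storage: Dict[int, bytes]) -> int:
    """Hypothetical allocator: return an unused block number."""
    return max(storage, default=0) + 1


def create_snapshot(prod: Inode) -> Inode:
    """Copy only the inode; the snapshot shares the existing data blocks."""
    return Inode(prod.mode, list(prod.data_block_ptrs))


def write_block(prod: Inode, index: int, data: bytes,
                storage: Dict[int, bytes]) -> None:
    """Store modified data in a new block and repoint the production inode."""
    new_ptr = allocate(storage)
    storage[new_ptr] = data                     # the old block is left untouched
    prod.data_block_ptrs[index] = new_ptr


# FIG. 3A: production inode and snapshot inode point to the same blocks.
storage: Dict[int, bytes] = {303: b"old", 305: b"other"}
production = Inode(mode=0o644, data_block_ptrs=[303, 305])
snapshot = create_snapshot(production)

# FIG. 3B: a write to the first block goes to a newly allocated block.
write_block(production, 0, b"new", storage)
```

After the write, the snapshot inode still points to blocks 303 and 305, whose contents are unchanged, while the production inode points to the newly allocated block and to block 305.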