The sharing of file system data blocks conserves data storage in a file server. The sharing of file system data blocks among versions of a file typically occurs when the file server has a file system based snapshot copy facility that periodically creates snapshot copies of certain production files or production file systems. The sharing of file system data blocks within a file and among unrelated files typically occurs when the file server has a file system based data de-duplication facility that eliminates from the data storage any file system data blocks containing duplicative data content.
Snapshot copies are in widespread use for on-line data backup. If a production file becomes corrupted, then the production file is restored from its most recent snapshot copy that has not been corrupted.
A file system based snapshot copy facility is described in Bixby et al. U.S. Patent Application Publication 2005/0065986 published Mar. 24, 2005, incorporated herein by reference. When a snapshot copy is initially created, it includes only a copy of the inode of the production file. Therefore, the snapshot copy initially shares all of the data blocks as well as any indirect blocks of the production file. When the production file is modified, new blocks are allocated and linked to the production file inode to save the new data, and the original data blocks are retained and linked to the inode of the snapshot copy. The result is that disk space is saved by storing only the difference between two consecutive versions. Block pointers are marked with a flag indicating whether or not the pointed-to block is owned by the parent inode. A non-owner marking is inherited by all of the block's descendants. The block ownership controls the copying of indirect blocks when writing to the production file, and also controls deallocation and passing of blocks when deleting a snapshot copy.
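The block sharing between a production file and its snapshot copy can be illustrated with a minimal Python sketch. This is not the facility of the cited publication: the names `Inode`, `ToyFileSystem`, `snapshot`, and `write` are hypothetical, the inode holds direct block pointers only, and the ownership flags and indirect blocks are omitted. As a simplifying assumption, every write allocates a new block, whereas a real facility would consult block ownership before deciding whether to allocate.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Inode:
    # Hypothetical inode holding direct block pointers (block numbers) only.
    block_ptrs: List[int] = field(default_factory=list)


class ToyFileSystem:
    """Illustrative in-memory block store; all names are assumptions."""

    def __init__(self) -> None:
        self.blocks: Dict[int, bytes] = {}
        self._next_block = 0

    def _allocate(self, data: bytes) -> int:
        block_no = self._next_block
        self._next_block += 1
        self.blocks[block_no] = data
        return block_no

    def create_file(self, contents: List[bytes]) -> Inode:
        return Inode([self._allocate(d) for d in contents])

    def snapshot(self, production: Inode) -> Inode:
        # Only the inode is copied, so every data block is initially shared.
        return Inode(list(production.block_ptrs))

    def write(self, production: Inode, index: int, data: bytes) -> None:
        # New data goes to a newly allocated block linked to the production
        # inode; the original block stays linked to any snapshot inode.
        # Simplification: this sketch allocates on every write.
        production.block_ptrs[index] = self._allocate(data)

    def read(self, inode: Inode, index: int) -> bytes:
        return self.blocks[inode.block_ptrs[index]]


fs = ToyFileSystem()
prod = fs.create_file([b"A", b"B", b"C"])
snap = fs.snapshot(prod)
fs.write(prod, 1, b"B2")

# Blocks still referenced by both the production file and the snapshot.
shared = set(prod.block_ptrs) & set(snap.block_ptrs)
```

After the write, the production file and the snapshot differ in a single pointer, so only one additional block is stored for the new version.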
A file system based data de-duplication facility permits a shared file system data block to be linked to more than one inode or indirect block. For example, data de-duplication is applied to a file when the file is migrated into the file server or when new data is written to the file. The new data is written to newly allocated file system data blocks marked as blocks that have not been de-duplicated, and an attribute of the file is set to indicate that a de-duplication process is in progress. Then the data de-duplication process searches a single-instance data store of de-duplicated blocks for a copy of the data in each data block marked as not yet de-duplicated. If a copy is found, then, in the inode or indirect block of the file, a pointer to the block marked as not yet de-duplicated is replaced with a pointer to the copy in the single-instance data store, and a reference counter for the data block in the single-instance data store is incremented. If a copy is not found, then the block of new data is marked as de-duplicated and added to the single-instance data store. Once the data de-duplication process has been applied to all of the data blocks of the file, the attribute of the file is set to indicate that the de-duplication process is finished. Whenever a file is deleted, the reference counter for each data block of the file is decremented. Whenever a reference counter is decremented to zero, the storage of the corresponding data block is de-allocated by putting the data block on a free block list so that the storage of the data block becomes available for allocation for receiving new data.
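The de-duplication steps described above can be sketched in Python as follows. The sketch rests on several assumptions that are not taken from the description: the names `DedupStore`, `File`, `write_new_data`, `deduplicate`, and `delete_file` are hypothetical, the single-instance data store is searched through a SHA-256 digest index followed by a byte-for-byte comparison, and a newly written block that turns out to be a duplicate is returned to the free block list.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class File:
    block_ptrs: List[int] = field(default_factory=list)
    dedup_in_progress: bool = False  # the file attribute from the text


class DedupStore:
    """Illustrative block store with a single-instance index (assumed design)."""

    def __init__(self) -> None:
        self.blocks: Dict[int, bytes] = {}
        self.refcount: Dict[int, int] = {}   # de-duplicated blocks only
        self.by_digest: Dict[str, int] = {}  # assumed lookup index
        self.pending: Set[int] = set()       # blocks not yet de-duplicated
        self.free_list: List[int] = []
        self._next = 0

    def _allocate(self, data: bytes) -> int:
        if self.free_list:
            block_no = self.free_list.pop()
        else:
            block_no = self._next
            self._next += 1
        self.blocks[block_no] = data
        return block_no

    def _free(self, block_no: int) -> None:
        del self.blocks[block_no]
        self.free_list.append(block_no)

    def write_new_data(self, f: File, data_blocks: List[bytes]) -> None:
        for data in data_blocks:
            block_no = self._allocate(data)
            self.pending.add(block_no)  # marked as not yet de-duplicated
            f.block_ptrs.append(block_no)
        f.dedup_in_progress = True

    def deduplicate(self, f: File) -> None:
        for i, block_no in enumerate(f.block_ptrs):
            if block_no not in self.pending:
                continue
            data = self.blocks[block_no]
            digest = hashlib.sha256(data).hexdigest()
            copy = self.by_digest.get(digest)
            self.pending.discard(block_no)
            if copy is not None and self.blocks[copy] == data:
                # Copy found: repoint the file and count the new reference.
                f.block_ptrs[i] = copy
                self.refcount[copy] += 1
                self._free(block_no)  # assumption: duplicate is released
            else:
                # No copy: the block itself joins the single-instance store.
                self.by_digest[digest] = block_no
                self.refcount[block_no] = 1
        f.dedup_in_progress = False

    def delete_file(self, f: File) -> None:
        for block_no in f.block_ptrs:
            if block_no in self.pending:
                self.pending.discard(block_no)
                self._free(block_no)
                continue
            self.refcount[block_no] -= 1
            if self.refcount[block_no] == 0:
                digest = hashlib.sha256(self.blocks[block_no]).hexdigest()
                del self.by_digest[digest]
                del self.refcount[block_no]
                self._free(block_no)
        f.block_ptrs.clear()


store = DedupStore()
f1, f2 = File(), File()
store.write_new_data(f1, [b"x", b"y"])
store.deduplicate(f1)
store.write_new_data(f2, [b"y", b"z"])
store.deduplicate(f2)

shared_block = f1.block_ptrs[1]            # block holding b"y"
f2_first_ptr = f2.block_ptrs[0]            # repointed to the shared block
count_before = store.refcount[shared_block]
store.delete_file(f1)
count_after = store.refcount[shared_block]
```

In this run the block holding `b"y"` is shared by both files, so deleting the first file only decrements its reference counter, while the unshared block holding `b"x"` reaches zero and is placed on the free block list.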