The present disclosure generally relates to the storage and management of data blocks in a file system where duplicate blocks of data are not physically written to the storage media. Instead, the metadata structures of the file system maintain the information needed to store, retrieve, update and delete data blocks as the users of the file system choose, guaranteeing a logical view of the file system as if duplicate blocks were also physically stored.
The operation of computer systems and the storage of data on non-volatile media, such as hard disks and solid state memory, are well known in the art. Stored data is generally organized as files within a hierarchical structure of folders, each of which may contain files and sub-folders, which in turn may contain further sub-folders, and so on. Folders are also equivalently referred to as directories in some operating systems.
The data in a file system can reside on a single storage medium, or on multiple such media. It can also be localized to a single computer system, spread across multiple storage media on multiple computer systems, or stored on a Network Attached Storage (NAS) or a Storage Area Network (SAN) that may be shared by a multitude of computers, both local and remote.
A file system maintains metadata about the directories, files, and the blocks within files. The general approach, well known in the art, is that of iNodes. Each entity the file system manages, such as a directory or a file, has an iNode which acts as the root of a hierarchical metadata structure describing that entity. For example, the iNode of a file contains information about that file such as the owner of the file, the access permissions for that file, and the size of the file, as well as pointers to each of the blocks where the data for that file physically resides. Large files may require multiple levels of indirection to store the pointers to all blocks of the file.
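By way of illustration only, the iNode arrangement described above may be sketched as follows. The sketch is not taken from any particular file system: the class name `Inode`, the constants `DIRECT_SLOTS` and `POINTERS_PER_BLOCK`, and the use of a single level of indirection are all assumptions chosen to keep the example small.

```python
from dataclasses import dataclass, field
from typing import List, Optional

DIRECT_SLOTS = 4          # hypothetical number of direct pointers in an iNode
POINTERS_PER_BLOCK = 8    # hypothetical capacity of one indirect block


@dataclass
class Inode:
    """Illustrative iNode: file attributes plus pointers to data blocks."""
    owner: str
    permissions: int
    size: int = 0
    direct: List[Optional[int]] = field(
        default_factory=lambda: [None] * DIRECT_SLOTS)
    # One level of indirection; larger files would need further levels.
    indirect: Optional[List[Optional[int]]] = None

    def set_block(self, index: int, address: int) -> None:
        """Record the physical address of the logical block at `index`."""
        if index < 0:
            raise IndexError("logical block index must be non-negative")
        if index < DIRECT_SLOTS:
            self.direct[index] = address
            return
        slot = index - DIRECT_SLOTS
        if slot >= POINTERS_PER_BLOCK:
            raise IndexError("a second level of indirection would be needed")
        if self.indirect is None:
            # Allocate the indirect block only when it is first required.
            self.indirect = [None] * POINTERS_PER_BLOCK
        self.indirect[slot] = address

    def get_block(self, index: int) -> Optional[int]:
        """Return the physical address for `index`, or None if unmapped."""
        if index < 0:
            raise IndexError("logical block index must be non-negative")
        if index < DIRECT_SLOTS:
            return self.direct[index]
        slot = index - DIRECT_SLOTS
        if self.indirect is None or slot >= POINTERS_PER_BLOCK:
            return None
        return self.indirect[slot]
```

In this sketch, small files are served entirely from the direct pointers, and the indirect block is consulted only for logical block indices beyond `DIRECT_SLOTS`.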
In the standard art, each block of a file is physically stored in a distinct location on the storage media. This is true even if several of these blocks are identical to each other. The process of deduplication attempts to identify whether a new block being written to the storage has content identical to that of a block already stored and, if so, refrains from storing the new block. Instead, the pointer for the new block points to the existing block to which it is identical. Thus, a considerable volume of storage can be saved when large numbers of data blocks are identical to existing blocks.
An example scenario where deduplication is quite effective is in the context of Virtual Machine Disks (VMDK). Here, each virtual machine running on a computer has its own copies of the operating system, applications, data and other data structures. These are together stored in its VMDK. If 10 such virtual machines are running on a computer and they all use the same operating system, then each of the 10 VMDKs has an identical copy of the operating system. It is clear that deduplication can save a substantial amount of storage in this scenario, since exactly one copy of the operating system is sufficient to serve all 10 virtual machines, if properly managed and maintained.
There is a well-known method in the prior art for determining whether two blocks have identical content without having to make a bit-by-bit comparison between the two blocks. A cryptographic-strength hash function is applied to each of the blocks, and the blocks are considered identical if their hashes are identical. Thus, when a new block of data arrives at the file system, the file system first computes the hash of that block. To be able to determine whether this hash equals the hash of an existing block, the file system maintains a hash search tree with an entry for each of the existing blocks of interest. The file system then searches the hash tree for the hash value of the new block. If that value is found in the search tree, then the new block is considered to have content identical to that of the existing block with the same hash value. If the new hash value is not found in the search tree, then the new hash is generally inserted into the search tree, together with a pointer to the location where the new block is written.
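The hash-based lookup described above may be sketched, purely as an illustration, as follows. The sketch makes several assumptions not found in the description: SHA-256 is used as the cryptographic-strength hash, a Python dictionary stands in for the hash search tree, an in-memory list stands in for the storage media, and the names `DedupStore` and `write` are hypothetical.

```python
import hashlib
from typing import Dict, List, Tuple


class DedupStore:
    """Illustrative write path that stores each distinct block only once."""

    def __init__(self) -> None:
        # Stand-in for the storage media: list position = physical location.
        self.physical: List[bytes] = []
        # Stand-in for the hash search tree: hash value -> physical location.
        self.index: Dict[bytes, int] = {}

    def write(self, block: bytes) -> Tuple[int, bool]:
        """Return (physical location, whether the block was a duplicate)."""
        digest = hashlib.sha256(block).digest()
        location = self.index.get(digest)
        if location is not None:
            # Hash found: treat the block as identical and store nothing.
            return location, True
        # Hash not found: write the block and record its hash and location.
        location = len(self.physical)
        self.physical.append(block)
        self.index[digest] = location
        return location, False
```

For example, writing the same 4096-byte block twice through `write` yields the same location both times and leaves a single entry in `physical`; only a block with different content consumes a new location.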
A side effect of deduplication is that when a logical block is removed, the corresponding physical block cannot necessarily be removed immediately. This is because the same physical block may be pointed to by multiple logical blocks as a consequence of deduplication. A physical block can be removed only once all logical blocks pointing to that physical block have been removed. This implies that a “reference count” must be maintained for each physical block, which is incremented each time an incoming block matches that block, and is decremented each time a logical block pointing to that physical block is removed.
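The reference counting described above may be sketched, again only as an illustration, as follows. The class name `ReferenceCounts`, its method names, and the use of integer physical locations as keys are assumptions made for the example.

```python
from typing import Dict


class ReferenceCounts:
    """Illustrative per-physical-block reference counts."""

    def __init__(self) -> None:
        # Physical location -> number of logical blocks pointing to it.
        self.counts: Dict[int, int] = {}

    def add_reference(self, location: int) -> int:
        """Called when a logical block starts pointing at `location`."""
        self.counts[location] = self.counts.get(location, 0) + 1
        return self.counts[location]

    def remove_reference(self, location: int) -> bool:
        """Called when a logical block pointing at `location` is removed.

        Returns True only when no logical block points at the physical
        block any longer, so that it may be reclaimed.
        """
        if location not in self.counts:
            raise KeyError(f"no references recorded for location {location}")
        self.counts[location] -= 1
        if self.counts[location] == 0:
            del self.counts[location]
            return True
        return False
```

In this sketch, a physical block shared by two logical blocks survives the first removal and becomes reclaimable only on the second.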
To process data blocks at very high rates, for example a million or more input/output operations per second, the hash search tree and other data structures must be searched quickly and efficiently. In addition, since deduplication is associated primarily with write operations, the impact of deduplication on read operations should be negligible. Finally, the removal of a logical block should not cost significantly more than a write operation. The prior art suffers from a number of disadvantages in performing these operations in the context of deduplication. The present invention addresses this need for methods to perform efficient deduplication of file blocks while at the same time implementing the other operations at high efficiency.