Field of the Invention
This invention relates to performing deduplication of volume regions in a storage system.
Description of the Related Art
As computer memory storage and data bandwidth increase, so do the amount and complexity of data that businesses manage daily. Large-scale distributed storage systems, such as data centers, typically run many business operations. A data center, which may also be referred to as a server room, is a centralized repository, either physical or virtual, for the storage, management, and dissemination of data pertaining to one or more businesses. A distributed storage system may be coupled to client computers interconnected by one or more networks. If any portion of the distributed storage system has poor performance, company operations may be impaired. A distributed storage system is therefore expected to maintain high standards for data availability and high-performance functionality.
Software applications, such as a logical volume manager or a disk array manager, provide a means of allocating space on mass-storage arrays. In addition, this software allows a system administrator to create units of storage groups, including logical volumes. Storage virtualization abstracts logical storage from physical storage so that end-users can access logical storage without identifying the underlying physical storage.
To support storage virtualization, a volume manager performs input/output (I/O) redirection by translating incoming I/O requests using logical addresses from end-users into new requests using addresses associated with physical locations in the storage devices. As some storage devices may include additional address translation mechanisms, such as address translation layers which may be used in solid state storage devices, the translation from a logical address to another address mentioned above may not represent the only or final address translation.
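As a purely illustrative aid, the I/O redirection described above can be sketched as a lookup from a logical address to a physical location. This is a minimal sketch under stated assumptions, not a description of any particular volume manager: the names `VolumeManager`, `PhysicalLocation`, `map_extent`, and `translate`, the extent size, and the device identifiers are all hypothetical.

```python
from dataclasses import dataclass, field

EXTENT_SIZE = 4096  # hypothetical extent size in bytes (assumption)


@dataclass(frozen=True)
class PhysicalLocation:
    device_id: str  # hypothetical identifier of a storage device
    offset: int     # byte offset on that device


@dataclass
class VolumeManager:
    # Maps (volume name, logical extent index) to the physical start
    # of the extent that backs it.
    extent_map: dict = field(default_factory=dict)

    def map_extent(self, volume, extent_index, device_id, device_offset):
        """Record which physical extent backs a logical extent."""
        self.extent_map[(volume, extent_index)] = PhysicalLocation(
            device_id, device_offset
        )

    def translate(self, volume, logical_address):
        """Translate a logical byte address into a physical location."""
        extent_index, offset_in_extent = divmod(logical_address, EXTENT_SIZE)
        base = self.extent_map[(volume, extent_index)]
        return PhysicalLocation(base.device_id, base.offset + offset_in_extent)


vm = VolumeManager()
vm.map_extent("vol0", 0, "ssd-a", 81920)
vm.map_extent("vol0", 1, "ssd-b", 0)

# Logical address 4100 falls 4 bytes into logical extent 1 of vol0.
location = vm.translate("vol0", 4100)
```

The location returned here is only the volume manager's view. A device such as a solid state drive may apply its own translation layer to the returned offset, so this lookup need not be the final address translation.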
For many storage systems, large regions of separate logical volumes often include the same data, or are reused among multiple volumes. For example, a system may include large numbers of virtual machines generated from the same gold master in which the preloaded system software occupies the first gigabyte (GB) of the volume. This system software may be the same for multiple volumes, resulting in the duplication of large regions of data in multiple volumes. Efforts to reduce the amount of identical data stored in the storage system are needed to improve the efficiency and operational capacity of the storage system.
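To make the duplication concrete, the following sketch shows one generic way identical regions in separate volumes might be detected, by fingerprinting fixed-size regions. It is an assumption-laden illustration rather than the approach of this disclosure: the function name `find_duplicate_regions`, the region size, and the sample volume contents are hypothetical, and a matching SHA-256 digest is treated as indicating identical data without a byte-for-byte comparison.

```python
import hashlib
from collections import defaultdict

REGION_SIZE = 4  # hypothetical region size, kept tiny for illustration


def find_duplicate_regions(volumes, region_size=REGION_SIZE):
    """Group regions with matching fingerprints across all volumes.

    `volumes` maps a volume name to its contents as bytes. The result
    maps each fingerprint seen more than once to the list of
    (volume name, start offset) pairs that share it.
    """
    seen = defaultdict(list)
    for name, data in volumes.items():
        for start in range(0, len(data), region_size):
            region = data[start:start + region_size]
            digest = hashlib.sha256(region).hexdigest()
            seen[digest].append((name, start))
    return {d: locs for d, locs in seen.items() if len(locs) > 1}


# Two volumes cloned from the same hypothetical gold master: the first
# region (the preloaded software) is identical, the rest has diverged.
volumes = {
    "vm1": b"BOOTaaaa",
    "vm2": b"BOOTbbbb",
}
duplicates = find_duplicate_regions(volumes)
shared_locations = list(duplicates.values())
```

In this example only the leading region of each volume is reported as shared, mirroring the case in which preloaded system software occupies the start of every volume generated from the same master.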
In view of the above, systems and methods for performing deduplication of volume regions are desired.