A storage controller is a physical processing device that is used to store and retrieve data on behalf of one or more hosts. A network storage controller can be configured (e.g., by hardware, software, firmware, or any combination thereof) to operate as a storage server that serves one or more clients on a network and that stores and manages data in a set of mass storage devices, such as magnetic or optical disks, tapes, or flash memory.
Mass storage devices provide a series of addressable locations in which data can be stored. Some devices, such as tape drives, only permit storage locations to be accessed in sequential order, while other devices, such as hard disks or flash, permit random access. Mass storage devices may be combined so that higher layers see a single device with certain desirable characteristics. For example, a Redundant Array of Independent Disks (“RAID array”) may contain two or more hard disks with data spread among them to obtain increased transfer speed, improved fault tolerance, or simply increased storage capacity. The placement of data (and the calculation and storage of error detection and correction information) on the various devices in a RAID array may be managed by hardware and/or software.
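As an illustration of the error correction information mentioned above, the following is a minimal sketch of XOR parity, one well-known scheme used in some RAID levels. The text does not say which scheme a given array uses, so the function name `xor_blocks` and the sample block contents are assumptions made for illustration only.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks (hypothetical helper)."""
    # zip(*blocks) yields one tuple of byte values per byte position.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Assumed example: three data blocks, each stored on a different disk.
data_blocks = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]

# The parity block would be stored on a further disk.
parity = xor_blocks(data_blocks)

# If the disk holding data_blocks[1] fails, XOR-ing the surviving
# data blocks with the parity block rebuilds the lost contents.
recovered = xor_blocks([data_blocks[0], data_blocks[2], parity])
```

Because XOR is its own inverse, any single missing block can be rebuilt from the remaining blocks and the parity, which is the basis of the fault tolerance described above.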
Many contemporary data processing systems consume and/or produce vast quantities of data. Mass storage devices such as hard disk drives are often used to store this data. To keep up with the amount of data consumed and produced by these processing systems, the storage capacity of mass storage devices can be increased, the space on those devices can be used more efficiently, or both. One method for increasing the efficiency of space usage on a mass storage device is to perform a deduplication operation, which eliminates redundant data stored on the device.
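The idea of eliminating redundant data can be sketched as follows. This is a simplified illustration under stated assumptions, not the method of any particular storage system: the block size, the use of SHA-256 digests to detect duplicates, and the names `split_into_blocks` and `deduplicate` are all hypothetical.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size in bytes


def split_into_blocks(data, block_size=BLOCK_SIZE):
    """Cut a byte string into fixed-size blocks (the last may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def deduplicate(blocks):
    """Return (store, refs).

    store holds each distinct block once; refs lists, for every original
    block in order, the index of its contents within store.
    """
    index_by_digest = {}
    store = []
    refs = []
    for block in blocks:
        digest = hashlib.sha256(block).digest()
        if digest not in index_by_digest:
            # First time this content is seen: keep one copy.
            index_by_digest[digest] = len(store)
            store.append(block)
        # Duplicates only record a reference to the retained copy.
        refs.append(index_by_digest[digest])
    return store, refs


# Assumed example: the first and third blocks have identical contents.
data = b"A" * BLOCK_SIZE + b"B" * BLOCK_SIZE + b"A" * BLOCK_SIZE
store, refs = deduplicate(split_into_blocks(data))
```

In this example, three logical blocks are stored as two physical blocks, and the original data can be rebuilt by following `refs` into `store`.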
However, deduplication often introduces fragmentation into a data set that was previously stored as contiguous blocks on disk, where a “block” is an addressable storage location that can usually hold multiple data bytes. Fragmentation arises because a duplicate block is replaced by a reference to a single retained copy, which may reside elsewhere on the device. When the data blocks of a data set are separated and/or stored out of read order, the data set is said to be “fragmented.” A process that reads a fragmented data set might cause the storage system to perform multiple read operations to obtain the contents of the data blocks corresponding to the data set. The mechanical nature of many types of mass storage devices limits their speed to a fraction of the system's potential processing speed, particularly when a fragmented data set requires multiple read operations to retrieve. Because fragmentation caused by deduplication can negatively impact storage system performance, many storage system users disable deduplication operations and therefore do not benefit from the space-saving advantages of deduplication.
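The cost of fragmentation can be made concrete with a rough model. The sketch below assumes, purely for illustration, that each run of consecutive physical block numbers can be fetched with one read operation; real devices and file systems are more complicated. The function name `count_read_operations` and the block numbers are hypothetical.

```python
def count_read_operations(physical_blocks):
    """Count runs of consecutive, ascending physical block numbers.

    physical_blocks lists the physical location of each block of a
    data set, in the order in which the data set is read.
    """
    reads = 0
    previous = None
    for block in physical_blocks:
        # A new read starts whenever the block does not directly
        # follow the previously read block.
        if previous is None or block != previous + 1:
            reads += 1
        previous = block
    return reads


# Assumed layout before deduplication: four contiguous blocks.
contiguous = [100, 101, 102, 103]

# Assumed layout after deduplication: the second and fourth blocks
# now refer to copies retained elsewhere on the device.
fragmented = [100, 7, 102, 42]

reads_before = count_read_operations(contiguous)
reads_after = count_read_operations(fragmented)
```

Under this model, the contiguous layout needs a single read while the fragmented layout needs one read per block, which illustrates why a deduplicated data set can be slower to retrieve.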
Therefore, a technique to balance the effects of fragmentation introduced during deduplication operations and the storage system performance desired by users is needed.