1. Field of the Invention
The present invention relates generally to data processing systems and, more specifically, to backup systems that employ deduplicated data stores.
2. Description of the Related Art
Many backup systems support multiple storage volumes (and/or storage devices), which enables a system administrator or other user to add storage devices to increase storage capacity when necessary. Backup systems implement allocation schemes to allocate data among the multiple storage devices. An example of an allocation scheme is one that prioritizes the selection of a storage volume to store backup data based on the available space remaining on each storage volume. If such a scheme is implemented, upon receipt of a request to initiate a backup procedure, the backup system assigns the storage volume with the most available space to store the data associated with the backup procedure.
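The available-space allocation scheme described above can be illustrated with a minimal sketch. The `Volume` class, the `select_volume` function, and the capacity figures are hypothetical names and values chosen for illustration only; they do not correspond to any particular backup product.

```python
from dataclasses import dataclass


@dataclass
class Volume:
    """Hypothetical storage volume; sizes are in arbitrary units."""
    name: str
    capacity: int
    used: int = 0

    @property
    def available(self) -> int:
        # Space remaining on this volume.
        return self.capacity - self.used


def select_volume(volumes):
    """Return the volume with the most available space."""
    if not volumes:
        raise ValueError("no storage volumes configured")
    return max(volumes, key=lambda v: v.available)


# Assumed example values: vol1 has 300 units free, vol2 has 800 units free.
volumes = [Volume("vol1", 1000, 700), Volume("vol2", 1000, 200)]
chosen = select_volume(volumes)
print(chosen.name)
```

In this sketch the selection depends only on free space at the time of the request; nothing about the client or its earlier backups is consulted.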
One system used for storing backup copies generated from primary data is a data deduplication data store or system. A data deduplication system provides a mechanism for storing a piece of information (which can include a file or a block of data) only one time. For example, during a first backup operation, if a set of data to be backed up includes multiple copies of a particular file (or even a particular block of data), only one copy of the particular file (or block of data) will be stored in the data deduplication system. Similarly, if the set of data includes data that has not changed between the time of the first backup operation and a subsequent backup operation, the data that has not changed will not be duplicated in storage as long as a copy of that data continues to be stored in the data deduplication system. The data deduplication system thus stores data in a manner that effectively provides data compression: a set of data that originally contains multiple copies of the same data occupies less space within the data deduplication system because only a single copy of that data is stored.
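The store-once behavior described above can be sketched as follows. The sketch assumes that duplicate blocks are detected by comparing SHA-256 digests, which is one common approach and is not stated in the description above; the `DedupStore` class and its methods are hypothetical names used for illustration only.

```python
import hashlib


class DedupStore:
    """Hypothetical deduplicating store keyed by content digest."""

    def __init__(self):
        self._blocks = {}  # digest -> block contents

    def put(self, block: bytes) -> str:
        # Assumption: identical blocks are identified by their SHA-256 digest.
        digest = hashlib.sha256(block).hexdigest()
        # Keep the block only if no copy is already stored.
        self._blocks.setdefault(digest, block)
        return digest

    def backup(self, blocks):
        # A backup is recorded as a list of references to stored blocks.
        return [self.put(b) for b in blocks]

    def stored_bytes(self) -> int:
        return sum(len(b) for b in self._blocks.values())


store = DedupStore()
# First backup contains two copies of the same block.
first = store.backup([b"alpha", b"beta", b"alpha"])
# Subsequent backup: two unchanged blocks and one new block.
second = store.backup([b"alpha", b"beta", b"gamma"])
print(store.stored_bytes())
```

Across both backups six blocks are submitted, but only three distinct blocks are retained; each backup keeps references to the blocks it needs.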
With the advent of disk-based storage implementations that utilize data deduplication technology, allocation schemes for storing data among multiple storage devices and/or storage volumes that are driven by a metric such as “available space per disk volume” are no longer adequate. For example, a backup server utilizing an “available space per disk volume” allocation scheme would assign the storage device or storage volume with the most available space (device or volume 1) to handle an incoming backup procedure from a particular client. If the particular client has performed a prior backup to the backup server, the data associated with the prior backup could have been stored on a different storage device or volume (device or volume 2). Thus, after the incoming backup procedure completes, much of the data stored on device or volume 1 would duplicate data already stored on device or volume 2. Because deduplication algorithms by their very nature focus on storing data only once, the “available space per disk volume” allocation scheme would result in storing copies of the same data on multiple storage devices (e.g., device or volume 1 and device or volume 2, according to the prior example), thus counteracting the benefits of a deduplication system implementation.
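The scenario above can be reproduced with a small simulation. The sketch assumes two deduplicating volumes that each detect duplicates independently by SHA-256 digest; the `DedupVolume` class, the `most_available` function, and the sizes are hypothetical and chosen for illustration only.

```python
import hashlib


class DedupVolume:
    """Hypothetical deduplicating volume; deduplicates only within itself."""

    def __init__(self, name: str, capacity: int):
        self.name = name
        self.capacity = capacity
        self.blocks = {}  # digest -> block contents

    @property
    def available(self) -> int:
        return self.capacity - sum(len(b) for b in self.blocks.values())

    def store(self, blocks):
        for block in blocks:
            digest = hashlib.sha256(block).hexdigest()
            self.blocks.setdefault(digest, block)


def most_available(volumes):
    """'Available space per disk volume' allocation."""
    return max(volumes, key=lambda v: v.available)


vol1 = DedupVolume("vol1", 100)
vol2 = DedupVolume("vol2", 100)
client_data = [b"A" * 10, b"B" * 10, b"C" * 10]

# Prior backup from the client, assumed to have been placed on volume 2.
vol2.store(client_data)

# Incoming backup of the same, unchanged data: volume 1 now has more space.
target = most_available([vol1, vol2])
target.store(client_data)

# Blocks that are now held on both volumes.
duplicated = set(vol1.blocks) & set(vol2.blocks)
print(target.name, len(duplicated))
```

Each volume deduplicates correctly on its own, yet every block of the client's data ends up stored twice because the allocator sent the second backup to a volume that had never seen the data.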
The problem of storing duplicate copies of data becomes even more apparent if the choice presented to the allocation scheme is not merely between traditional storage devices and a single deduplication storage device, but between multiple deduplication storage devices that are potentially manufactured by different vendors. In such a configuration, backup procedures are likely to alternate between the multiple deduplication storage devices in an erratic manner, storing the same data on all of these deduplication storage devices, which works against the principle of storing only one instance of data or blocks of data. Thus, there is a need for an improved method, apparatus, and computer program product for managing a backup system that uses deduplication storage devices or volumes to store backups.