1. Field of the Invention
The present invention relates in general to the field of data processing systems and, more particularly, to managing data in a networked data processing system environment incorporating a single-instance-storage volume.
2. Description of the Related Art
An ever-increasing reliance on information, and on the computing systems that produce, process, distribute, and maintain such information in its various forms, continues to place great demands on techniques for providing data storage and access to that storage. Business organizations can produce and retain large amounts of data. While data growth is not new, the pace of data growth has become more rapid, the location of data more dispersed, and linkages between data sets more complex.
Generally, a data deduplication system provides a mechanism for storing a piece of information only one time. Thus, in a backup scenario, if a piece of information is stored in multiple locations within an enterprise, that piece of information will be stored only one time in a deduplicated backup storage volume. Similarly, if the piece of information does not change during a subsequent backup, that piece of information will not be duplicated in storage as long as it continues to be stored in the deduplicated backup storage volume. Data deduplication can also be employed outside of the backup context, thereby reducing the amount of active storage occupied by duplicate files.
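By way of illustration only, the store-once behavior described above can be sketched as follows. The class name SingleInstanceStore, its methods, and the use of a SHA-256 fingerprint to recognize duplicate data are assumptions made for this sketch; the description above does not specify how duplicates are detected.

```python
import hashlib


class SingleInstanceStore:
    """Toy deduplicated store (hypothetical): each unique piece of data is kept once."""

    def __init__(self):
        self._segments = {}  # fingerprint -> bytes

    def put(self, data: bytes) -> str:
        """Store data if it is new and return its fingerprint."""
        # Assumption: identical content is recognized by its SHA-256 digest.
        fingerprint = hashlib.sha256(data).hexdigest()
        # setdefault leaves an existing entry untouched, so a duplicate adds nothing.
        self._segments.setdefault(fingerprint, data)
        return fingerprint

    def get(self, fingerprint: str) -> bytes:
        """Return the stored data for a fingerprint."""
        return self._segments[fingerprint]

    def segment_count(self) -> int:
        """Number of unique pieces of data currently stored."""
        return len(self._segments)


store = SingleInstanceStore()
ref_a = store.put(b"quarterly report")  # copy found in one location
ref_b = store.put(b"quarterly report")  # identical copy found in another location
ref_c = store.put(b"meeting notes")
```

In this sketch the two identical copies yield the same fingerprint, so the store holds two pieces of data rather than three.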
The storage area of a data deduplication system is called a single-instance-storage volume. When used in a backup context, a single-instance-storage volume may store multiple backups that are made at different times. Because a data deduplication system stores only one instance of any given piece of data, each of the multiple backups necessarily includes both data altered since the immediately previous backup and pointers to data that has not been altered since that backup. Upon receipt of a restore request, a backup module may access many physical areas of the single-instance-storage volume, as dictated by the pointers, to retrieve the data necessary to fulfill the restore request.
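By way of illustration only, the relationship between successive backups, pointers, and a restore request can be sketched as follows. The names segments, backups, take_backup, and restore are hypothetical, and the SHA-256 fingerprint used as a pointer is an assumption of this sketch rather than a detail taken from the description above.

```python
import hashlib

segments = {}  # fingerprint -> data; hypothetical stand-in for the volume
backups = {}   # backup name -> ordered list of fingerprints (the "pointers")


def take_backup(name, pieces):
    """Record a backup; only pieces not already in the volume are written."""
    pointers = []
    newly_written = 0
    for piece in pieces:
        fingerprint = hashlib.sha256(piece).hexdigest()
        if fingerprint not in segments:
            segments[fingerprint] = piece  # altered or new data is stored
            newly_written += 1
        pointers.append(fingerprint)       # unaltered data is only pointed to
    backups[name] = pointers
    return newly_written


def restore(name):
    """Follow each pointer to rebuild the backed-up pieces in order."""
    return [segments[fingerprint] for fingerprint in backups[name]]


monday = [b"file A v1", b"file B v1", b"file C v1"]
tuesday = [b"file A v1", b"file B v2", b"file C v1"]  # only file B changed

written_monday = take_backup("monday", monday)
written_tuesday = take_backup("tuesday", tuesday)
```

In this sketch the second backup writes only the one altered piece, yet restoring it still requires following pointers to pieces written during the first backup.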
Those with skill in the art will appreciate that backups are typically saved for a limited time and are then deleted from the single-instance-storage volume to free space for future backups. The deletion of backups creates “gaps” in the single-instance-storage data store, which increases the time and resources required to access data in order to fulfill restore requests. Thus, there is a need for improved handling of access to backup data in a single-instance-storage volume as repeated backups and deletions of old backups leave the volume increasingly fragmented.
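By way of illustration only, the effect of such gaps on a restore can be sketched as follows. The eight-slot volume, the first-free-slot placement policy, and the helper count_read_runs are assumptions made for this sketch; an actual single-instance-storage volume may allocate space differently.

```python
def count_read_runs(slots):
    """Number of contiguous slot ranges that must be read to cover the given slots."""
    runs = 0
    previous = None
    for slot in sorted(slots):
        if previous is None or slot != previous + 1:
            runs += 1  # a jump in slot numbers starts a new, separate read
        previous = slot
    return runs


# Hypothetical volume of 8 slots; each entry names the backup that owns the slot.
volume = ["old1", "keep", "old1", "keep", "old2", "keep", "old2", "keep"]

# Expire the old backups, leaving gaps between the retained segments.
volume = [None if owner.startswith("old") else owner for owner in volume]

# Assumption: a new backup with four new segments takes the first free slots.
free_slots = [i for i, owner in enumerate(volume) if owner is None]
new_backup_slots = free_slots[:4]
for i in new_backup_slots:
    volume[i] = "new"

fragmented_runs = count_read_runs(new_backup_slots)
contiguous_runs = count_read_runs([8, 9, 10, 11])  # same data laid out contiguously
```

Under these assumptions the new backup is scattered across four separate regions, whereas the same amount of data written contiguously would be read in a single pass.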