1. Field
The disclosure relates to a method, system, and article of manufacture for the elimination of redundant objects in storage systems.
2. Background
A storage management application, such as IBM* Tivoli* Storage Manager* (TSM), may be implemented on a storage management server. The storage management application may manage storage requirements for a plurality of client nodes that are coupled to the storage management server via a network. * IBM, Tivoli, and Tivoli Storage Manager are trademarks or registered trademarks of IBM Corporation.
The storage management application may create and manage a repository for data and programs that are backed up, archived, migrated, or otherwise copied from the client nodes to the storage management server. The storage management server may store data objects, such as files, in one or more storage pools and may use a database stored in the storage management server for tracking information about the stored data objects.
The storage management application may perform incremental backup, incremental archiving, migration, or incremental copying of data from the client nodes to the storage management server. For example, if the storage management application comprises a backup application, then the backup application may perform incremental backup operations in which files are backed up only if the files have changed since a previous periodic full backup, where the periodic full backups may be made on a weekly, monthly, or some other periodic basis. TSM extends incremental backup by using a “progressive incremental backup” in which objects are backed up once and are not backed up again unless the objects are modified on a client node. The progressive incremental approach for backups, archiving, or copying of data, etc., may reduce the amount of data that has to be copied or moved from the client nodes to the storage management server, and may reduce network traffic and storage space requirements relative to the incremental approach, which relies on periodic full backups. The progressive incremental backup approach may use a database that tracks information about every stored object and the location at which each object is stored.
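The progressive incremental behavior described above can be illustrated with a minimal sketch. It is not TSM's implementation: the class `BackupCatalog`, the method `progressive_backup`, and the use of a SHA-256 digest to detect modification are all assumptions made for illustration, and an in-memory dictionary stands in for the server database that tracks each stored object and its location.

```python
import hashlib


class BackupCatalog:
    """Hypothetical in-memory stand-in for the server-side tracking database."""

    def __init__(self):
        # (node, path) -> (digest, location); location is an index into pool.
        self.entries = {}
        # Stored object payloads; stands in for a storage pool.
        self.pool = []

    def progressive_backup(self, node, files):
        """Store only objects that are new or modified; return the paths sent.

        files maps a path to its current contents (bytes) on the client node.
        Comparing digests is an assumed way of detecting modification.
        """
        sent = []
        for path, data in files.items():
            digest = hashlib.sha256(data).hexdigest()
            entry = self.entries.get((node, path))
            if entry is not None and entry[0] == digest:
                continue  # unchanged since it was last stored: skip it
            self.pool.append(data)
            self.entries[(node, path)] = (digest, len(self.pool) - 1)
            sent.append(path)
        return sent


catalog = BackupCatalog()
first = catalog.progressive_backup("nodeA", {"a.txt": b"1", "b.txt": b"2"})
second = catalog.progressive_backup("nodeA", {"a.txt": b"1", "b.txt": b"3"})
```

In this sketch the first call sends both objects, while the second sends only the modified `b.txt`; no periodic full backup is ever repeated.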
In certain computing environments, different client nodes may store the same files in the storage management server. For example, client nodes may have the same operating system files, or different people working on the same project may store the same document locally on different client nodes. When the same data object resides on different client nodes, backups, archiving, migration, copying, etc., may back up and store the same files from the different client nodes on the storage management server. Such redundancy may lead to inefficiencies even in systems using the progressive incremental approach or certain other approaches.
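The redundancy described above can be made concrete with a short sketch. It only illustrates the problem and is not the method of the disclosure: the function `find_redundant_objects`, the node names, the paths, and the use of SHA-256 digests to recognize identical content are all hypothetical.

```python
import hashlib
from collections import defaultdict


def find_redundant_objects(backups):
    """Return {digest: [(node, path), ...]} for content held by more than one node.

    backups maps a node name to a {path: bytes} mapping of its stored objects.
    Identical content is recognized by its SHA-256 digest (an assumption).
    """
    by_digest = defaultdict(list)
    for node, files in backups.items():
        for path, data in files.items():
            by_digest[hashlib.sha256(data).hexdigest()].append((node, path))
    # Keep only content that appears under at least two distinct nodes.
    return {
        digest: owners
        for digest, owners in by_digest.items()
        if len({node for node, _ in owners}) > 1
    }


backups = {
    "nodeA": {"/os/kernel": b"K", "/home/a.txt": b"A"},
    "nodeB": {"/os/kernel": b"K", "/home/b.txt": b"B"},
}
dups = find_redundant_objects(backups)
```

Here the same operating system file is stored once per node, so a per-node progressive incremental scheme would still hold two copies of it on the server.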