As the number of personal computers, workstations and servers continues to grow, the amount of important business and technical data stored on these computers also increases. The need for increased data storage space within an enterprise grows continuously. One reason for the growth is that users want to keep many files at their fingertips, even those that are rarely used but nevertheless valuable.
Archival systems provide workstation users the ability to optionally delete seldom used files on their workstations after a copy has been archived. Users have the option of asking the system to archive a complete collection of files. In that way, users can free up valuable space on their workstations, but still easily and quickly retrieve archived files when needed. This helps users make more efficient and effective use of workstation storage.
Stand-alone back-up tools, which typically copy recently modified files to disks or tapes, are often used to protect the data. An enterprise that automatically backs up all new and changed workstation files each evening, for example, has the added benefit of being able to recover a file that is accidentally lost.
Within a single enterprise, a variety of workstations and computer systems may be in use, all of which need to be backed up to safeguard against loss. For example, engineers may be using UNIX on their workstations, the accounting group may be using DOS, and there may be LAN servers running Novell NetWare. Workstations can be attached through a LAN where commonly used applications and critical data are stored on the file servers.
Storage management systems striving to reduce storage costs place back-up and archive data in storage hierarchies consisting of different kinds of storage devices. One such storage management system is the IBM Adstar Distributed Storage Manager (ADSM) product. In ADSM, all available storage is divided into storage pools where each storage pool is assigned a unique name. A storage pool can be mapped to any storage device. However, all devices in a storage pool must be the same type of device. Administrators can set up as many storage pools as their businesses require. Storage pools can be managed dynamically while the system is operational. A plurality of storage pools can be arranged into a storage hierarchy.
Typically, an installation defines a hierarchy based on the speed and availability of the storage devices. The highest level of the hierarchy has the faster direct access storage devices (DASDs), while the lowest level has the more economical removable and sequential access storage devices such as individual tapes, tape jukeboxes or optical jukeboxes. In a storage hierarchy based on the speed of the storage devices, both cost of storage and access speed increase towards the higher levels of the hierarchy.
When the amount of data in a storage pool reaches a high occupancy threshold specified by the administrator, data objects can be automatically moved to the next available storage pool level in the hierarchy. This movement of data is called migration. Migration continues until the storage pool from which the data is migrating reaches the low occupancy threshold specified by the administrator.
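The threshold-driven migration described above can be illustrated with a minimal Python sketch. It is not taken from ADSM; the names `StoragePool` and `migrate`, the capacity units, and the choice to migrate the oldest objects first are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class StoragePool:
    """Hypothetical model of one storage pool in a hierarchy."""
    name: str
    capacity: int                 # total capacity, in arbitrary units
    high_threshold: float         # occupancy fraction that triggers migration
    low_threshold: float          # occupancy fraction at which migration stops
    next_pool: Optional["StoragePool"] = None   # next level down, if any
    objects: List[Tuple[str, int]] = field(default_factory=list)  # (id, size)

    def occupancy(self) -> float:
        """Fraction of the pool's capacity currently in use."""
        return sum(size for _, size in self.objects) / self.capacity

    def store(self, object_id: str, size: int) -> None:
        self.objects.append((object_id, size))


def migrate(pool: StoragePool) -> List[str]:
    """Move objects to the next pool once the high threshold is reached.

    Objects are taken oldest first (an assumption) until occupancy falls
    to the low threshold or the pool is empty. Returns the ids moved.
    """
    moved: List[str] = []
    if pool.next_pool is None or pool.occupancy() < pool.high_threshold:
        return moved
    while pool.objects and pool.occupancy() > pool.low_threshold:
        object_id, size = pool.objects.pop(0)
        pool.next_pool.store(object_id, size)
        moved.append(object_id)
    return moved
```

In this sketch a disk pool would be created with `next_pool` set to a tape pool, and `migrate` would be called after each batch of stores. A real system would also have to decide what happens when the next pool is itself full, which the sketch leaves out.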
An object placement management system can be deployed as a server accessed by heterogeneous clients over a network of computers. The client can request that objects under its control be moved to the storage hierarchy. Additionally, the server can poll a client and then initiate the backup of objects under the control of the client. The storage server can be a multi-level storage hierarchy where objects moved to it are stored and archived. An example of a storage hierarchy in a file backup system is a set of DASDs to stage the files being backed up and a set of tape libraries to archive the files.
The server uses its higher speed storage devices to stage sets of objects being moved to it. In particular, when the server has magnetic disks in addition to optical devices or tape devices used for archival storage, the disks are used in combination with main memory to stage the objects being moved. In the presence of a disk at the server, an object is considered moved by the client when it has been copied from the client storage to the local disk of the server. In the absence of a local disk at the server, the object needs to be written out to the archival device of the server to be considered moved by the client.
Objects can be of arbitrary size; therefore, it may not always be possible to move a collection of objects onto a single media instance. Collections and even individual objects may spill over several media instances. In such situations, it is desirable to minimize the number of media instances of a storage hierarchy onto which a collection of objects is moved.
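As a rough illustration of how a collection can spill over several media instances, the following Python sketch fills equally sized media one after another and splits an object when it does not fit in the space remaining. The function name `place_collection`, the uniform media capacity, and the sequential fill order are assumptions; the sketch is not a description of any particular product.

```python
from typing import Dict, List, Tuple


def place_collection(
    sizes: Dict[str, int], media_capacity: int
) -> List[List[Tuple[str, int]]]:
    """Assign objects to media instances, allowing objects to spill over.

    Each inner list describes one media instance as (object_id, units)
    pieces. Media are filled completely before a new one is opened, so the
    number of media used is the smallest possible for the total size.
    Zero-sized objects occupy no space and are skipped.
    """
    if media_capacity <= 0:
        raise ValueError("media_capacity must be positive")
    media: List[List[Tuple[str, int]]] = []
    free = 0  # free units left on the media instance currently being filled
    for object_id, size in sizes.items():
        remaining = size
        while remaining > 0:
            if free == 0:
                media.append([])          # open a new media instance
                free = media_capacity
            piece = min(remaining, free)  # as much as fits on this media
            media[-1].append((object_id, piece))
            remaining -= piece
            free -= piece
    return media
```

Because splitting is allowed, the count of media instances depends only on the total size of the collection. If objects had to be kept whole, the placement would become a bin-packing problem and a heuristic would be needed instead.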
Objects may belong to named collections such as a file system or a directory containing individual files that are given a name by the system. A named collection also refers to files belonging to the same owner (from the same user ID). It is desirable that when named collections are moved, information about the objects in the named collection be maintained in order to keep the collections together. This preserves the logical clustering defined by a user through a named collection.
Standard back-up utilities in UNIX back up large collections of network clients using time as the criterion for invoking the back-up service. However, these systems usually back up files in a time-clustered way, placing together on tape all the files backed up in a given session.
An example of a UNIX-based file server that uses a storage hierarchy is 3DFS, described in "3DFS: A Time Oriented File Server", W. D. Roome, Proceedings of the Summer USENIX Conference, 1991. 3DFS implements an extended file server system in which versions of files are maintained indefinitely. 3DFS can display the contents of directories as they existed on a given date. 3DFS can also retrieve all the versions that have ever existed of a file in a directory. However, 3DFS does not allow data to be stored according to users or to source devices. Thus, the retrieval of all files belonging to disk drive X on date Y is difficult to support efficiently, as many storage devices may need to be visited to retrieve the required files.
Enterprises need to be able to use co-location criteria to speed retrieval of sets of objects (such as data files) placed in a storage hierarchy. An example of a set of co-location properties for files includes the owner of the files, the node or device where the owner resides, the device where the files are stored, and the date the files were created or last used. In particular, it is desirable for file back-up services to make use of co-location criteria associated with files. Bulk retrievals are necessary, for example, when a client's disk fails and its contents need to be recovered. Large installations can encounter client disk failures at a rate of one per day.
There is a need to preserve the spatial and temporal localities that exist in sets of objects being moved within a storage hierarchy. Sets of objects may be moved in their entirety or in increments according to pre-specified criteria. It can also be desirable to be able to keep together arbitrary clusters of objects in any storage hierarchy.
More particularly, there is a need to optimize the performance of the critical but rather infrequent operation of retrieving sets of objects from lower levels of the storage hierarchy without incurring a high performance penalty during the more common operation of moving sets of objects down the storage hierarchy. In a storage hierarchy, the cost of storing an object is higher at higher levels of the hierarchy, but the time to access an object at a higher level is lower. There is a central tradeoff in all storage hierarchies between the monetary cost of storing an object at a given level of the hierarchy and the performance cost of retrieving the object from that level.
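To make the idea of co-location criteria concrete, the following Python sketch groups file records by a chosen set of properties so that a bulk retrieval, such as restoring one failed disk, touches a single group. The `FileRecord` fields and the `co_locate` function are hypothetical names chosen for illustration, and the mapping from a group to a physical storage volume is left out.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Sequence, Tuple


@dataclass(frozen=True)
class FileRecord:
    """Hypothetical catalogue entry for one backed-up file."""
    path: str
    owner: str
    node: str        # machine where the owner resides
    device: str      # device on which the file was stored
    last_used: str   # date as an ISO 8601 string, e.g. "1995-03-01"


def co_locate(
    files: Iterable[FileRecord],
    criteria: Sequence[str] = ("node", "device"),
) -> Dict[Tuple[str, ...], List[str]]:
    """Group file paths by the values of the chosen co-location properties.

    Files that share the same values for every property in `criteria`
    land in the same group and would be placed together in the hierarchy.
    """
    groups: Dict[Tuple[str, ...], List[str]] = defaultdict(list)
    for record in files:
        key = tuple(getattr(record, name) for name in criteria)
        groups[key].append(record.path)
    return dict(groups)
```

With the default criteria, recovering the contents of one disk on one node amounts to looking up a single key. Passing `criteria=("owner",)` would instead keep each user's files together.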
There are a number of software systems that would benefit from administering the movement or migration of objects in a hierarchy of storage devices. Examples of such systems are network based client server file back-up systems; systems that store digital images, in particular, medical images; systems that store faxes in digital form; systems that store video in digital form; and systems that store digital multimedia data.
One or more of the foregoing problems are overcome, and one or more of the foregoing objects are achieved by the present invention.