1. Field of the Invention
The present invention relates generally to data processing systems, and more particularly to storage management servers for optimizing selection and accessing of stored files to avoid mount and position thrashing.
2. Description of Related Art
Data processing systems typically require a large amount of data storage. Customer data, or data generated by users within the data processing system, occupy a great portion of this data storage. Effective data processing systems also provide backup copies of this user data to prevent a loss of such data. Many businesses view any loss of data in their data processing systems as catastrophic, severely impacting the success of the business.
A storage management server provides an effective means for protecting customer data. Generally, a client-server configuration includes several clients connected to a single server. The end users create client files and transfer these files to the server. The server receives the client files and stores them on attached storage devices. When used as a storage management system, the server manages the backup, archival, and migration of these client files. By storing the client file on an attached storage device, the server creates a first, or primary, copy of the client file. The server may, in turn, create additional backup copies of the client file for inclusion in the overall storage hierarchy to improve the data availability and data recovery functions of the storage management system. Clients may vary from small personal computer systems to large data processing systems having a host processor connected to several data storage devices. The server can also range from a small personal computer to a large host processor.
An advanced storage management server, such as Tivoli Storage Manager (formerly known as ADSM), maintains reference information about the client files copied within the attached storage volumes. The server uses a database to keep inventory information about the original client files and storage volume location information about the copies of the client files stored within the server. The inventory information typically includes a client system identifier, a client system directory, a client file name, and other attributes of the file. The location information typically includes a storage volume identifier and a position within the storage volume, among other storage attributes. In addition, the server database allows the server to assign a unique identifier to each client file stored within the attached storage volumes. Thus, the server can track individual files throughout the server storage component.
Accordingly, the server database gives the storage management server several advantages. The server can track multiple copies of an individual client file written to different storage volumes. By tracking secondary copies of the client file, the server improves data availability to the client systems. For example, if a primary copy of a particular client file is inaccessible because it is stored on a destroyed volume or damaged media, the server can access an additional copy residing on a different storage volume and transfer the additional copy to the requesting client system. Further, the server can subsequently recover the unavailable primary copy of the client file from the secondary copy. The server needs both the inventory information and the storage volume location information provided by the server database to accomplish the above-described data recovery.
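To make the two kinds of database information concrete, the following is a minimal Python sketch, assuming an in-memory dictionary in place of the server database. The names `InventoryInfo`, `LocationInfo`, `FileRecord`, `ServerDatabase`, `store`, and `add_copy` are hypothetical and do not describe any actual Tivoli Storage Manager interface. Each stored file receives a unique identifier and may carry several location entries, one per stored copy.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Dict, List


@dataclass(frozen=True)
class InventoryInfo:
    """Inventory view: attributes meaningful to the client (hypothetical fields)."""
    client_node: str   # client system identifier
    directory: str     # client system directory
    file_name: str     # client file name


@dataclass(frozen=True)
class LocationInfo:
    """Storage view: where one copy of the file resides."""
    volume_id: str     # storage volume identifier
    position: int      # position within the storage volume


@dataclass
class FileRecord:
    file_id: int                                                 # unique id assigned by the server
    inventory: InventoryInfo
    locations: List[LocationInfo] = field(default_factory=list)  # one entry per stored copy


class ServerDatabase:
    """In-memory stand-in for the server database (assumption)."""

    def __init__(self) -> None:
        self._ids = count(1)                        # source of unique file identifiers
        self.records: Dict[int, FileRecord] = {}

    def store(self, inventory: InventoryInfo, primary: LocationInfo) -> int:
        """Record a newly stored client file and its primary copy; return its id."""
        file_id = next(self._ids)
        self.records[file_id] = FileRecord(file_id, inventory, [primary])
        return file_id

    def add_copy(self, file_id: int, location: LocationInfo) -> None:
        """Track an additional (e.g. backup) copy of an already stored file."""
        self.records[file_id].locations.append(location)


db = ServerDatabase()
fid = db.store(InventoryInfo("nodeA", "/home/user", "report.txt"), LocationInfo("DISK01", 0))
db.add_copy(fid, LocationInfo("TAPE07", 1200))
# db.records[fid].locations now holds two LocationInfo entries
```

Keeping the copies as a list under one identifier mirrors the tracking of primary and secondary copies described in the following paragraph.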
A data processing system using a storage management server, including a file storage manager, stores files that have been backed up or archived from various client nodes. The server stores client data files in a storage hierarchy consisting of various media types (e.g., magnetic disk, tape, optical disk) and uses a database for tracking the attributes and storage location of each stored client file.
Another function of a storage management server is to select files that satisfy certain criteria and transfer the files to another location. There are many situations in which the transfer of data to another location is necessary or desirable. For example, it may be desired to create a backup set that represents the latest set of files stored on the server for a particular client node. The backup set could be used for restoring files directly to a client node, without requiring use of a network, or for transporting these files to another server. Those skilled in the art will recognize that the creation of a backup set is only one example and that other applications are well-suited to the copying or transfer of data from one location to another. In general, the specification will refer to copied files as belonging to a copy set.
In this copying operation, data on the source server may be stored on various types of media or volumes. For example, storage media can be removable or non-removable, and can be accessed either sequentially or randomly. Typically, a storage management server can process files from different volume types. For example, it can process data from random-access, non-removable volumes, which do not have to be mounted each time they are accessed and are searched randomly; sequential volumes, such as tapes, which are mounted and processed sequentially from the beginning of the volume; and random-access, removable volumes, such as optical disks, which must be mounted for each search but are processed randomly once mounted.
The description will continue in an illustrative sense with respect to storage volumes, which comprise random-access media and sequential-access media. Random-access media is considered to include media that is both non-removable and random-access. Sequential-access media is considered to include all removable media, whether it is accessed randomly or sequentially.
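The classification convention above can be expressed as a small decision rule. This Python sketch is illustrative only; the names `MediaClass` and `classify` are hypothetical. The text does not address non-removable sequential-access media, so mapping that case to sequential-access is an assumption of the sketch.

```python
from enum import Enum


class MediaClass(Enum):
    RANDOM_ACCESS = "random-access"
    SEQUENTIAL_ACCESS = "sequential-access"


def classify(removable: bool, random_access: bool) -> MediaClass:
    """Classify a volume under the convention described above.

    Only media that is both non-removable and random-access counts as
    random-access; all removable media counts as sequential-access, even
    when it can be read randomly once mounted (e.g. optical disk).
    Non-removable sequential media is not covered by the text; mapping it
    to SEQUENTIAL_ACCESS is an assumption of this sketch.
    """
    if not removable and random_access:
        return MediaClass.RANDOM_ACCESS
    return MediaClass.SEQUENTIAL_ACCESS


# Examples drawn from the volume types listed earlier.
magnetic_disk = classify(removable=False, random_access=True)   # RANDOM_ACCESS
tape = classify(removable=True, random_access=False)            # SEQUENTIAL_ACCESS
optical_disk = classify(removable=True, random_access=True)     # SEQUENTIAL_ACCESS
```

The deciding factor is removability: any medium that must be mounted is grouped with tape, because mounting is where the delays discussed next arise.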
Information on random-access media, such as magnetic disk, can usually be transferred relatively efficiently. However, transferring data from sequential-access media can impose delays while the required volume is mounted. Moreover, additional delays may be incurred while the media is positioned to the correct location on the storage volume.
Accordingly, one of the major challenges in generating a copy set, or in performing any copying operation, is determining how to efficiently copy numerous files from sequential-access media, such as magnetic tapes. Efficient copying should be done with minimal mounting and positioning of input volumes. Therefore, optimized selection and accessing of stored files should avoid mount and position thrashing, which results from excessive switching back and forth between mounted volumes or between positions within a volume.
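Thrashing can be illustrated by counting the cost of a given read order. The following Python sketch assumes a simplified single-drive model: a mount is counted whenever the next file resides on a volume other than the one currently mounted, and a backward seek is counted whenever the next file on the same volume lies before the previous one. The function name `thrash_cost` and this cost model are hypothetical.

```python
from typing import Iterable, Optional, Tuple


def thrash_cost(accesses: Iterable[Tuple[str, int]]) -> Tuple[int, int]:
    """Return (mounts, backward_seeks) for reading files in the given order.

    Each access is a (volume_id, position) pair. Simplified single-drive model
    (assumption): switching volumes costs one mount, and moving backward on
    the currently mounted volume costs one backward seek.
    """
    mounts = 0
    backward_seeks = 0
    mounted: Optional[str] = None
    last_pos = 0
    for volume, pos in accesses:
        if volume != mounted:
            mounts += 1          # dismount the current volume, mount the new one
            mounted = volume
        elif pos < last_pos:
            backward_seeks += 1  # must rewind or reposition on the same volume
        last_pos = pos
    return mounts, backward_seeks


interleaved = [("T1", 10), ("T2", 5), ("T1", 20), ("T2", 1)]
grouped = sorted(interleaved)    # group by volume, then ascending position
print(thrash_cost(interleaved))  # (4, 0)
print(thrash_cost(grouped))      # (2, 0)
```

Reading the same four files interleaved across two tapes costs four mounts, while grouping them by volume and ordering by position costs two mounts and no backward seeks.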
In addition to the media-related problems just described, a further challenge arises from an efficiency problem inherent in the copying operation itself. The problem stems from the copying operation's use of two completely different views of the data, namely the inventory view and the storage view.
Files are normally selected for inclusion in the copy set based on inventory-view attributes of the data that are important to the end user. Such attributes include the client node, filespace and file name information, and the recency of the copy. As used in this specification, the term “filespace” refers to a logical space in the client's storage that can contain a group of files. For example, a filespace could be a logical partition or a directory and its subdirectories.
On the other hand, efficiency requires that files be transferred in an optimal order that depends on the location of these files within the server's storage hierarchy. This information is part of the storage-view attributes of the data and relates directly to the various types of storage media previously described.
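As one example of inventory-view selection, the backup set mentioned earlier contains the latest copy of each file for a particular client node. A minimal Python sketch of that selection follows, assuming each inventory entry carries a hypothetical integer `backup_time` as its recency attribute; the names `InventoryEntry` and `latest_for_node` are likewise hypothetical. The selection uses only inventory attributes and says nothing about where the selected files are stored.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class InventoryEntry:
    client_node: str
    filespace: str
    file_name: str
    backup_time: int  # hypothetical recency attribute; larger means more recent


def latest_for_node(entries: Iterable[InventoryEntry], node: str) -> List[InventoryEntry]:
    """Select the most recent copy of each (filespace, file name) for one client node."""
    latest: Dict[Tuple[str, str], InventoryEntry] = {}
    for entry in entries:
        if entry.client_node != node:
            continue  # filter on the client node attribute
        key = (entry.filespace, entry.file_name)
        current = latest.get(key)
        if current is None or entry.backup_time > current.backup_time:
            latest[key] = entry  # keep only the newest copy seen so far
    return list(latest.values())
```

The same file name in two different filespaces is treated as two distinct files, which matches the definition of filespace above.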
Conventional solutions to the efficiency problem in utilizing both views typically include creating a list of files based on filter criteria which are evaluated using the inventory view. Files in the list are then sorted by their storage location and are transferred in sorted order, which represents the storage view. However, this approach requires a great deal of initial overhead to create and sort the list of files, which delays the transferring of files.
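The conventional list-then-sort approach can be sketched in Python as follows. The sketch assumes each file record carries both inventory-view attributes and a storage location (volume identifier and position); the names `StoredFile` and `conventional_copy_order` are hypothetical. It illustrates that the complete filtered list must be built and sorted before the first file can be transferred.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class StoredFile:
    # Inventory-view attributes
    client_node: str
    filespace: str
    file_name: str
    # Storage-view attributes
    volume_id: str
    position: int


def conventional_copy_order(
    files: Iterable[StoredFile],
    criteria: Callable[[StoredFile], bool],
) -> List[StoredFile]:
    """Conventional approach: filter on the inventory view, then sort by storage location."""
    # Step 1: evaluate the filter criteria and materialize the full list.
    selected = [f for f in files if criteria(f)]
    # Step 2: sort the entire list by volume, then by position within the volume.
    selected.sort(key=lambda f: (f.volume_id, f.position))
    # Step 3: only now can transfer begin, in sorted order.
    return selected
```

Because the sort needs the entire selection in memory and takes O(n log n) time for n selected files, transfer cannot begin until this up-front work finishes, which is the initial overhead attributed to the conventional approach.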
Therefore, there is a need in the art for a means whereby a copy set can be generated in an optimal manner, by considering both the inventory view and the storage view of files, without creating a sorted list of files.