Commonly assigned U.S. patent application Ser. No. 09/283,223, K. F. Day III et al., is incorporated for its showing of a data storage library system having directors for storing and tracking multiple copies of data in system data storage libraries.
Commonly assigned U.S. patent application Ser. No. 09/391,186, T. W. Bish et al., is incorporated for its showing of a data storage library system for tracking and accessing data volumes from multiple data storage libraries having cache storage and backing storage.
This invention relates to storage of redundant copies of data volumes in a plurality of data storage libraries which have storage with different levels of access speed to the data volumes, such as cache storage and backing storage, and, more particularly, to accessing copies of data volumes from the data storage libraries.
Data processing systems comprising at least one host typically require a large amount of data storage. If the data, typically stored as a data volume, is not immediately required by the hosts, for example, if the data volume is infrequently accessed, the data volume may be stored on removable rewritable data storage media, such as magnetic tape or optical disk, and the data volumes may be written and/or read by means of a data storage drive.
The data storage drive is typically coupled to the host, or processing unit, by means of a peripheral interface in which commands are directed only from the processing unit to the data storage drive, and the data storage drive responds to those commands, performing the commanded functions. No commands can be sent by the data storage drive to the coupled processing unit. Typically, the commands are performed by a device controller.
If a large amount of data is to be stored and accessed on occasion, data storage libraries are employed. Such data storage libraries provide efficient access to large quantities of data volumes stored in a backing storage of removable data storage media, the media stored in storage shelves which are accessed by robots under the control of robot controllers. Due to the large amount of stored data, typically, a plurality of hosts make use of the same data storage library, and a plurality of data storage drives are included in the library to allow access by the hosts. A library manager, which may comprise one or more processors or which may comprise the same processor as the robot controller, typically tracks each data volume and the data storage media on which it is stored, and tracks the storage shelf location of each data storage media.
Herein, a library manager, either with or without the robot controller, is defined as a “controller” or a “library controller” for the data storage library.
If the data storage media, subsequent to being accessed, may be reaccessed, it is advantageous to employ data storage libraries having both cache storage and backing storage. The data storage library will access the data volume of the removable media from the backing storage and will temporarily store the data volume in the cache storage so that it can be immediately reaccessed. The removable media may then be returned to a storage shelf, and the data volume updated while it is in cache storage without the need to reaccess the removable media. The cache storage is typically limited in capacity, requiring that the data volumes be migrated to backing storage so as to free space in the cache storage. Typically, a least recently used (LRU) algorithm is employed to migrate data volumes out of cache storage to backing storage.
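The least recently used migration described above can be illustrated with a minimal sketch. The class name CacheStorage, its methods, and the use of a dictionary to stand in for backing storage are assumptions made for illustration only and are not taken from the incorporated applications.

```python
from collections import OrderedDict


class CacheStorage:
    """Hypothetical cache that migrates least recently used volumes out."""

    def __init__(self, capacity):
        self.capacity = capacity          # maximum number of cached volumes
        self.volumes = OrderedDict()      # volume id -> data, oldest access first
        self.backing = {}                 # stands in for media on storage shelves

    def store(self, volume_id, data):
        """Place a volume in cache, migrating LRU volumes if over capacity."""
        self.volumes[volume_id] = data
        self.volumes.move_to_end(volume_id)
        while len(self.volumes) > self.capacity:
            old_id, old_data = self.volumes.popitem(last=False)
            self.backing[old_id] = old_data   # migrate to backing storage

    def access(self, volume_id):
        """Return (data, cache_hit). A miss recalls the volume from backing."""
        if volume_id in self.volumes:
            self.volumes.move_to_end(volume_id)   # mark as most recently used
            return self.volumes[volume_id], True
        data = self.backing[volume_id]            # slow path: fetch from media
        self.store(volume_id, data)
        return data, False
```

In this sketch a volume that is reaccessed moves to the most recently used position, so it is the last candidate for migration when space must be freed.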
It is also desirable to provide a level of redundancy of the data to provide constant access to data volumes, even should a data storage library or a communication path to a data storage library be unavailable.
An example of a data storage library system for redundantly storing and accessing data volumes stored on removable data storage media in a plurality of data storage libraries is described in the incorporated coassigned K. F. Day III et al. application. The library controller of each library provides an updatable synchronization token directly associated with each data volume. A plurality of directors are provided, each separate from and coupled to the hosts and each separate from and coupled to each data storage library. Each director responds to separate, partitioned data storage drive addresses addressed by the hosts. The responding director supplies each data volume supplied from a host to all of the data storage libraries, and updates each synchronization token directly associated with the supplied data volume. Thus, the directors store duplicate copies of the data volume in the data storage libraries without involvement by the host. In most data processing applications, it is critical to access the most current data. Hence, the currency of each data volume is tracked by means of the directly associated synchronization token, and the synchronization token is not tracked by the host.
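The duplicate storing behavior described above may be sketched as follows. The Library and Director classes, the integer token, and the rule of incrementing the highest existing token are illustrative assumptions; the incorporated application is not quoted here as specifying this token format.

```python
class Library:
    """Hypothetical data storage library holding volumes and their tokens."""

    def __init__(self, name):
        self.name = name
        self.volumes = {}   # volume id -> data
        self.tokens = {}    # volume id -> update level (assumed to be an integer)

    def write(self, volume_id, data, token):
        self.volumes[volume_id] = data
        self.tokens[volume_id] = token


class Director:
    """Hypothetical director that copies each host write to every library."""

    def __init__(self, libraries):
        self.libraries = libraries

    def store(self, volume_id, data):
        # Assumption: the new token is one more than the highest existing token.
        newest = max(lib.tokens.get(volume_id, 0) for lib in self.libraries)
        for lib in self.libraries:
            lib.write(volume_id, data, newest + 1)
        return newest + 1
```

The host only calls store once; the copies and the token updates are handled by the director, which is the sense in which the host is not involved.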
The time to access a data volume in the cache storage may be faster than the time to access a data volume in the backing storage by at least an order of magnitude. This is because access to data volumes in cache storage is accomplished at electronic speeds, or at speeds of hard disk drives, while the robot must fetch the data storage media containing the data volume from its storage shelf, and move the data storage media to a data storage drive, then load the data storage media and locate the requested data volume. It is thus advantageous to access data volumes in cache storage, a “cache hit”, rather than to have to wait for the extra time to access data volumes in the backing storage, a “cache miss”.
In the incorporated K. F. Day III et al. application, the director responds to a recall request for an identifiable data volume by requesting all of the synchronization tokens from the coupled data storage libraries pertaining to that data volume. The director employs the synchronization tokens to determine a currently updated synchronization token for the identifiable data volume, and accesses the identifiable data volume at the data storage library having a currently updated synchronization token. In the incorporated coassigned T. W. Bish et al. application, if more than one data storage library has the most current synchronization token, the copy of the data volume stored in cache storage of one library is accessed rather than the copy stored in the backing storage of the other library.
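The recall selection described above may be sketched as a two-step filter: keep the libraries holding the most current token, then prefer copies in cache storage. The Token fields and the function name choose_candidates are hypothetical, and the sketch assumes a higher update level means a more current copy.

```python
from dataclasses import dataclass


@dataclass
class Token:
    library: str        # name of the library that returned the token
    update_level: int   # assumption: higher value means more current
    in_cache: bool      # True if the copy is in cache storage


def choose_candidates(tokens):
    """Return tokens for the most current copies, preferring cached copies."""
    newest = max(t.update_level for t in tokens)
    current = [t for t in tokens if t.update_level == newest]
    cached = [t for t in current if t.in_cache]
    # Fall back to backing storage copies only if no current copy is cached.
    return cached or current
```

If more than one token is returned, the copies are tied on both currency and access level, and a further rule is needed to pick one library.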
If the synchronization tokens indicate that two copies are the most current, and both copies are stored in cache storage and at the same access level of cache storage, the incorporated Bish et al. application employs a “normal” algorithm to select the library for accessing the data volume, such as selection on a rotating round robin basis.
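A rotating round robin selection among tied libraries may be sketched as follows. The class name RoundRobinSelector and its counter are assumptions for illustration and are not drawn from the incorporated Bish et al. application.

```python
class RoundRobinSelector:
    """Hypothetical selector that rotates through tied libraries in turn."""

    def __init__(self):
        self.count = 0   # number of selections made so far

    def select(self, candidates):
        """Pick the next library in rotation from a list of tied candidates."""
        choice = candidates[self.count % len(candidates)]
        self.count += 1
        return choice
```

Note that the rotation takes no account of how busy each library is, so successive requests alternate between the libraries regardless of their existing job load.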
However, such “normal” algorithms may result in attempting to access the data from a data storage library that is fully occupied handling existing jobs, such that the relative job load between the libraries is unbalanced.
An object of the present invention is to select the data storage library to access a redundant copy of an identifiable data volume so as to balance the workload between the data storage libraries.
A data storage library system with a plurality of automated data storage libraries, and at least one host or director, accesses a redundant copy of an identifiable data volume employing a method, which may be computer implemented, that utilizes the idle time status of each library to balance the workload.
Each data storage library has a library controller, and at least two access levels of storing and accessing the identifiable data volumes at different access speeds. The access levels may be a cache storage which operates at electronic speeds, and a backing storage such as tape cartridges which must be accessed from storage shelves at mechanical speeds. The library controller provides a synchronization token directly associated with each data volume which identifies the update level of the data volume. Additionally, the token provides a flag which indicates the access level of the identifiable data volume in the data storage library.
A director requests a data volume, and the library controller of each library determines its current idle time status and provides the encoded idle time status to a requesting director when it provides the synchronization token directly associated with the requested data volume.
The director reads the synchronization tokens directly associated with the data volume from the data storage libraries. The director determines from the read tokens whether a plurality of the redundant copies of the data volume are at the most current update level and at the same access level and none of the copies of the data volume is at a faster access level, such that the copies of the data volume are stored in the data storage libraries at the same fastest available access level.
The director, upon the determination indicating that at least two of the copies of the data volume are at the same fastest available access level, compares the provided idle time status of the data storage libraries storing those copies, and indicates which library provides the greater idle time status. The director then accesses the data volume from the indicated data storage library.
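The selection performed by the director may be sketched as follows. The LibraryReply fields and the function name select_library are hypothetical, and the sketch assumes that a higher update level is more current, that a lower access level number is faster (0 for cache storage, 1 for backing storage), and that a larger idle status value means a less loaded library.

```python
from dataclasses import dataclass


@dataclass
class LibraryReply:
    library: str         # name of the responding library
    update_level: int    # from the synchronization token; higher is newer
    access_level: int    # assumption: 0 = cache storage, 1 = backing storage
    idle_status: float   # idle time status provided with the token


def select_library(replies):
    """Pick the library with the greatest idle status among the best copies."""
    newest = max(r.update_level for r in replies)
    current = [r for r in replies if r.update_level == newest]
    fastest = min(r.access_level for r in current)
    candidates = [r for r in current if r.access_level == fastest]
    # Among copies tied on currency and access level, prefer the idlest library.
    return max(candidates, key=lambda r: r.idle_status).library
```

For example, with libraries A and B both holding a current cached copy and reporting idle statuses of 20 and 65, the sketch selects B, even if a third library with only a backing storage copy, or with a down-level copy, reports a higher idle status.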
The idle time status may comprise the percentage of available operating time that the library is idle, and may be a combination of the percentage of available operating time the library is idle and the percentage of available operating time the library is in I/O wait state, or other indicators relating to the loading of the library.
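One way the idle time status might be computed is sketched below. The function name idle_time_status, its parameters, and the choice to combine the two percentages by simple addition are assumptions; the text above states only that the status may be a combination of the idle percentage and the I/O wait percentage.

```python
def idle_time_status(idle_time, io_wait_time, available_time, include_io_wait=True):
    """Return an idle time status as a percentage of available operating time.

    All three times must be in the same unit (for example, seconds).
    Assumption: the combination of idle and I/O wait is a simple sum.
    """
    if available_time <= 0:
        raise ValueError("available_time must be positive")
    counted = idle_time + (io_wait_time if include_io_wait else 0)
    percent = 100.0 * counted / available_time
    # Clamp so that measurement overlap cannot report more than 100 percent.
    return max(0.0, min(100.0, percent))
```

A library that was idle for 30 seconds and in I/O wait for 10 seconds out of 100 available seconds would report 40.0 with I/O wait included and 30.0 without it.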