1. Field of the Invention
The present invention relates to data management systems that use removable data storage media such as magnetic tape. More particularly, the invention includes a data management system that, responsive to a migration request for a particular data object, automatically invokes a shortcut migration process that finds the previously migrated copy of the data object and reconnects to the copy.
2. Description of the Related Art
With the increasing importance of electronic information today, there is a similar increase in the importance of reliable data storage. The market abounds with different means of data storage, ranging from high-speed, more expensive media such as random access memory (RAM) to slower, less expensive products such as magnetic tape. Some advanced “hierarchical” systems use multiple levels of data storage, typically combining high-speed direct-access storage (such as magnetic disk drive storage) for frequently used data with relatively lower-speed removable storage media (such as magnetic tape) for infrequently used data. One example of such a system is the IBM System Managed Storage product, which includes the DFSMShsm component.
The movement of data from disk to tape in a hierarchical storage system is called “migration.” A single tape might contain hundreds or thousands of migrated datasets. When a user references a migrated dataset, the dataset is copied back onto disk in a movement known as “recall.” One example of recall appears in IBM Technical Disclosure Bulletin, Vol. 26, No. 9 (February 1984), which is incorporated herein by reference. With recall, the copy left on tape is invalidated in favor of the copy recalled to disk. This usually works well, because any change to the recalled data renders the copy left on tape worthless: the serial nature of the storage medium prevents updating the tape copy to match the disk copy.
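The migration and recall movements described above can be pictured with a toy two-level store. This is a minimal sketch under loudly stated assumptions: a Python dict stands in for disk, a list stands in for one tape volume, and the names `HierarchicalStore`, `TapeCopy`, `migrate`, and `recall` are hypothetical illustrations, not DFSMShsm interfaces.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TapeCopy:
    """A migrated copy appended to a serial tape volume (hypothetical model)."""
    name: str
    data: bytes
    valid: bool = True


@dataclass
class HierarchicalStore:
    """Toy two-level store: a dict stands in for disk, a list for one tape."""
    disk: Dict[str, bytes] = field(default_factory=dict)
    tape: List[TapeCopy] = field(default_factory=list)

    def migrate(self, name: str) -> None:
        # Migration: append the dataset's full contents to tape, free the disk copy.
        self.tape.append(TapeCopy(name, self.disk.pop(name)))

    def recall(self, name: str) -> None:
        # Recall: copy the newest valid tape copy back to disk, then invalidate it.
        # The tape copy is never updated in place, so it is marked invalid rather
        # than kept in step with later changes on disk.
        for copy in reversed(self.tape):
            if copy.name == name and copy.valid:
                self.disk[name] = copy.data
                copy.valid = False
                return
        raise KeyError(f"no valid migrated copy of {name!r}")


store = HierarchicalStore(disk={"PAYROLL.DATA": b"q1 totals"})
store.migrate("PAYROLL.DATA")
store.recall("PAYROLL.DATA")
print(store.disk, [(c.name, c.valid) for c in store.tape])
```

Note that the invalidated tape copy still physically holds the data; only its status changes. That leftover copy is what the reconnection approaches discussed below try to reuse.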
The DFSMShsm program maintains an inventory of migrated datasets, and uses this inventory to aid in the recall of datasets. The DFSMShsm program also keeps a limited inventory of recalled datasets (which exist on tape but are considered invalid), but only for a brief, fixed period of time. If a recalled dataset becomes inactive, DFSMShsm software re-migrates the recalled dataset back to tape. This re-migration can be time consuming because it requires copying the dataset's entire contents from disk to tape. In cases where the recalled dataset was never changed, this copying is wasted work because the originally migrated data copy (on tape) is the same as the recalled version (on disk).
To address this performance issue and expedite data re-migration, various approaches have been developed to “reconnect” previously recalled datasets. Broadly, reconnection updates and recreates inventory records rather than copying data from disk to tape again, allowing fast migration of unchanged recalled data whose migration copy still exists on tape, although flagged as invalid. With one reconnection approach, known as “recall browse,” the storage system reconnects datasets back to their tape versions in response to operator-issued commands. The end user must issue a separate command for every dataset to be reconnected. Although this function is beneficial in certain respects, evaluating datasets for reconnection requires significant user activity: the user must determine whether the dataset was ever migrated, whether that migration copy still exists, and, if it does, whether it is identical to the recalled copy. Furthermore, the user may be unaware of certain datasets that could nonetheless be reconnected. In addition, there is some danger of improperly reconnecting datasets that have changed since recall and are therefore not suitable for reconnection.
Improving upon the recall browse feature, others developed a reconnection procedure with more automated features. With this more-automated approach, software supplements the migration process by automatically considering whether data can be reconnected. This approach provides greater automation, since the end user does not have to manually instigate the reconnection process, and since more datasets can be considered for reconnection than would be practical by manual command. Although beneficial in some respects, the more-automated approach still suffers from certain limitations. Chiefly, reconnection using this approach can be time consuming because various input/output operations are required to determine whether a dataset is suitable for reconnection. For instance, time-consuming work is required to determine whether the migration copy exists, and whether it is identical to the recalled (disk) copy. In many cases these operations are wasted, such as when a dataset being considered for reconnection has never been migrated and therefore cannot possibly be a reconnection candidate. When a large number of data objects are being migrated to tape, evaluating each dataset for reconnection can delay the migration considerably.
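The cost problem of the more-automated approach can be illustrated by counting the checks it performs per dataset. This is a hedged sketch: the three checks mirror the ones listed above, but `ProbeCounter`, `is_reconnect_candidate`, and the one-I/O-per-check cost model are hypothetical simplifications. In a real system, checks two and three would involve tape access and cost far more than an inventory lookup.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ProbeCounter:
    """Tally of simulated I/O operations (hypothetical cost model)."""
    ios: int = 0


def is_reconnect_candidate(name: str, disk: Dict[str, bytes],
                           inventory: Dict[str, str], tape: Dict[str, bytes],
                           io: ProbeCounter) -> bool:
    """Evaluate one dataset the way a check-everything migration pass might."""
    # Check 1: was the dataset ever migrated? (inventory lookup)
    io.ios += 1
    location = inventory.get(name)
    if location is None:
        return False  # wasted work: never migrated, so never a candidate
    # Check 2: does the invalidated migration copy still exist on tape?
    io.ios += 1
    tape_copy = tape.get(location)
    if tape_copy is None:
        return False
    # Check 3: is the tape copy identical to the recalled disk copy?
    io.ios += 1
    return tape_copy == disk[name]


disk = {"A.DATA": b"x", "B.DATA": b"y", "C.DATA": b"z"}
inventory = {"A.DATA": "T001:1"}  # only A.DATA was ever migrated
tape = {"T001:1": b"x"}
io = ProbeCounter()
candidates = [n for n in disk if is_reconnect_candidate(n, disk, inventory, tape, io)]
print(candidates, io.ios)  # ['A.DATA'] 5
```

Two of the five simulated operations are spent on datasets that were never migrated; with thousands of datasets per migration pass, that overhead is the delay described above.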
Consequently, known reconnect procedures are not completely adequate for some applications due to certain unsolved problems.
Broadly, the present invention concerns a data management system that responds to each migration request for a particular data object by automatically invoking a shortcut migration process that finds a previously migrated copy of the exact data object, if it exists, and automatically reconnects that copy. More specifically, this data management system includes a primary level of storage (such as direct-access storage) and an auxiliary level of storage (such as multiple removable data storage media). An inventory stores metadata identifying data objects contained in the auxiliary level. A catalog includes metadata identifying data objects contained in the primary level, and whether such data objects are reconnectable.
When the data management system receives “recall” requests to copy target data objects from the auxiliary to the primary level, the system performs certain recall operations for each target data object as follows. The system determines whether the target data object meets prescribed future-reconnection criteria, and if so, it updates the catalog to include an expedited access indicator associated with the target data object. The system copies the target data object from the auxiliary level to the primary level. The system also updates the inventory to invalidate the metadata identifying the target data object in the auxiliary level, thereby deactivating the target data object in auxiliary storage. The system also prepares expiration information to be used in determining when to delete the invalidated inventory metadata for the target data object.
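The four recall operations can be sketched as a single method. This is a minimal illustration, not the claimed implementation: the record and catalog field names, the `DataManager` class, the integer day counter, and the fixed `retain_days` default are all hypothetical. The future-reconnection criteria are left unspecified in the text, so the sketch takes them as a caller-supplied predicate.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class InventoryRecord:
    """Auxiliary-level metadata for one migrated object (hypothetical fields)."""
    volume: str
    valid: bool = True
    expires_day: Optional[int] = None  # when invalidated metadata may be deleted


@dataclass
class CatalogEntry:
    """Primary-level metadata for one object (hypothetical fields)."""
    expedited_access: bool = False


@dataclass
class DataManager:
    primary: Dict[str, bytes] = field(default_factory=dict)    # name -> disk contents
    auxiliary: Dict[str, bytes] = field(default_factory=dict)  # volume -> tape contents
    inventory: Dict[str, InventoryRecord] = field(default_factory=dict)
    catalog: Dict[str, CatalogEntry] = field(default_factory=dict)

    def recall(self, name: str, today: int,
               meets_criteria: Callable[[str], bool],
               retain_days: int = 30) -> None:
        record = self.inventory[name]
        entry = self.catalog.setdefault(name, CatalogEntry())
        # 1. Flag the object for a later shortcut migration if it qualifies.
        if meets_criteria(name):
            entry.expedited_access = True
        # 2. Copy the contents from the auxiliary level to the primary level.
        self.primary[name] = self.auxiliary[record.volume]
        # 3. Invalidate, but keep, the inventory metadata for the tape copy.
        record.valid = False
        # 4. Prepare expiration info for the invalidated metadata.
        record.expires_day = today + retain_days


dm = DataManager(auxiliary={"T001:7": b"ledger"},
                 inventory={"LEDGER": InventoryRecord("T001:7")})
dm.recall("LEDGER", today=100, meets_criteria=lambda name: True)
```

The key design point is step 3: the metadata is invalidated rather than deleted, and the tape contents are left untouched, so a later migration still has something to reconnect to.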
When the data management system receives “migration” requests to copy specified data objects from the primary level to the auxiliary level, the system performs certain migration operations for each specified data object as follows. If the catalog does not contain an expedited access indicator associated with the specified data object, the system copies content of the specified data object from the primary level to the auxiliary level in a “full” migration operation. On the other hand, if the catalog contains an expedited access indicator associated with the specified data object, the system determines whether restoration of the copy of the specified data object on the auxiliary level is possible. If restoration is not possible, the system performs a full migration. If restoration is possible, the system updates the inventory to restore previously invalidated metadata identifying the copy on the auxiliary level as being the specified data object, instead of re-copying contents of the specified data object from the primary level.
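The migration decision reduces to a short branch. The following is a hedged sketch: `migrate`, the plain-dict stores, and the `expedited` mapping (standing in for the catalog's expedited access indicators) are hypothetical, and the restoration test shown (invalidated metadata still present and its tape copy still on the volume) is an assumed stand-in for whatever checks a real system applies.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class InventoryRecord:
    volume: str          # where the auxiliary-level copy lives (hypothetical)
    valid: bool = True


def migrate(name: str, primary: Dict[str, bytes], auxiliary: Dict[str, bytes],
            inventory: Dict[str, InventoryRecord],
            expedited: Dict[str, bool], new_volume: str) -> str:
    """Migrate one object; return "shortcut" or "full" (hypothetical API)."""
    # Consult (and clear) the expedited access indicator for this object.
    if expedited.pop(name, False):
        record = inventory.get(name)
        # Restoration is possible only if the invalidated metadata survives and
        # the tape copy it describes is still present.
        if record is not None and not record.valid and record.volume in auxiliary:
            record.valid = True   # reconnect: restore metadata, copy no data
            del primary[name]     # free the primary-level space
            return "shortcut"
    # No indicator, or restoration impossible: full copy to a new location.
    auxiliary[new_volume] = primary.pop(name)
    inventory[name] = InventoryRecord(new_volume)
    return "full"


primary = {"A": b"unchanged", "B": b"never migrated"}
auxiliary = {"T1:1": b"unchanged"}
inventory = {"A": InventoryRecord("T1:1", valid=False)}  # invalidated at recall
expedited = {"A": True}
first = migrate("A", primary, auxiliary, inventory, expedited, "T2:1")
second = migrate("B", primary, auxiliary, inventory, expedited, "T2:2")
print(first, second)  # shortcut full
```

Object "B" has no indicator, so it goes straight to a full copy without any inventory or tape probes. That is the overhead the indicator avoids compared with the check-everything approach.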
As mentioned above, the system also prepares certain expiration information. Namely, the system establishes a prescribed expiration schedule for metadata identifying auxiliary level copies of recalled data objects, based upon the access history of each data object. According to this schedule, the system cleans the inventory by removing invalidated metadata, which prevents the inventory from growing continually. Whenever the inventory is cleaned of metadata associated with certain data objects, the catalog may be updated to clear the expedited access indicators associated with those data objects. Alternatively, expedited access indicators may be cleared under other circumstances indicating that the auxiliary level copy of a recalled data object is unusable. One example occurs when the recalled data object is backed up, since the backup is presumably done to preserve changes made to the recalled data object on the primary level.
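The expiration and cleanup steps might be sketched as follows. The text says only that the schedule is based on access history, so the retention rule here (a hypothetical base period multiplied by one plus the number of past recalls, capped) is an assumed example, as are the names `retention_days`, `clean_inventory`, and `on_backup`.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class InventoryRecord:
    volume: str
    valid: bool
    expires_day: int  # day after which invalidated metadata may be removed


def retention_days(recall_days: List[int], base: int = 7, cap: int = 90) -> int:
    """Hypothetical use-based rule: frequently recalled objects keep their
    invalidated metadata longer, since a future reconnection is more likely."""
    return min(cap, base * (1 + len(recall_days)))


def clean_inventory(today: int, inventory: Dict[str, InventoryRecord],
                    expedited: Dict[str, bool]) -> List[str]:
    """Delete expired invalidated metadata and clear matching indicators."""
    expired = [name for name, rec in inventory.items()
               if not rec.valid and rec.expires_day <= today]
    for name in expired:
        del inventory[name]
        expedited.pop(name, None)  # nothing left to reconnect to
    return expired


def on_backup(name: str, expedited: Dict[str, bool]) -> None:
    # A backup suggests the primary copy changed, so the tape copy is unusable.
    expedited.pop(name, None)


inventory = {
    "A": InventoryRecord("T1:1", valid=False, expires_day=50),
    "B": InventoryRecord("T1:2", valid=False, expires_day=200),
    "C": InventoryRecord("T1:3", valid=True, expires_day=0),  # active: never cleaned
}
expedited = {"A": True, "B": True}
removed = clean_inventory(today=100, inventory=inventory, expedited=expedited)
on_backup("B", expedited)
print(retention_days([10, 40, 75]), removed, expedited)  # 28 ['A'] {}
```

Only invalidated records are eligible for cleanup; valid metadata for objects that are currently migrated is never touched, whatever its expiration field says.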
The foregoing features may be implemented in a number of different forms. For example, the invention may be implemented to provide a method including a shortcut migration operation achieved by efficient, automatic reconnection to previously migrated data. In another embodiment, the invention may be implemented to provide an apparatus such as a data management system, configured to perform shortcut migration according to this invention. In still another embodiment, the invention may be implemented to provide a signal-bearing medium tangibly embodying a program of machine-readable instructions executable by a digital data processing apparatus to perform a shortcut migration operation according to this invention. Another embodiment concerns logic circuitry having multiple interconnected electrically conductive elements configured to perform shortcut migration as described herein.
The invention affords its users a number of distinct advantages. Chiefly, the invention saves time by avoiding a full migration to auxiliary storage where possible, since a full migration of a large data object can take hours to complete. Instead of a full migration, the invention performs a shortcut migration that restores a deactivated copy of data on auxiliary storage. Advantageously, the invention efficiently determines reconnect candidacy by consulting a catalog that must be consulted for other reasons during migration anyway. From the standpoint of overhead, the shortcut migration is beneficial because it has a high likelihood of successful completion. One reason is the expedited access indicator, which helps to quickly exclude data objects for which reconnection is not possible. Success of reconnection is also aided by preserving invalidated metadata identifying recalled data objects in auxiliary level storage according to a use-based predictive schedule, which is likely to preserve the metadata if it is needed for a future reconnection. As a further advantage, reconnection quickly enables the dataset to be scratched from primary level storage, freeing the typically more expensive primary level storage for other data. As still another benefit, reconnecting datasets instead of copying them to other auxiliary level storage media conserves media and reduces the need to clean and recycle media that become cluttered with deactivated data objects that could have been reused through reconnection. This invention also provides a number of other advantages and benefits, which should be apparent from the following description of the invention.