This invention relates to a method for supporting recovery from a failure of a storage device in a computer system that executes batch jobs.
Generally, a computer system executes batch jobs that collectively process large quantities of data. Only the minimum set of data required to re-execute the jobs is retained on media such as magnetic tape. Most other data-sets, however, are either deleted after processing or left only on external storage devices, in order to reduce operational load, shorten operation time, and conserve resources such as magnetic disks and magnetic tapes.
When a failure occurs in an external storage device, it is generally impossible to confirm what was recorded on that device. To select a procedure for recovering the device, therefore, an operator must trace the job control language (JCL) lists that record the execution history of the batch jobs, or the lists of media to which the data-sets were assigned, and build a comprehensive picture of how the input and output data-sets of multiple jobs relate to one another. Because this work is ordinarily done by hand, the following problems arise.
First, if an error is made partway through this tracing and organizing, the relationship between the jobs that were executed and the subsequent transitions of the data-sets may be misunderstood. Consequently, if a failure occurs after a large number of jobs have been executed, recovery becomes practically impossible.
Second, the contents of external storage devices are typically backed up to magnetic tape at a predetermined interval so that the backup can be used for recovery when a failure occurs. However, because batch jobs continue to run after the backup is taken, the contents of the external storage devices at the time of the failure differ from those captured in the backup. As a result, if the external storage devices are restored from the backup tape, a data-set with the same name may be created more than once, or a data-set that had already been deleted may be restored. These factors delay recovery from the failure. If, to avoid these problems, a backup is taken for each required data-set individually, operation time is prolonged, operational load increases, and maintainability suffers whenever data-sets or jobs are added or changed.
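The manual tracing described above amounts to replaying the job execution history from the point of the last backup. The Python sketch below is a minimal illustration under simplifying assumptions, not the method of this invention: each executed job is reduced to a hypothetical `JobRecord` listing only the data-sets it created and deleted, the data-set names are made-up examples, and within a job, creations are assumed to be applied before deletions. Comparing the replayed state with the backup contents flags the two hazards named in the text: deleted data-sets that a restore would bring back, and same-named data-sets whose restored contents are out of date.

```python
from dataclasses import dataclass, field


@dataclass
class JobRecord:
    """One executed batch job, reduced to the data-sets it created and deleted.

    Hypothetical model: real JCL records carry far more detail (DD statements,
    disposition parameters, volumes); only creation and deletion matter here.
    """
    name: str
    creates: list[str] = field(default_factory=list)
    deletes: list[str] = field(default_factory=list)


def reconcile(backup: set[str], history: list[JobRecord]) -> dict[str, set[str]]:
    """Compare a backup snapshot with the state implied by jobs run after it.

    backup:  names of data-sets present when the backup tape was written
    history: jobs executed after the backup, in execution order
    """
    live = set(backup)  # data-sets believed to exist at each step of the replay
    recreated = set()   # names created by some job after the backup was taken
    for job in history:
        # Assumption: creations are applied before deletions within one job,
        # which suits temporary data-sets created and deleted by the same job.
        for ds in job.creates:
            live.add(ds)
            recreated.add(ds)
        for ds in job.deletes:
            live.discard(ds)
    expected = live  # state implied at the time of the failure
    return {
        "expected": expected,
        # Deleted after the backup: a plain restore would bring them back.
        "resurrected": backup - expected,
        # Present in the backup and created anew afterwards under the same
        # name: a plain restore would supply the old contents.
        "stale_same_name": backup & recreated & expected,
        # Created after the backup: a plain restore cannot supply them.
        "missing": expected - backup,
    }


# Usage with hypothetical data-set and job names.
backup = {"MASTER.V1", "TRANS.DAY1", "WORK.A"}
history = [
    JobRecord("JOB1", creates=["MASTER.V2"], deletes=["MASTER.V1", "TRANS.DAY1"]),
    JobRecord("JOB2", deletes=["WORK.A"]),
    JobRecord("JOB3", creates=["WORK.A", "REPORT.R1"]),
]
report = reconcile(backup, history)
for key in ("resurrected", "stale_same_name", "missing"):
    print(key, sorted(report[key]))
# resurrected ['MASTER.V1', 'TRANS.DAY1']
# stale_same_name ['WORK.A']
# missing ['MASTER.V2', 'REPORT.R1']
```

Even in this reduced form, the result depends on every job in the history being recorded correctly; a single omitted or misread job changes all three sets, which is the fragility the manual procedure suffers from.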
Moreover, even with per-data-set backups, procedures such as tracing how each data-set is changed by batch job execution and determining the correlations among multiple jobs still cannot be omitted. Such backups therefore do little to shorten recovery time.