1. Field of the Invention
The present invention relates to a data processing apparatus for processing a large amount of data efficiently according to a batch processing system, in which data flows unidirectionally between any nodes within a data network, in the fields of banking business, distribution industry, service industry, or the like.
In recent years, the amount of data which is to be processed by a mainframe has rapidly increased owing to an enlargement in business or an introduction of electronic data processing (EDP) systems. The time required for handling a batch job that processes a large amount of data has remarkably increased.
In fields such as the banking business, distribution industry, and service industry, the time required for an online job has tended to become noticeably longer. As a result, a nighttime batch job, which is carried out after the online job has finished, may not finish until midnight or the early morning of the next day. This tends not only to increase operation cost but also to adversely affect the online job of the next day. For this reason, it has become highly necessary to realize a relatively fast batch process. A function developed in an effort to cope with this situation is a parallel batch job, that is, an "Excel Batch".
The present invention pertains to a technique for detecting, in advance, an occurrence of a deadlock corresponding to a wait state for transmission or reception of data, which is likely to occur during implementation of an Excel Batch, a function for shortening the processing time of a batch job in a general-purpose computer.
2. Description of the Related Art
The background of the Excel Batch, which has come to be employed in conventional data processing systems, will now be briefly described in order to clarify how the Excel Batch shortens the processing time for a batch job.
A method generally adopted in an ordinary routine batch process is such that one job is divided into a plurality of jobs or job steps, and that a temporary data set is used to link the thus divided jobs or job steps. These jobs or job steps are processed sequentially. The Excel Batch realizes a relatively fast batch process by paying special attention to this point. The Excel Batch allows jobs or job steps, which are conventionally executed sequentially, to be processed in parallel by making access to data stored in a temporary data set.
Inherited data flowing between different jobs or job steps utilizes system storage. This makes it possible to solve a problem related to an input/output process that becomes a bottleneck for a direct access storage device (DASD).
In a conventional batch process in which an Excel Batch is not utilized, a succeeding job or job step cannot accept data until a preceding job or job step has output all the data to a temporary data set (that is, an intermediate data set). Jobs or job steps are therefore executed sequentially, and a long elapsed time (i.e., a long execution time) results. Moreover, since the inherited data is transferred via a DASD, magnetic tape (MT), or the like, much input/output time is needed.
In contrast, the Excel Batch makes it possible to execute a preceding job or job step and a succeeding job or job step in parallel, and to output or input data between jobs or job steps, by using a plurality of pipe data sets residing in system storage.
By utilizing such an Excel Batch, the elapsed time (i.e., the execution time) can be shortened owing to the parallel execution of jobs or job steps, and the input/output time required to inherit data between jobs or job steps can be shortened by passing the data via system storage. Consequently, the elapsed time required for such a batch process can be shortened drastically.
The Excel Batch has been designed exclusively for fields of business in which realizing a relatively high-speed batch process is an important subject (banking business, manufacturing industry, insurance business, distribution industry, service industry, securities financing, public utilities, and the like), and has proved effective when applied to a routine batch job which is to be executed in batch processing systems ranging from medium-scale systems to large-scale systems.
Even when the Excel Batch is used, it is unnecessary to modify programs written in a high level language (COBOL or PL/I) that adopts a conventional data management access method (QSAM or BSAM). However, a modification is needed to some degree for job control languages (JCLs), i.e., job control language statements. One of the reasons is that an execution of jobs in parallel is a new concept. Moreover, the Excel Batch can be applied to input/output files which are to be handled by a sort/merge program.
Furthermore, in the Excel Batch, the pipe data set is used as a data set in system storage which is used to transfer data between jobs or job steps that are to be executed in parallel.
Data output from a preceding job or job step is passed immediately to a succeeding job or job step via a pipe data set. Once the data has been passed to the succeeding job or job step, it is deleted from the pipe data set. Thus, the pipe data set is utilized as an area temporarily holding data (i.e., a storage area) in system storage. Even if the amount of inherited data is relatively large, the data can be processed in a small area in the system storage. Thus, the system storage can be utilized effectively.
In other words, the Excel Batch is a function for temporarily holding data which are to be inherited between a preceding job or job step of a batch job and a succeeding job or job step thereof in a pipe data set in the system storage, transmitting or receiving the data by a unit of a record or block, from or to the pipe data set so that various processes which are required for the data can be carried out in parallel, and thus contributing to a drastic reduction of processing time.
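The hand-off described above can be illustrated with a minimal Python sketch. It is not the Excel Batch implementation: a bounded `queue.Queue` merely stands in for a pipe data set in system storage, and two threads stand in for the preceding and succeeding job steps. The names `PIPE_CAPACITY`, `preceding_step`, `succeeding_step`, and `run_parallel`, as well as the upper-casing used as the "process", are assumptions made for illustration only.

```python
import queue
import threading

PIPE_CAPACITY = 2      # assumed capacity of the pipe, in records
END = object()         # sentinel marking the end of the inherited data

def preceding_step(pipe, records):
    """Write records to the pipe one at a time."""
    for record in records:
        pipe.put(record)           # waits while the pipe is full
    pipe.put(END)

def succeeding_step(pipe, results):
    """Read records as soon as they arrive and process them."""
    while True:
        record = pipe.get()        # waits while the pipe is empty
        if record is END:
            break
        results.append(record.upper())

def run_parallel(records):
    """Run both job steps concurrently, linked by one bounded pipe."""
    pipe = queue.Queue(maxsize=PIPE_CAPACITY)
    results = []
    writer = threading.Thread(target=preceding_step, args=(pipe, records))
    reader = threading.Thread(target=succeeding_step, args=(pipe, results))
    writer.start()
    reader.start()
    writer.join()
    reader.join()
    return results
```

Because a record leaves the pipe as soon as it is read, the pipe never holds more than `PIPE_CAPACITY` records, however many records are inherited in total.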
In the prior art, the lapse of time required for a batch process has been remarkably shortened by adopting a data processing system in which the Excel Batch can be used for the batch process.
It should be noted that, in the Excel Batch described above, each of the plurality of pipe data sets has a predetermined storage capacity. When the amount of data to be held in a pipe data set would exceed this storage capacity, because the timing of the data writing portion differs from that of the data reading portion, the data stagnates (i.e., the data does not flow). A wait state therefore occurs with regard to a transmission request for the pipe data set.
When the Excel Batch function having the foregoing features is applied to a conventional data processing system, job steps that have been executed sequentially will operate in parallel. Depending on the logic of an application program, a wait state for transmission or reception of data may be established. This raises the risk of causing a so-called "deadlock".
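The transmission wait state can be observed in a small Python sketch, again using a bounded `queue.Queue` as a stand-in for a pipe data set. The helper `try_send`, the capacity of two records, and the short timeout are all assumptions: a real writer would simply keep waiting, and the timeout exists only so that the wait becomes visible.

```python
import queue

def try_send(pipe, record, timeout=0.05):
    """Attempt one transmission; return False if the pipe stayed full.

    A real writer would wait indefinitely. The short timeout is an
    assumption made only so that the wait state can be observed.
    """
    try:
        pipe.put(record, timeout=timeout)
        return True
    except queue.Full:
        return False

pipe = queue.Queue(maxsize=2)                     # assumed capacity: two records
results = [try_send(pipe, n) for n in range(3)]   # third write finds the pipe full
pipe.get()                                        # the reader takes one record
retry = try_send(pipe, 2)                         # the pending write can now proceed
```

Here `results` is `[True, True, False]` and `retry` is `True`: the wait ends only when the reading side removes data from the pipe.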
To clarify the problem that a conventional data processing system utilizing the Excel Batch function is prone to a deadlock, a pattern of communication routes causing a deadlock will be described with reference to the conceptual diagram of FIG. 1. Herein, to simplify the explanation of such a pattern, a job composed of two job steps and two pipe data sets (hereinafter, these pipe data sets will sometimes be referred to as "pipes") will be taken as an example.
In FIG. 1, it is assumed that each of a first pipe data set PDA and a second pipe data set PDB (referred to as pipes A and B in FIG. 1, respectively) has a data capacity sufficient to hold the data of only two transmissions.
In FIG. 1, a first job step J1 transmits data three times to the pipe A and then transmits data three times to the pipe B. A second job step J2 receives data three times from each of the pipes A and B, alternately.
The first job step J1 enters a transmission wait state at its third data transmission to the pipe A. The second job step J2 enters a reception wait state at its first data reception from the pipe B.
In this case, all job steps are in a wait state for transmission or reception. The batch process therefore stops progressing, and a deadlock has occurred. Once such a deadlock state is established, it will not be cancelled.
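The FIG. 1 scenario can be reproduced with a small, deterministic Python simulation. This is a simplified model rather than the detection technique of the invention: each job step is a fixed list of operations, each pipe holds at most `capacity` records, and a step waits when it sends to a full pipe or receives from an empty one. The function `simulate` and the data layout are hypothetical. The model also assumes that J2 begins its alternating receptions with the pipe B, since that order is consistent with the wait states described above.

```python
def simulate(programs, capacity):
    """Run job steps until none can proceed; return the blocked operations.

    programs maps a job step name to its ordered list of (op, pipe)
    pairs, where op is "send" or "recv". An empty result means every
    step finished. A non-empty result means all remaining steps are
    waiting, i.e. a deadlock in this simplified model.
    """
    level = {}                             # records currently held per pipe
    pc = {name: 0 for name in programs}    # index of each step's next operation
    progressed = True
    while progressed:
        progressed = False
        for name, ops in programs.items():
            while pc[name] < len(ops):
                op, pipe = ops[pc[name]]
                held = level.get(pipe, 0)
                if op == "send" and held < capacity:
                    level[pipe] = held + 1
                elif op == "recv" and held > 0:
                    level[pipe] = held - 1
                else:
                    break                  # this step must wait
                pc[name] += 1
                progressed = True
    return {n: ops[pc[n]] for n, ops in programs.items() if pc[n] < len(ops)}

fig1 = {
    "J1": [("send", "A")] * 3 + [("send", "B")] * 3,
    "J2": [("recv", "B"), ("recv", "A")] * 3,   # assumed to start with pipe B
}
blocked = simulate(fig1, capacity=2)   # {"J1": ("send", "A"), "J2": ("recv", "B")}
no_wait = simulate(fig1, capacity=3)   # {} : a capacity of three avoids the wait
```

With a capacity of two, J1 is left waiting on its third transmission to the pipe A while J2 waits on its first reception from the pipe B, which matches the description of FIG. 1. In this model, raising the capacity to three lets both job steps run to completion.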
This occurrence of a deadlock may not be recognized until several hours have elapsed after a job is started. This causes a problem in that the function of an Excel Batch cannot be utilized effectively.
Patterns of communication routes that may cause a deadlock due to an occurrence of a wait state for data transmission or reception are presumably those consisting of two or more communication routes through which data flows from a certain job step to another job step via a pipe.
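One possible reading of this pattern can be sketched in Python: treat job steps as nodes and pipes as directed edges, then flag every ordered pair of job steps joined by two or more distinct routes. The function `risky_pairs` and the `(pipe, writer, reader)` tuple layout are hypothetical, and the sketch assumes that data flows unidirectionally, so the graph has no cycles. A flagged pair is only a candidate: whether a deadlock actually occurs still depends on pipe capacities and on the order of transmissions and receptions.

```python
from collections import defaultdict
from functools import lru_cache

def risky_pairs(pipes):
    """Return ordered job step pairs linked by two or more routes.

    pipes is an iterable of (pipe_name, writer_step, reader_step).
    The graph is assumed to be acyclic (unidirectional data flow).
    """
    successors = defaultdict(list)
    steps = set()
    for _pipe, writer, reader in pipes:
        successors[writer].append(reader)
        steps.update((writer, reader))

    @lru_cache(maxsize=None)
    def routes(src, dst):
        """Count distinct routes from src to dst, following pipes."""
        if src == dst:
            return 1
        return sum(routes(nxt, dst) for nxt in successors[src])

    return sorted(
        (s, d) for s in steps for d in steps
        if s != d and routes(s, d) >= 2
    )

# The two-pipe pattern of FIG. 1: both pipes lead from J1 to J2.
fig1_pairs = risky_pairs([("A", "J1", "J2"), ("B", "J1", "J2")])

# A plain chain has only one route between any two job steps.
chain_pairs = risky_pairs([("A", "J1", "J2"), ("B", "J2", "J3")])
```

Here `fig1_pairs` is `[("J1", "J2")]` and `chain_pairs` is empty. Routes that pass through intermediate job steps are counted as well, so a diamond-shaped network is also flagged.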