1. Field of the Invention
The present invention relates generally to network communications, and more particularly to automating a process for monitoring the occurrence of exceptions and abnormal ends while processing batch jobs on computer systems.
2. Related Art
Business enterprises that utilize mainframe and distributed computers for data processing have a critical need to monitor executable programs to ensure that the programs complete successfully. Commonly, such processing is performed by batch jobs that run on Multiple Virtual Storage (MVS) mainframe computers. Many of these programs perform critical business functions, such as operations or customer billing. Such jobs are known as production jobs, since they are executed in a production environment rather than in a test or development environment.
It is not uncommon for a batch production job to encounter a problem during processing and terminate unexpectedly. This is known as an ABEND, or Abnormal End. A job ABEND may be initiated by the job itself, by a user, or by the operating system if the job has run out of control. A job may ABEND for a number of reasons, such as a shortage of virtual memory. It is also not uncommon for a batch job to run to completion but encounter an error during processing and issue a return code indicating that error. Such an occurrence is known as an exception.
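The distinction between an ABEND and an exception can be made concrete with a small classification routine. The following is a minimal Python sketch under stated assumptions: the names `JobResult`, `Outcome`, and `classify` are hypothetical, the ABEND codes are illustrative only, and the `max_ok_rc` threshold is an assumed site policy for which return codes count as acceptable. It is not taken from CA-7 or any other product.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Outcome(Enum):
    SUCCESS = "success"
    EXCEPTION = "exception"  # ran to completion but issued an error return code
    ABEND = "abend"          # terminated abnormally before completion


@dataclass
class JobResult:
    name: str
    abend_code: Optional[str]   # hypothetical field: set only if the job ABENDed
    return_code: Optional[int]  # hypothetical field: set only if the job ran to completion


def classify(result: JobResult, max_ok_rc: int = 4) -> Outcome:
    """Classify a finished batch job (hypothetical helper).

    max_ok_rc is an assumed site policy: return codes above it are exceptions.
    """
    if result.abend_code is not None:
        return Outcome.ABEND
    if result.return_code is None:
        raise ValueError(f"{result.name}: neither an ABEND code nor a return code")
    if result.return_code > max_ok_rc:
        return Outcome.EXCEPTION
    return Outcome.SUCCESS


if __name__ == "__main__":
    jobs = [
        JobResult("BILL001", abend_code=None, return_code=0),
        JobResult("BILL002", abend_code="S878", return_code=None),  # illustrative code
        JobResult("BILL003", abend_code=None, return_code=12),
    ]
    for job in jobs:
        print(job.name, classify(job).value)
```

Run as a script, this prints `BILL001 success`, `BILL002 abend`, and `BILL003 exception`. Checking the ABEND code first reflects the fact that an ABENDed job never reaches the point of issuing a normal return code.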
If a production job ABENDs or completes with an exception, it is crucial that the problem that caused the ABEND or exception be fixed and the production job restarted, so that the critical business functions may proceed. Many enterprises employ Production Operations personnel to monitor production jobs for ABENDs and exceptions, analyze the cause of each problem, implement a fix, and resubmit the job for processing.
Production Operations personnel use a number of tools to accomplish these tasks. They monitor production jobs for ABENDs and exceptions using conventional products, such as CA-7, developed by Computer Associates (CA). When an ABEND or exception is found, they may view system log files and error messages for problem analysis through an IBM mainframe facility known as Interactive Output Facility (IOF). Often the problem may be fixed by changing the Job Control Language (JCL) of the job. JCL specifies to the operating system (MVS) the requirements for running a job. JCL can be edited by the Production Operations personnel in another IBM facility known as Interactive System Productivity Facility (ISPF). Other fixes may be implemented using another CA product known as CA-11. CA-11 can also be used to restart the job.
Other tools are available to Production Operations personnel to resolve problems. These may include electronic mail (e-mail) and paging systems for notifying appropriate personnel of problems, and Problem Management Systems (PMS) for recording and tracking problems.
Thus, Production Operations personnel use a number of different tools to monitor, analyze, fix, and restart jobs, and to notify key personnel when necessary. At a typical business enterprise, such as MCI Telecommunications, a staff of personnel must monitor about 900,000 production jobs a month, 24 hours a day, 7 days a week. These jobs span twelve (12) logical data centers. Of the 900,000 production jobs, about 4% (or about 36,000) typically ABEND. Having to use many different systems and tools makes the tasks of Production Operations personnel extremely difficult, time consuming, and prone to human error.
Other aspects of the process add to the difficulty. For example, the monitoring process is cyclical: all jobs are monitored sequentially, and when the end of the sequence is reached, monitoring resumes at the beginning of the sequence. If a job ABENDs shortly after it has been monitored, 60-90 minutes may pass before monitoring returns to that job and the problem receives attention. This delay can be unacceptable for the execution of critical production jobs. Such delays could be minimized if notification of job ABENDs and exceptions were automated and proactive.
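The delay inherent in cyclical monitoring can be quantified with a simple model. The following minimal Python sketch assumes, hypothetically, that the monitor checks a given job at times `job_offset + k * cycle_minutes` for k = 0, 1, 2, and so on; the function `detection_delay` and its parameters are illustrative and do not describe CA-7 or any other product. It shows why a job that ABENDs just after being checked waits nearly a full cycle before anyone notices.

```python
def detection_delay(abend_time: float, job_offset: float, cycle_minutes: float) -> float:
    """Minutes from an ABEND until a cyclic monitor next checks the job.

    Hypothetical model: the job is checked at job_offset + k * cycle_minutes
    for every integer k.
    """
    # Time elapsed since the most recent scheduled check of this job
    # (Python's % returns a non-negative result for a positive divisor).
    since_last_check = (abend_time - job_offset) % cycle_minutes
    # A check at the exact moment of the ABEND detects it immediately;
    # otherwise the job waits for the remainder of the cycle.
    return 0.0 if since_last_check == 0 else cycle_minutes - since_last_check


if __name__ == "__main__":
    # With an assumed 90-minute cycle, an ABEND 5 minutes after the last check
    # goes unnoticed for 85 minutes.
    print(detection_delay(abend_time=5.0, job_offset=0.0, cycle_minutes=90.0))
```

Run as a script, this prints `85.0`. Under this model the worst-case delay approaches the full cycle length, and for ABENDs spread uniformly over time the average delay is half a cycle; an event-driven notification sent when the ABEND or exception occurs would avoid this wait entirely.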
In addition, enterprises such as MCI are currently implementing additional computing platforms for their data processing needs. Such platforms may include mid-range components running UNIX operating systems. This further complicates the work of Production Operations personnel, who must not only learn to interface with these new platforms but also use additional tools to perform their tasks. Faster platforms may also increase the number of production jobs that require monitoring. Clearly, there is a need to automate the monitoring of jobs, and to integrate the multiple environments in which an analyst must work in order to use the various tools needed for analyzing, fixing, and restarting jobs.