1. Field of the Invention
The invention relates generally to computer networks and more particularly to a method for recovery from application system failure in order to provide continuous availability of the application system.
2. Prior Art
Prior art computer networks are controlled by a system architecture which ensures the orderly flow of information throughout the system. Systems Network Architecture (SNA) is a system architecture developed by IBM Corporation which controls the configuration and operation of a computer communications network. It provides the description of the logical structure, formats, protocols, and operational sequences for transmitting information units through the network.
The network is composed of nodes interconnected by communications facilities. The nodes may be of widely varying functional capability, ranging from terminals with minimal native processing capability to complex multiprocessors. The communication facilities also come in a number of varieties ranging from high speed I/O channels to low speed, point-to-point telephone lines and including such media as satellite links and wide-band optical fibers.
Each node comprises a physical unit (PU), which controls the physical resources of the node (e.g., links), and one or more logical units (LU), which are used to partition, allocate, and control the devices associated with end-user communications.
The Virtual Telecommunication Access Method (VTAM) is a telecommunications access method software program, developed by IBM Corporation, which is resident in a host processor and provides an interface between the host processor and other resources in the computer network. A VTAM application program is a program that uses VTAM macro instructions to communicate with terminals. VTAM allows a plurality of application programs to be used at a single terminal. An application program within a host processor can be used at any location in the network without the program having any awareness of network organization.
Users in the network communicate by establishing a session between the logical units (LU) that represent them. A session involves a definition of the characteristics of the communication between two end-users. Each logical unit couples a user to the SNA network. Two logical units can have multiple logical connections or parallel sessions established between them.
Currently, when a network application fails, all of the sessions of the application are terminated (unbound). Application recovery requires each of those sessions to be re-established. This process is slow, so application recovery takes an unacceptably long time, especially when a large number of sessions were active.
Any fault-tolerant solution requires two basic ingredients: redundancy and state recording. Redundancy may come in the form of duplicate hardware and software, along with the appropriate access paths (e.g., busses, links, cache, etc.). State recording is the process of recording enough state information during normal processing that, when a fault occurs and recovery is invoked, a consistent "next" state can be constructed and the process can continue properly.
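By way of illustration only, the state recording described above can be sketched as follows. This is a minimal sketch under stated assumptions, not a description of any SNA or VTAM facility: the names SessionState, StateRecorder, and process are hypothetical, the in-memory list merely stands in for stable storage, and the only state recorded per session is the last sequence number processed.

```python
import json
from dataclasses import dataclass


@dataclass
class SessionState:
    """Hypothetical per-session state: the last sequence number processed."""
    session_id: str
    last_seq: int = 0


class StateRecorder:
    """Hypothetical append-only log; the list stands in for stable storage."""

    def __init__(self):
        self._log = []  # one JSON record per processed unit of work

    def record(self, state):
        # Record enough information to rebuild the session state later.
        entry = {"session_id": state.session_id, "last_seq": state.last_seq}
        self._log.append(json.dumps(entry))

    def recover(self):
        # Replay the log in order; the latest record for each session wins,
        # which yields a consistent state from which processing can resume.
        states = {}
        for line in self._log:
            entry = json.loads(line)
            states[entry["session_id"]] = SessionState(**entry)
        return states


def process(state, recorder):
    """Process one unit of work on a session, then record the new state."""
    state.last_seq += 1
    recorder.record(state)
```

In this sketch, a recovering process would call recover() and resume each session from the sequence number following the recovered last_seq, rather than re-establishing the session from scratch.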
One solution to this problem has been to add hardware and software system elements to create an alternate application subsystem which is kept synchronized with the active subsystem. For example, an alternate processor with the same type of application program can establish back-up sessions for any of the sessions that the primary host processor currently has active. If the primary processor is unable to perform its function for any reason, such as a hardware, operating system, VTAM, or application failure, the alternate processor can be used immediately to service the users that had active sessions with the primary processor. A major drawback to this approach is that it requires the purchase of redundant processor hardware and software. Moreover, a separate back-up session is required for each active session.
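The alternate-subsystem approach described above can be sketched, again by way of illustration only, as follows. The names Host and FailoverPair are hypothetical and do not correspond to any VTAM interface; the sketch assumes that a failure is signalled simply by marking the primary host as not healthy.

```python
class Host:
    """Hypothetical host processor holding a set of session identifiers."""

    def __init__(self, name):
        self.name = name
        self.healthy = True
        self.sessions = set()


class FailoverPair:
    """Hypothetical primary/alternate pair kept synchronized per session."""

    def __init__(self, primary, alternate):
        self.primary = primary
        self.alternate = alternate

    def open_session(self, session_id):
        # Every active session on the primary needs its own back-up
        # session on the alternate.
        self.primary.sessions.add(session_id)
        self.alternate.sessions.add(session_id)

    def serving_host(self):
        # The alternate takes over as soon as the primary is unavailable.
        return self.primary if self.primary.healthy else self.alternate

    def backup_session_count(self):
        # Number of back-up sessions held; equals the active session count.
        return len(self.alternate.sessions)
```

The sketch makes the drawback visible: the alternate host and its software must exist before any failure occurs, and backup_session_count() grows one-for-one with the number of active sessions.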