The present invention relates in general to computer networks and, more particularly, to a recovery facility for a Systems Network Architecture (SNA) communication system employed by a computer network.
Prior art computer networks are controlled by a system architecture which ensures the orderly flow of information throughout the system. Systems Network Architecture (SNA) is a system architecture developed by IBM Corporation which controls the configuration and operation of a computer communication network. It provides the description of the logical structure, formats, protocols, and operational sequences for transmitting information units through the network.
The network is composed of nodes interconnected by communication facilities. The nodes may be of widely varying functional capability, ranging from terminals with minimal native processing capability to complex multiprocessors. The communication facilities also come in a number of varieties, ranging from high-speed I/O channels to low-speed, point-to-point telephone lines, and including such media as satellite links and wide-band optical fibers.
Each node comprises a physical unit (PU), which controls the physical resources of the node (e.g., links), and one or more logical units (LU), which are used to partition, allocate, and control the devices associated with end-user communications.
The Virtual Telecommunication Access Method (VTAM) is a telecommunications access method software program, developed by IBM Corporation, which is resident in a host processor and provides an interface between the host processor and other resources in the computer network. A VTAM application program is a program that uses VTAM macro instructions to communicate with terminals. VTAM allows a plurality of application programs to be used at a single terminal. An application program within a host processor can be used at any location in the network without the program having any awareness of network organization.
Users in the network communicate by establishing a session between the logical units (LU) that represent them. A session involves a definition of the characteristics of the communications between two end-users. Each logical unit couples a user to the SNA network. Two logical units can have multiple logical connections or parallel sessions established between them.
Conventionally, when a network application fails, all of the sessions of the application are terminated (unbound). Application recovery requires the sessions to be reestablished. This process is slow, causing application recovery to take an unacceptably long time, especially if there is a large number of sessions.
A fault-tolerant solution requires two basic ingredients: redundancy and state recording. Redundancy may come in the form of duplicate hardware and software, along with the appropriate access paths (e.g., busses, links, cache, etc.). State recording is a process of recording enough processing state information during normal processing such that when a fault occurs and recovery is invoked, a consistent "next" state can be constructed in order that the process can continue properly.
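By way of illustration only, the state recording concept described above may be sketched as follows. This is a minimal sketch and not part of any SNA or VTAM implementation; the names `StateLog`, `run`, and `fail_at` are hypothetical and are introduced solely for this example. The sketch records the processing state after each completed step, simulates a fault, and then constructs the next state from the last recorded state so that processing can continue.

```python
import copy


class StateLog:
    """Hypothetical store holding the most recently recorded state."""

    def __init__(self):
        self._last = None

    def record(self, state):
        # Store a copy so later mutation of the live state cannot corrupt the record.
        self._last = copy.deepcopy(state)

    def latest(self):
        return copy.deepcopy(self._last)


def run(items, log, state=None, fail_at=None):
    """Sum the items, recording state after every completed step."""
    if state is None:
        state = {"next_index": 0, "total": 0}
    for i in range(state["next_index"], len(items)):
        if i == fail_at:
            raise RuntimeError("simulated fault")
        state["total"] += items[i]
        state["next_index"] = i + 1
        log.record(state)
    return state


items = [1, 2, 3, 4]
log = StateLog()
try:
    run(items, log, fail_at=2)  # fault occurs before the third item is processed
except RuntimeError:
    pass

# Recovery: resume from the last recorded state rather than from the beginning.
result = run(items, log, state=log.latest())
```

In this sketch, recovery resumes at the third item because the recorded state identifies both the work already completed and the position at which to continue.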
One solution to this problem has been to add additional hardware and software system elements to create an alternate application subsystem which is kept synchronized with the active subsystem. For example, an alternate processor with the same type of application program can establish back-up sessions for any of the sessions that the primary host processor has active currently. If the primary processor was unable to perform its function for any reason, such as hardware, operating system, VTAM or application failure, the alternate processor could be used immediately to serve the users that had active sessions with the primary processor. A major drawback to this approach, however, is that a complete backup subsystem is needed, as well as a separate back-up session for each active session.
Therefore, in one aspect, the present invention comprises an enhanced recovery technique for a Systems Network Architecture (SNA) communication network. This recovery technique includes: bringing up a backup processor upon detection of a failing processor in the SNA network, wherein both the failing processor and the backup processor support a multi-link transmission group (TG); and activating a new communication link between the backup processor and an SNA communications controller previously linked to the failing processor, wherein the SNA communications controller recognizes that the new communication link to the backup processor has a same subarea address, a same virtual route and a same TG number as the previous link to the failing processor, thereby accomplishing substitution of the backup processor for the failing processor in the SNA communication network.
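By way of illustration only, the recognition step recited above may be modeled as follows. This is a simplified sketch under stated assumptions and does not represent Network Control Program (NCP) or VTAM code; the names `LinkIdentity`, `Controller`, and `activate_link`, as well as the numeric values used, are hypothetical. The sketch assumes the communications controller identifies a link by the combination of subarea address, virtual route, and TG number, so that a newly activated link presenting the same combination is treated as reaching the same logical destination.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class LinkIdentity:
    """Hypothetical key: the three values the controller compares."""
    subarea: int
    virtual_route: int
    tg_number: int


@dataclass
class Controller:
    # Maps a link identity to the processor currently reached through it.
    routes: dict = field(default_factory=dict)

    def activate_link(self, identity, processor):
        """Activate a link; return True if it substitutes for a prior processor."""
        previous = self.routes.get(identity)
        self.routes[identity] = processor
        return previous is not None and previous != processor


controller = Controller()
identity = LinkIdentity(subarea=1, virtual_route=0, tg_number=1)

# Original link to the processor that later fails.
first = controller.activate_link(identity, "failing_processor")

# The backup processor activates a new link with the same identifying values.
substituted = controller.activate_link(identity, "backup_processor")
```

Because the identifying values are unchanged in this model, the controller requires no new route definition; it simply associates the existing identity with the backup processor.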
Systems and computer program products corresponding to the above-outlined recovery facility are also described and claimed herein.
To restate, failure of a host processor can be a time-consuming and costly problem within an SNA network. When failure occurs, a new processor may need to be brought in to replace the failed processor. If the failing processor has an SNA network using subarea (PU5) support with links to a communications controller using Network Control Program (NCP), it is conventionally necessary to restart the entire SNA network.
A recovery facility as presented herein advantageously allows the new host processor to acquire the network traffic that was running on the failed processor without requiring restarting of the network. Further, the solution presented herein is simple, building on existing SNA features. The recovery facility presented requires a small amount of processing effort to provide a very powerful and highly desirable capability. Avoiding restart of SNA networks saves customers time and money. Numerous customers run large SNA networks using subarea support and are not likely to migrate to newer communication facilities in the near future.
Additional features and advantages are realized through the techniques of the present invention. Other embodiments and aspects of the invention are described in detail herein and are considered part of the claimed invention.