Computer networks have become ubiquitous in the home, office, and industrial environment. As computer networks have grown ever more complex, automated mechanisms for organizing and managing the networks have emerged. These mechanisms are generally implemented in the form of one or more computer programs, and are generically known as network management systems or applications.
FIG. 1 is a simplified diagram of a network 100 that is managed by a network management station 10. The network 100 comprises one or more network devices 102, such as switches, routers, bridges, gateways, and other devices. Each network device 102 is coupled to another network device 102, or to one or more end stations 120. Each end station 120 is a terminal node of the network 100 at which some type of work is carried out. For example, an end station 120 is a workstation, a printer, a server, or similar device.
Each network device 102 executes a network-oriented operating system 110. An example of a network-oriented operating system is the Internetworking Operating System (IOS) commercially available from Cisco Systems, Inc. Each network device 102 also executes one or more applications 112 under control of the operating system 110. The operating system 110 supervises operation of the applications 112 and communicates over network connections 104 using an agreed-upon network communication protocol, such as Simple Network Management Protocol (SNMP).
Each device 102 stores information about its current configuration, and other information, in a Management Information Base (MIB) 114. Information in the MIB 114 is organized in one or more MIB variables. The network management station 10 can send fetch and set commands to the device 102 in order to retrieve or set values of MIB variables. Examples of MIB variables include sysObjectID.
Preferably the network management station 10 is a general-purpose computer system of the type shown and described further herein in connection with FIG. 10. The network management station 10 executes one or more software components that carry out the functions shown in block diagram form in FIG. 1. For example, the network management station 10 executes a basic input/output system (BIOS) 20 that controls and governs interaction of upper logical layers of the software components with hardware of the network management station. An example of a suitable BIOS is the Phoenix ROM BIOS. The network management station 10 also executes an operating system 30 that supervises and controls operation of upper-level application programs. An example of a suitable operating system is the Microsoft Windows NT® operating system.
The network management station 10 executes an asynchronous network interface 50 or ANI under control of the operating system 30. The ANI 50 provides an interface to the network 100 and communicates with the network using SNMP or another agreed-upon protocol. The ANI 50 provides numerous low-level services and functions for use by higher-level applications.
The network management station 10 executes a network management system 40 that interacts with a database 60 containing information about the managed network 100. The network management system 40 is an example of a network management application. Using a network management application, a manager can monitor and control network components. For example, a network management application enables a manager to interrogate devices such as host computers, routers, switches, and bridges to determine their status, and to obtain statistics about the networks to which they attach. The network management application also enables a manager to control such devices by changing routes and configuring network interfaces. Examples of network management applications are CiscoWorks, CiscoWorks for Switched Internetworks (CWSI), and CiscoView, each of which is commercially available from Cisco Systems, Inc.
Contemporary information processing involves execution by a processor or computer of a computer program, process, or routine, all of which are called "processes" in this document. In many contexts, execution of a process by a processor may be delayed when the process is required to wait for an external process or device to carry out some other task. When such delays occur, it has been recognized that significant processing time is saved, and the use of processing resources is maximized, by executing several processes concurrently or in parallel.
For example, consider a network management system that is used to monitor and manage the operation of a computer network that comprises numerous network devices. The network devices comprise switches, routers, and other devices that connect to the external world and are also called "managed devices". In such systems, the network management application program often must wait for a managed device to carry out another task and to respond to the network management application. It is desirable to configure the system so that it communicates with several devices during one general time period, so that the network management system communicates with a second device while it is waiting for the first device to become available.
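The overlap described above can be illustrated with a minimal sketch. It assumes Python's standard asyncio library, and the names `query_device`, `poll_all`, and `run_poll` are hypothetical; the device delays are simulated with timers rather than real SNMP traffic. It is not the mechanism of the system described in this document, only an illustration of communicating with a second device while waiting for the first.

```python
import asyncio

async def query_device(name, delay, in_flight):
    """Simulate one request to a managed device that answers after `delay` seconds."""
    in_flight["now"] += 1
    in_flight["peak"] = max(in_flight["peak"], in_flight["now"])
    await asyncio.sleep(delay)  # stands in for waiting on the device to respond
    in_flight["now"] -= 1
    return name, "up"  # simulated status; a real reply would come from the device

async def poll_all(devices):
    """Issue all requests at once so the waits overlap instead of adding up."""
    in_flight = {"now": 0, "peak": 0}
    results = await asyncio.gather(
        *(query_device(name, delay, in_flight) for name, delay in devices)
    )
    # Return the per-device status and the largest number of requests
    # that were outstanding at the same moment.
    return dict(results), in_flight["peak"]

def run_poll(devices):
    """Synchronous entry point (hypothetical helper)."""
    return asyncio.run(poll_all(devices))

if __name__ == "__main__":
    status, peak = run_poll([("router-1", 0.05), ("switch-1", 0.01), ("bridge-1", 0.02)])
    print(status, "peak outstanding requests:", peak)
```

In this sketch the total waiting time is governed by the slowest device rather than by the sum of all delays, which is the benefit the paragraph above describes.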
Prior parallel processing approaches do not address significant problems that arise in the network environment. For example, problems arise when parallel processing is applied in a network environment that uses a shared database of network device information. In particular, it is possible that one process could change the shared database at the same time that a second process is attempting to change the shared database.
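The hazard of two processes changing the shared database at the same time can be shown with a small sketch. It assumes an in-memory dictionary standing in for the shared database and Python's standard `threading.Lock`; the class `SharedDeviceDatabase` and the helper `run_writers` are hypothetical names introduced only for illustration, and the lock is one conventional remedy, not necessarily the approach taken by the system described here.

```python
import threading

class SharedDeviceDatabase:
    """Toy stand-in for a shared database of network device information."""

    def __init__(self):
        self._records = {}
        self._lock = threading.Lock()

    def increment(self, device, field):
        # The read-modify-write below must be atomic; without the lock,
        # two writers could read the same old value and one update would be lost.
        with self._lock:
            record = self._records.setdefault(device, {})
            record[field] = record.get(field, 0) + 1

    def get(self, device, field):
        with self._lock:
            return self._records.get(device, {}).get(field, 0)

def run_writers(db, device, writers, updates_each):
    """Start `writers` threads that each update the same record `updates_each` times."""
    def worker():
        for _ in range(updates_each):
            db.increment(device, "poll_count")

    threads = [threading.Thread(target=worker) for _ in range(writers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return db.get(device, "poll_count")

if __name__ == "__main__":
    print(run_writers(SharedDeviceDatabase(), "router-1", writers=4, updates_each=1000))
```

Because every update is serialized through the lock, the final count equals the number of writers multiplied by the number of updates each performs.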
Further, in network management systems, there is a continuing need to modify the network management system and its associated database to accommodate new network devices and new services for the network devices. Often, the network management system is modified by installation of patches, upgrades, and other modifications at the customer's site. This is called field modification or field extensibility. A parallel processing mechanism, in this context, must be able to adapt to new processes and apply parallel processing to new data set definitions that are installed in the field.
Generally, the use of a shared data model, and field extensibility, create three major problems.
First, determining the order of execution of components of a process is a problem. Because the components of the system are independently developed but share access to a common data model, the problem of when code is executed arises in ways it does not arise in monolithically developed code. In particular, in monolithic code, the execution order of components is explicitly expressed in static sequencing (particular components are explicitly stated and invoked at the right time as determined by the coder). In our case, because code for various execution threads is provided by independent developments, execution order cannot be explicitly stated. Thus, a mechanism for determining execution order had to be added.
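One common way to derive an execution order when no developer can state it explicitly is to let each component declare what it depends on and compute the order from those declarations. The sketch below assumes that approach and uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later). The component names and the functions `execution_order` and `parallel_batches` are hypothetical, and this is not presented as the mechanism of the system described in this document.

```python
from graphlib import TopologicalSorter

def execution_order(components):
    """Return one valid sequential order.

    `components` maps each component name to the set of component
    names that must finish before it may run.
    """
    return list(TopologicalSorter(components).static_order())

def parallel_batches(components):
    """Group components into batches; members of a batch have no ordering
    constraint between them and could therefore run in parallel."""
    sorter = TopologicalSorter(components)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only to make output stable
        batches.append(ready)
        sorter.done(*ready)
    return batches

if __name__ == "__main__":
    # Hypothetical, independently supplied components and their declared prerequisites.
    components = {
        "discover": set(),
        "read_interfaces": {"discover"},
        "read_vlans": {"discover"},
        "report": {"read_interfaces", "read_vlans"},
    }
    print(execution_order(components))
    print(parallel_batches(components))
```

A component added in the field only needs to contribute its own entry to the mapping; the order is recomputed from the declarations rather than edited by hand.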
Second, providing for parallel execution of the independently provided components is a problem. Usually, components are executed in parallel by the developer carefully constructing appropriate synchronization and thread initiation mechanisms for custom-tailored code. In our case, the code is not custom tailored for parallel execution, since even the order of execution is not known when the code is written. Thus, we need a mechanism that minimally affects the writing of the code but that permits the code to operate correctly at the right time.
Third, providing synchronization among the parallel threads accessing common data is an issue. In the usual case of monolithically developed code, explicit synchronization of access to objects is performed by the programmer(s) of the monolithic code. Even in that case, successful synchronization is tricky and prone to errors (deadlock, starvation, etc.). In our case, there are several independent and uncoordinated streams of development. As a result, independent developers cannot perform synchronization without some kind of support.
In this context, there is a need to provide access to and control of heterogeneous networks of network devices, in support of a network management system, in a robust, efficient, extensible framework. The system that provides access to and control of the devices needs to have several characteristics. The system should offer device extensibility, namely, the ability to support new devices and new versions of existing devices without a new release of the network management system.
The system should provide service extensibility, namely, the ability to support new services without major revision. There is also a need for the system to be robust, namely, to tolerate failures and recover resources. There is a further need for the system to be efficient. Efficiency requires that the system take advantage of the parallelism possible while interacting with network devices. Efficiency is required to support large networks that may have more than 1,000 network devices.
Based on all the foregoing, there is a clear need for a simple mechanism that enables an application program to execute two or more processes in parallel with respect to sets of data relating to different network devices.
There is also a need for a simple parallel processing mechanism that is declarative in nature.
There is a particular need for a system that provides for parallel execution whereby the execution order of parallel components is determined.
There is also a need for a system that supports designation of execution join points at which parallel execution stops until parallel execution is later required.
There is also a need for such a system in which there is write access to common objects.