Computer networks were formerly designed using a central computer (mainframe) accessed by plural user terminals via network connections, wherein different users accessed applications and data stored in the mainframe. In currently designed networks, processing is distributed among devices of many types using peer-to-peer communication. In distributed networks, a number of computers operate as servers, which are accessible by client computers. Each server may provide one or more services. A server may, in turn, be a client for services of another computer that acts as a server. In providing services, the servers often communicate with each other through an arrangement of network devices including routers, hubs, switches and firewalls. FIG. 1 shows an example of a network in which plural nodes, arranged in at least two segments 100 and 150, are connected to a network coupler 130. In the segment 100, nodes 101 through 112 are interconnected by a bus 118. A router 115 connects the segment 100 to the network coupler 130. In the segment 150, nodes 151 through 162 are interconnected by a bus 168 and a router 165 connects the segment 150 to the network coupler 130. An administrative node 450 is also connected to the network to monitor, control, examine and evaluate network functioning.
It is well known in network systems for nodes to use the seven-layer OSI (Open Systems Interconnect) model such as shown in FIG. 2. The OSI model has a physical layer 1 denoted as 200, a data link layer 2 denoted as 210, a network layer 3 denoted as 220, a transport layer 4 denoted as 230, a session layer 5 denoted as 240, a presentation layer 6 denoted as 250 and an application layer 7 denoted as 260. In the OSI arrangement of FIG. 2, control is passed from one layer to the next in a layer hierarchy. For outgoing data, control starts at the top application layer 7 of one system and proceeds down to the bottom physical layer 1. For incoming data, control proceeds back up the layers of the other system from the bottom physical layer 1 to the top application layer 7. A server generally provides all seven layers but other network devices may provide fewer layers. For example, a router typically is responsible for the lower three layers (physical, data link and network).
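The passing of control down the layers on the sending system and back up the layers on the receiving system can be illustrated with a minimal sketch. The bracketed text headers and the function names `send` and `receive` are assumptions made for illustration only; real protocol stacks use binary headers defined by each protocol.

```python
# Layers listed from the top (layer 7) to the bottom (layer 1).
LAYERS = [
    "application", "presentation", "session", "transport",
    "network", "data link", "physical",
]


def send(payload: str) -> str:
    """Outgoing: pass data down from layer 7 to layer 1.

    Each layer wraps what it received from the layer above in its own
    (hypothetical) header, so the physical header ends up outermost.
    """
    frame = payload
    for name in LAYERS:
        frame = f"[{name}]{frame}"
    return frame


def receive(frame: str) -> str:
    """Incoming: pass data up from layer 1 to layer 7.

    Each layer strips its own header and hands the rest to the layer above.
    """
    for name in reversed(LAYERS):
        header = f"[{name}]"
        if not frame.startswith(header):
            raise ValueError(f"missing {name} header")
        frame = frame[len(header):]
    return frame


wire = send("hello")
delivered = receive(wire)
```

In this sketch a device that implements only the lower layers would strip and re-add only the headers of those layers and forward the remainder untouched.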
More recently, application platforms have been introduced as a new category. Such application platforms (e.g., Microsoft .NET, J2EE and database application servers) provide common shared facilities for networked applications that run on the application platform extending from the presentation layer 6. While an application platform in a computer appears as one OSI layer 7 application, it hosts one or more independent applications and each of the independent applications may host one or more software elements that provide different services. FIG. 3 shows a portion of such an OSI model in which an extended model application platform layer 7 (310), extended model application layers 8 (320-1 to 320-M) and extended model service layers 9 (330-1 to 330-N) are inserted in that order in the OSI application layer 7 (260) to provide N software elements.
A network generally spans many computers and a software element of the network may interact with one or more software elements that are co-located within the same application or application platform. The software elements, however, interact only through a well-defined isolation boundary (e.g., an Application Programming Interface (API), an object oriented interface, a messaging system, etc.). Such isolation allows location transparency so that the same unmodified software elements having different interfacing may be located at different machines with communication through the physical network. The use of isolation boundaries allows independent implementation of services, so that a service can be replaced without modifying the services dependent thereon, and a referenced service can be handled by a proxy service that manages network transmission to the real service location.
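A minimal sketch of an isolation boundary with a proxy service follows. All class names (`InventoryService`, `InventoryProxy`, `OrderService`) and the `transport` callable are hypothetical and chosen only for illustration; the transport here is simulated in-process rather than carried over a real network.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict


class InventoryService(ABC):
    """Hypothetical isolation boundary: dependents see only this interface."""

    @abstractmethod
    def stock_level(self, item: str) -> int:
        ...


class LocalInventoryService(InventoryService):
    """Real service implementation, co-located with its caller."""

    def __init__(self, stock: Dict[str, int]) -> None:
        self._stock = stock

    def stock_level(self, item: str) -> int:
        return self._stock.get(item, 0)


class InventoryProxy(InventoryService):
    """Proxy standing in for a service hosted at another location.

    `transport` is an assumed stand-in for whatever mechanism carries
    the request to the real service location.
    """

    def __init__(self, transport: Callable[..., int]) -> None:
        self._transport = transport

    def stock_level(self, item: str) -> int:
        return self._transport("stock_level", item)


class OrderService:
    """Dependent service; unmodified whether inventory is local or remote."""

    def __init__(self, inventory: InventoryService) -> None:
        self._inventory = inventory

    def can_fulfil(self, item: str, quantity: int) -> bool:
        return self._inventory.stock_level(item) >= quantity


# Simulated remote end; in a real deployment this would run on another machine.
remote_inventory = LocalInventoryService({"widget": 5})


def fake_transport(method: str, *args: str) -> int:
    return getattr(remote_inventory, method)(*args)


local_orders = OrderService(LocalInventoryService({"widget": 5}))
remote_orders = OrderService(InventoryProxy(fake_transport))
```

Because `OrderService` depends only on the interface, the same unmodified class works with either the co-located implementation or the proxy.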
Physical networks vary in complexity from a few computers and network devices for a small business to millions of computers interconnected through the Internet. The complexity of typical network topologies has led to development of numerous systems that aid network administrators in understanding and managing their networks, including determining the network elements and their interdependencies. In the automatic network element discovery system disclosed in U.S. Pat. No. 5,185,860 issued to J. C. Wu on Feb. 9, 1993, certain nodes on a network, called discovery agents, convey knowledge of the existence of other nodes on the network. As disclosed therein, a network discovery system queries the discovery agents and obtains information from the discovery agents about other nodes on the network. The discovery system then queries each of the nodes obtained to determine if that node is also a discovery agent. The process of querying discovery agents to obtain a list of nodes known to the discovery agents is repeated at timed intervals to obtain information about nodes that are not always active. In a TCP/IP network, for example, discovery agents are nodes that respond to queries for an address translation table that translates Internet protocol (IP) addresses to physical addresses. The data from each node's address translation table is used to obtain both the IP address and the physical address of other nodes on the network. These nodes are then queried to obtain additional information. After all the nodes on a network are discovered, the list of nodes is written to a database where it can be displayed by the network manager or other users of the network.
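The query-and-expand discovery loop described above can be sketched as follows. This is an illustrative sketch only, not the implementation of the cited patent: the `TABLES` dictionary simulates the address translation tables that nodes would return, the addresses are invented, and `query_translation_table` is a hypothetical stand-in for a real network query. The sketch performs a single discovery pass, whereas the described system repeats the pass at timed intervals.

```python
from collections import deque
from typing import Dict, Iterable, Optional

# Simulated network (assumption): each IP maps to the IP -> physical address
# table the node returns when queried, or None if it is not a discovery agent.
TABLES: Dict[str, Optional[Dict[str, str]]] = {
    "10.0.0.1": {"10.0.0.2": "aa:00:00:00:00:02",
                 "10.0.0.3": "aa:00:00:00:00:03"},
    "10.0.0.2": None,
    "10.0.0.3": {"10.0.0.4": "aa:00:00:00:00:04",
                 "10.0.0.1": "aa:00:00:00:00:01"},
    "10.0.0.4": None,
}


def query_translation_table(ip: str) -> Optional[Dict[str, str]]:
    """Hypothetical query; returns None when the node is not an agent."""
    return TABLES.get(ip)


def discover(seed_agents: Iterable[str]) -> Dict[str, Optional[str]]:
    """Return every discovered IP mapped to its physical address (if learned)."""
    seeds = list(seed_agents)
    known: Dict[str, Optional[str]] = {ip: None for ip in seeds}
    queue = deque(seeds)
    while queue:
        ip = queue.popleft()
        table = query_translation_table(ip)
        if table is None:
            continue  # node did not answer as a discovery agent
        for other_ip, physical in table.items():
            if other_ip not in known:
                known[other_ip] = physical
                queue.append(other_ip)  # check later whether it is an agent too
            elif known[other_ip] is None:
                known[other_ip] = physical  # fill in a seed's physical address
    return known


nodes = discover(["10.0.0.1"])
```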
U.S. Pat. No. 6,286,047 issued to Ramanathan et al. on Sep. 4, 2001 discloses a method for identifying services, service providing elements and dependencies among the services and the elements providing services, in which first and second phases of discovery are executed. In the first phase, the services and the elements providing services are detected, as well as a first set of dependencies. The second phase, based on results of the first phase, detects inter-service dependencies, i.e., conditions in which proper operation of one service relies upon at least one other service. Various techniques may be used in executing the first phase, including accessing information in a domain name service (DNS) of the network to identify dependencies, as well as services and nodes. Discovery within the first phase may also be based upon recognizing naming conventions. In the second phase, discovery agents implemented in computer software may be deployed to access content of configuration files of applications detected in the first phase. Discovery agents may also be used to monitor connections completed via specified nodes detected in the first phase, such that other inter-service dependencies are identified. Alternatively, network probes may be deployed to access information of data packets transmitted between nodes detected in the first phase, with the accessed packet information being used to detect inter-service dependencies.
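One way to picture the second phase is to combine a first-phase inventory of services with observed connections. The sketch below is an assumption-laden illustration, not the method of the cited patent: the host names, ports, record layout and the function `infer_dependencies` are all hypothetical, and the sketch assumes at most one detected service per host.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical phase-one output: services keyed by (host, listening port).
SERVICES: Dict[Tuple[str, int], str] = {
    ("web01", 80): "web",
    ("app01", 8080): "app",
    ("db01", 5432): "database",
}

# Hypothetical phase-two observations:
# (source host, destination host, destination port).
CONNECTIONS = [
    ("web01", "app01", 8080),
    ("app01", "db01", 5432),
    ("web01", "app01", 8080),  # repeated connection, counted once
    ("laptop7", "web01", 80),  # source hosts no detected service; ignored
]


def infer_dependencies(
    services: Dict[Tuple[str, int], str],
    connections: Iterable[Tuple[str, str, int]],
) -> Dict[str, Set[str]]:
    """Map each service to the set of services it was seen connecting to."""
    # Assumption: one detected service per host, so a host identifies a service.
    host_to_service = {host: name for (host, _port), name in services.items()}
    dependencies: Dict[str, Set[str]] = {}
    for src_host, dst_host, dst_port in connections:
        client = host_to_service.get(src_host)
        server = services.get((dst_host, dst_port))
        if client is None or server is None or client == server:
            continue
        dependencies.setdefault(client, set()).add(server)
    return dependencies


dependencies = infer_dependencies(SERVICES, CONNECTIONS)
```

The one-service-per-host assumption keeps the sketch short, but it also means the source of a connection cannot be attributed to a particular software element when a host runs several.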
U.S. Pat. No. 6,115,393 issued Sep. 5, 2000 to Engel et al. discloses a network arrangement in which plural communication dialogs occurring among network nodes are monitored. The contents of packets being transmitted among two or more communicating nodes are detected on the network. The dialogs are identified from the contents of the packets, and information about the identified dialogs derived from the packet contents is stored. Each communication is effected by a transmission of one or more packets among two or more communicating nodes. Each communication complies with a predefined communication protocol selected from among protocols available in the network. The contents of packets are detected passively by monitors on network buses in real time. Communication information associated with multiple protocols is derived from the packet contents.
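The idea of identifying dialogs from packet contents and storing derived information can be sketched as a grouping of packets by their endpoints and protocol. The `Packet` record, its fields and the function `summarize_dialogs` are hypothetical simplifications introduced for illustration; they are not taken from the cited patent, and a real monitor would parse these fields from captured packet headers.

```python
from typing import Dict, Iterable, NamedTuple, Tuple


class Packet(NamedTuple):
    """Simplified packet record (assumption)."""
    src: str
    dst: str
    protocol: str
    size: int


def summarize_dialogs(
    packets: Iterable[Packet],
) -> Dict[Tuple[str, str, str], Dict[str, int]]:
    """Group packets into dialogs and accumulate packet and byte counts.

    A dialog is keyed by the unordered pair of endpoints plus the protocol,
    so traffic in both directions is attributed to the same dialog.
    """
    dialogs: Dict[Tuple[str, str, str], Dict[str, int]] = {}
    for packet in packets:
        first, second = sorted((packet.src, packet.dst))
        key = (first, second, packet.protocol)
        stats = dialogs.setdefault(key, {"packets": 0, "bytes": 0})
        stats["packets"] += 1
        stats["bytes"] += packet.size
    return dialogs


observed = [
    Packet("A", "B", "TCP", 100),
    Packet("B", "A", "TCP", 60),
    Packet("A", "C", "UDP", 40),
    Packet("A", "B", "UDP", 20),
]
dialogs = summarize_dialogs(observed)
```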
In the foregoing prior art patents, network nodes that are network devices, computers and OSI layer 7 applications, and the interdependencies of such nodes, are discovered by reading machine directory services (e.g., the Domain Name Service (DNS)) or by broadcasting across networks and watching for responses. Such discovery arrangements are passive in that they detect nodes by inspecting node information such as logs, configuration, etc., without becoming involved in the operation of the nodes, and they produce a static model of the network that identifies node interaction. These known systems, however, do not produce a snapshot identifying which nodes of a network are currently interacting with which other nodes of the network while a particular network task is being performed. They are also limited to discovering interdependencies among service providing software elements where there is at most one software element per OSI layer 7 application. As the use of application platforms increases, the limitation of one software element per application in the known systems, which treat the entire application platform as one node regardless of how many service providing software elements are hosted, incorrectly identifies multiple software elements as one node in a network and is too restrictive. Where, as in the prior art, the entire contents of an application platform are treated as one node, discovery of interdependencies that relies on generically shared resources such as configuration files, logs, communication ports and operating system processes does not provide adequate detection of the multiple services and interdependencies of plural software elements of the application platform.
In view of the interdependencies among services performed by the software elements in a network, failure of one software element usually adversely affects many other software elements in the network, either directly where there is a direct dependence on the failed software element or indirectly where the dependence is through one or more other software elements. As a result, failure of one software element in a network may have a ripple effect, with failure symptoms appearing in many other network software elements. Accordingly, analyzing the root cause of a failure is a time-consuming effort. Multiple root failures are known to occur at the same time, so that it is necessary to perform a time-consuming detailed examination of every node that experienced a failure. Where each node may have many service providing software elements, as shown in the application platform arrangement of FIG. 3, the process is exceedingly difficult since each detected software element must be individually checked.
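The ripple effect described above can be made concrete with a small dependency graph. The following is a minimal sketch under stated assumptions: the element names and the `DEPENDS_ON` map are invented, and the functions `affected_by` and `root_candidates` are hypothetical helpers that merely illustrate direct and indirect dependence, not a complete root cause analysis.

```python
from typing import Dict, Set

# Hypothetical dependency map: element -> elements it directly depends on.
DEPENDS_ON: Dict[str, Set[str]] = {
    "web": {"app"},
    "app": {"database", "auth"},
    "reports": {"database"},
    "database": set(),
    "auth": set(),
}


def affected_by(failed: str, depends_on: Dict[str, Set[str]]) -> Set[str]:
    """Return all elements that depend on `failed`, directly or indirectly."""
    # Invert the map so dependents of each element can be looked up.
    dependents: Dict[str, Set[str]] = {}
    for element, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(element)
    affected: Set[str] = set()
    stack = [failed]
    while stack:
        current = stack.pop()
        for dependent in dependents.get(current, ()):
            if dependent not in affected:
                affected.add(dependent)
                stack.append(dependent)
    return affected


def root_candidates(failing: Set[str], depends_on: Dict[str, Set[str]]) -> Set[str]:
    """Failing elements with no failing direct dependency.

    Their symptoms cannot be explained by another observed failure, so they
    are the elements to examine first.
    """
    return {e for e in failing if not (depends_on.get(e, set()) & failing)}


ripple = affected_by("database", DEPENDS_ON)
single_root = root_candidates({"web", "app", "reports", "database"}, DEPENDS_ON)
multiple_roots = root_candidates({"web", "app", "database", "auth"}, DEPENDS_ON)
```

In this sketch, a failure of `database` produces symptoms in three other elements, and the last call shows two simultaneous root failures that would each need separate examination.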