The present invention relates generally to packet-switched computer networks, and specifically to methods and apparatus for testing and diagnosing malfunctions in such networks.
Packet-switched, source-routing computer networks are used in a growing range of applications. Such networks link multiple computer processors, or nodes, via multiple switches. Typically, a packet of data sent from one of the nodes to another passes through a number of different switches. Each switch along the way reads routing information, which is commonly contained in a header of the data packet, and passes the packet on to the next switch along the way, or to the destination node. Typically, there are multiple different paths available through the network over which any given pair of nodes can communicate. An example of this type of network is the well-known Asynchronous Transfer Mode (ATM) network, which is used in communications between separate computers. Such networks are also used in multiprocessor computers, such as the RS/6000 Scalable POWERParallel System (SP) series of computers produced by International Business Machines Corporation (Armonk, N.Y.). In the SP computer, as well as in certain other networks, successive packets in a communication stream between the nodes may be sent over different routes.
Because of the complex topology and hardware of packet-switched networks, when a fault occurs in such a network it can be difficult to identify the exact location and nature of the fault. The difficulty is exacerbated by the fact, noted above, that by their nature such networks use multiple different paths between nodes and are fault-tolerant. A network fault will typically appear not as a total breakdown (which would be relatively easy to find), but rather will present more subtle symptoms. For example, there may be a reduction in throughput between some or all of the nodes, or an increase in the number of "bad packets" (data packets whose content is corrupted and must be discarded) at one or more of the nodes.
There are few efficient tools known in the art for diagnosis of such faults. The diagnostic process is time-consuming and heavily reliant on the intuition and experience of a human system administrator (or service engineer) in deciphering and drawing conclusions from the limited information that is available. This information is typically collected in various system files, such as topology files, error logs and trace files, as are known in the art. These files may be recorded at different nodes of the network and must somehow be collated and analyzed by the administrator. Because few network administrators have the know-how to perform this sort of diagnosis, costly service calls are frequently required.
A further problem in diagnosing network faults is posed by non-deterministic failures, which may occur only under certain conditions and may not arise at all while the diagnostic tests are being performed. Such failures are referred to with terms such as "sporadic," "intermittent," "overheating," "lightning," "aging," or "statics," which generally mean only that the cause of the problem is unknown. For example, a high-speed switch or adapter may behave normally in light traffic, and break down only under certain particular stress conditions. At times the only way to find such a problem is to systematically bombard each suspect component of the network with packets from different sources, at controlled rates, gradually eliminating components from consideration until the failure is found. Such a process is difficult to automate, and may require that the network be taken off-line for an extended period. The cost of such down-time for prolonged testing and repair can be enormous. There is therefore a need for systematic methods of diagnostic testing, which can be performed while the network is on-line.
There is a similar lack of tools and techniques for systematically testing the response of switch-related network software to hardware fault conditions. Such techniques are needed particularly in software development and testing stages, to ensure that the software responds properly when faults occur. Current methods of testing use specially-designed simulation hardware, such as cables with broken pins, together with debugging clauses that can be activated in the software itself and dedicated debugging fields in associated data structures. The fault situations created by such methods, however, are limited to a small range of scenarios, which are for the most part different from the real hardware faults that occur in actual networks. Similarly, the software used in debugging mode for fault simulation is different from the actual software product that will be used in the field. Moreover, these testing tools are incapable of simulating the type of transient, non-deterministic failures described above. They do not allow errors to be injected and altered on the fly during a simulation.
It is an object of some aspects of the present invention to provide improved methods for fault simulation and diagnostics in packet-switched data networks.
It is still a further object of some aspects of the present invention to provide improved methods and apparatus for identifying a faulty switch adapter, which couples a network node to a switch in the network.
Preferred embodiments of the present invention operate in the context of a packet data network, which comprises a plurality of nodes, or processors, mutually coupled by a plurality of switches, such that typically any one of the nodes can communicate with any other one of the nodes, preferably over multiple links. Each of the nodes is coupled to a respective port of one of the switches by a switch adapter, which performs data link functions, as are known in the art, with respect to each data packet sent or received through the network by the node. One of the nodes is a primary node, which manages the configuration of elements of the network, such as the other nodes and switches in the network.
In preferred embodiments of the present invention, the primary node controls testing and diagnosis of elements of the network in real time, while the network is on-line, or at least with minimal interruption of on-line operation, by appropriately setting parameters of the nodes and switches. The testing preferably includes diagnostic testing to locate suspected faults in the switches and switch adapters. Additionally or alternatively, for the purposes of testing, errors may be intentionally injected into the network so as to simulate the response of the network elements to faults that may occur.
In some preferred embodiments of the present invention, multiple nodes in the network are operated to transmit packets simultaneously at high, predetermined data rates to a destination node, in order to identify a faulty switch adapter, which sends bad (corrupted) packets under certain, unknown conditions. The inventors have found that such switch adapter problems typically appear only when several nodes are transmitting packets at high data rates through the same switch, since in this case the data rate capacity of the switch is exceeded. The switch then tends to back up, forcing the respective switch adapters of the nodes to wait to transmit. Although the normal, properly-functioning switch adapters are capable of synchronizing their transmission to the throughput availability of the switch, the faulty adapter fails to synchronize properly under these conditions and consequently transmits bad packets. The source of the bad packets is detected at the destination, as described hereinbelow, allowing the faulty adapter to be identified.
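The congestion mechanism described above can be illustrated with a simple fluid model. The following Python sketch is not part of the described apparatus: the names `Sender` and `simulate_backlog`, and the assumption that the backed-up switch accepts each sender's traffic in proportion to its offered rate, are illustrative only. It shows that once the aggregate offered rate exceeds the switch's capacity, the excess accumulates in the queues of the sending adapters, which is the condition under which a faulty adapter is expected to lose synchronization.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Sender:
    """Hypothetical model of a transmitting node and its switch adapter."""
    name: str
    offered_rate: float  # packets per time step the node tries to send


def simulate_backlog(senders: List[Sender], switch_capacity: float,
                     steps: int) -> Dict[str, float]:
    """Return the packets queued in each adapter after `steps` time steps.

    Assumption: the congested switch accepts at most `switch_capacity`
    packets per step in total, dividing that capacity among the senders in
    proportion to their offered rates. Whatever a sender offers but the
    switch does not accept waits in that sender's adapter queue.
    """
    backlog = {s.name: 0.0 for s in senders}
    total_offered = sum(s.offered_rate for s in senders)
    if total_offered == 0:
        return backlog
    # Fraction of every sender's demand the switch can accept each step.
    accept_fraction = min(1.0, switch_capacity / total_offered)
    for _ in range(steps):
        for s in senders:
            backlog[s.name] += s.offered_rate * (1.0 - accept_fraction)
    return backlog
```

For example, three senders each offering 100 packets per step to a switch that can pass 240 packets per step leave about 20 packets per step waiting in each adapter, so after 10 steps each queue holds roughly 200 packets. With a capacity of 400 packets per step, no backlog forms at all.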
Preferably, the data packets transmitted by the nodes in such faulty adapter testing contain redundant sender information, so that the faulty adapter can be identified by decoding the bad packets received at the destination node, despite the packets' corrupted state. Additionally or alternatively, each of the transmitting nodes is controlled to send a predetermined number of packets to the destination node. The packets arriving at the destination node are counted according to the nodes from which they were transmitted, and any shortage in the packets counted is attributed to a fault in the respective adapter.
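The text does not specify how the redundant sender information is encoded. As one possibility, offered here only as an assumption, each packet could carry several copies of a sender identifier, and the destination node could recover the sender by majority vote. In the sketch below, `ID_COPIES`, the 2-byte identifier width and the function names are all hypothetical.

```python
from collections import Counter
from typing import Optional

ID_COPIES = 5   # hypothetical number of sender-ID copies carried per packet
ID_BYTES = 2    # hypothetical width of each sender-ID copy


def build_payload(sender_id: int, body: bytes) -> bytes:
    """Prefix `body` with ID_COPIES copies of the sender ID (big-endian)."""
    return sender_id.to_bytes(ID_BYTES, "big") * ID_COPIES + body


def recover_sender(payload: bytes) -> Optional[int]:
    """Recover the sender ID by majority vote over its copies.

    Returns None if the payload is too short or if no value appears in a
    strict majority of the copies, in which case the sender is unknown.
    """
    if len(payload) < ID_BYTES * ID_COPIES:
        return None
    copies = [
        int.from_bytes(payload[i * ID_BYTES:(i + 1) * ID_BYTES], "big")
        for i in range(ID_COPIES)
    ]
    value, count = Counter(copies).most_common(1)[0]
    return value if count > ID_COPIES // 2 else None
```

Placing the copies contiguously keeps the sketch short; a practical format would more likely spread them through the packet, so that a single burst of corruption cannot damage a majority of them.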
Further additionally or alternatively, the nodes are controlled to transmit packets to the destination node in systematically selected groups, preferably in groups of three. The nodes in each selected group are chosen, and the routes between the transmitting nodes and the destination node are configured, so that each of the corresponding switch adapters is tested systematically at a number of predetermined data rates. Preferably, the tested data rates include all of the possible data rates that can typically arise when one of the switches along the route between the transmitting and destination nodes is backed up. As noted hereinabove, this is the situation in which adapter faults have been found to arise. If no bad packets are received from a selected group at any of the tested rates, other groups are selected and tested in similar fashion until the faulty switch adapter is found. This approach is advantageous by comparison with diagnostic methods known in the art, in that it generally reduces the number of test iterations required in order to find the faulty switch adapter.
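A group-by-group search of the kind described can be outlined as follows. The sketch assumes, for illustration only, that the candidate sending nodes are partitioned into disjoint groups of three, and that a hypothetical callback `run_test(group, rate)` performs one test run and returns the set of nodes whose packets arrived corrupted or missing at the destination. The values in `TEST_RATES` are placeholders expressed as fractions of link bandwidth, not rates taken from the source.

```python
from typing import Callable, List, Optional, Sequence, Set, Tuple

GROUP_SIZE = 3  # groups of three, as preferred in the text

# Placeholder test rates, as fractions of full link bandwidth (assumption).
TEST_RATES = (1.0, 0.5, 1 / 3)


def groups_of(nodes: Sequence[str], size: int = GROUP_SIZE) -> List[Tuple[str, ...]]:
    """Partition the candidate senders into consecutive, disjoint groups.

    The last group is smaller if len(nodes) is not a multiple of `size`.
    """
    return [tuple(nodes[i:i + size]) for i in range(0, len(nodes), size)]


def find_faulty_adapter(
    nodes: Sequence[str],
    run_test: Callable[[Tuple[str, ...], float], Set[str]],
    rates: Sequence[float] = TEST_RATES,
) -> Optional[Tuple[Tuple[str, ...], float]]:
    """Test each group at each rate until some sender's packets go bad.

    Returns (suspect nodes, rate at which they failed), or None if every
    group passed at every rate.
    """
    for group in groups_of(nodes):
        for rate in rates:
            bad = run_test(group, rate)
            if bad:
                return tuple(sorted(bad)), rate
    return None
```

With nine candidate nodes, the search visits at most three groups, each tested at the listed rates, rather than testing every node individually at every rate.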
Preferably, the primary node assigns the links over which the nodes are to transmit packets to the destination node by downloading appropriate entries to a route table of the switch adapter of each of the assistant nodes, i.e., the nodes that transmit the test packets. Typically, in normal network operation, such a route table includes several different routing links over which the node is to communicate, and the switch adapter sends data packets in alternation over the different links in order to balance the data traffic load among different switches and ports in the network. In preferred embodiments of the present invention, however, the entries downloaded to the route tables associated with the nodes indicate that all of the packets are to be sent over the same links, so as to control and maximize the traffic load on the ports of the switch that is intended to back up.
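The difference between a normal route table and the test table downloaded by the primary node can be modeled as follows. This is a toy model, not the adapter's actual table format: a route is assumed to be a list of switch output-port numbers, and `AdapterRouteTable`, `download` and `next_route` are hypothetical names.

```python
from itertools import cycle
from typing import Dict, Iterator, List

Route = List[int]  # hypothetical: output-port number at each switch hop


class AdapterRouteTable:
    """Toy model of the per-destination route table in a switch adapter."""

    def __init__(self) -> None:
        self._cursors: Dict[str, Iterator[Route]] = {}

    def download(self, dest: str, routes: List[Route]) -> None:
        """Replace all entries for `dest`, as the primary node would."""
        if not routes:
            raise ValueError("a route table entry needs at least one route")
        self._cursors[dest] = cycle(list(routes))

    def next_route(self, dest: str) -> Route:
        """Route for the next packet to `dest`, alternating among entries."""
        return next(self._cursors[dest])
```

With a normal table such as `[[1, 4], [2, 4], [3, 4]]`, successive packets rotate through the three routes; after a test table such as `[[2, 4]]` is downloaded, every packet takes the same route, concentrating the load on the chosen switch port.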
There is therefore provided, in accordance with a preferred embodiment of the present invention, in a computer network system that includes a multiplicity of nodes interconnected by a network of switches, wherein the nodes are linked to the network by respective data link adapters, a method for testing the adapters, including:
selecting one of the nodes to serve as a destination node;
conveying data at a controlled rate from a plurality of the nodes, other than the destination node, through the respective adapters to the destination node; and
detecting an error in the data conveyed from one of the nodes so as to identify a fault in the adapter of that node.
Preferably, conveying the data at the controlled rate includes transmitting data from the plurality of the nodes at a substantially maximal transmission rate that the transmitting nodes can achieve. Most preferably, transmitting the data includes sending data from the plurality of the nodes at an aggregate rate greater than a data throughput capacity of one of the switches in the network through which the data are conveyed, wherein sending the data includes sending data packets, which are queued in the data link adapters of the nodes sending the packets when the aggregate rate is greater than the data throughput capacity of the one of the switches.
Preferably, conveying the data includes conveying data packets, and detecting the error includes detecting a corrupted packet at the destination node. In a preferred embodiment, conveying the data packets includes conveying packets including redundant identification information regarding a source node sending the packets, whereby the source node is identified at the destination node despite the corruption of the packet.
In another preferred embodiment, conveying the data includes conveying data packets, and detecting the error includes finding a discrepancy between a number of packets sent by one of the plurality of the nodes and a number of packets received therefrom by the destination node.
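The packet-count check can be expressed in a few lines. In this illustrative sketch, `sent` is assumed to map each transmitting node to the number of packets it was commanded to send, and `received_from` to list the source node of each intact packet counted at the destination; the function name `short_senders` is hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def short_senders(sent: Dict[str, int], received_from: Iterable[str]) -> Dict[str, int]:
    """Return {sender: number of missing packets} for each sender with a shortfall.

    A sender whose packets all arrived intact does not appear in the result.
    """
    counts = Counter(received_from)
    return {
        node: expected - counts[node]
        for node, expected in sent.items()
        if counts[node] < expected
    }
```

If nodes `a`, `b` and `c` are each commanded to send 100 packets, and the destination counts 100 from `a`, 97 from `b` and 100 from `c`, the shortfall is attributed to the adapter of node `b`.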
In a further preferred embodiment, conveying the data includes selecting groups of a predetermined number of the nodes and sending data from the nodes in a given one of the groups simultaneously through a selected one of the switches to the destination node. Preferably, the switches have multiple ports, and sending the data includes sending data simultaneously from each of the nodes in the given group through a respective one of the ports of the selected switch. Alternatively or additionally, sending the data includes sending data from one of the nodes in the given group through one of the ports of the selected switch while sending data from the others of the nodes in the given group through another one of the ports of the selected switch.
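The two port arrangements described above expose each adapter to different effective transmission rates. Assuming, purely for illustration, that the backed-up switch serves its active input ports in fair round-robin fashion, and that senders entering through the same input port share that port's slot equally, the share of the congested link available to each sender can be computed exactly. The function name `per_sender_share` is hypothetical.

```python
from fractions import Fraction
from typing import Dict, List


def per_sender_share(port_groups: List[List[str]]) -> Dict[str, Fraction]:
    """Fraction of the congested output link available to each sender.

    `port_groups` lists, for each input port of the backed-up switch, the
    senders whose traffic enters through that port. Assumes fair round-robin
    arbitration among input ports, and an equal split among the senders
    sharing one port.
    """
    port_share = Fraction(1, len(port_groups))
    return {
        node: port_share / len(senders)
        for senders in port_groups
        for node in senders
    }
```

Under these assumptions, three senders on separate ports each receive one third of the link, whereas with one sender alone on a port and two sharing another, the lone sender receives one half and the other two one quarter each, so the two arrangements together exercise three distinct rates.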
Preferably, conveying the data includes sending data packets, which in normal operation of the system are routed between any pair of the nodes over a plurality of different routes in alternation, and sending the data packets includes routing substantially all of the packets conveyed from at least one of the plurality of nodes to the destination node over at least one respectively-assigned route. Preferably, each of the data link adapters routes data from the respective node through the network in accordance with a routing table stored in a memory thereof, and routing substantially all of the packets includes downloading a test routing table containing the respectively-assigned route to the adapter of the at least one of the plurality of nodes.
There is moreover provided, in accordance with a preferred embodiment of the present invention, a manageable computer network system, including:
a multiplicity of nodes, including a primary node;
a network of switches, each switch having multiple ports; and
a multiplicity of data link adapters, each linking a respective one of the nodes to one of the ports of one of the switches,
wherein the primary node carries out a diagnostic test of the switch adapters by selecting one of the nodes to serve as a destination node and commanding a plurality of the other nodes to send data at a controlled rate through the respective adapters to the destination node, and wherein the destination node detects an error in the data conveyed from one of the sending nodes so as to identify a fault in the adapter of that node.
Preferably, the data include data packets, and the data link adapters include respective queues, in which the data packets accumulate during the diagnostic test, wherein the error detected by the destination node includes corruption of a packet.
In a preferred embodiment, the plurality of the other nodes commanded to send data includes a group of a predetermined number of the nodes, which send data simultaneously through a single switch to the destination node.
There is further provided, in accordance with a preferred embodiment of the present invention, a computer software product for testing data link adapters respectively linking a multiplicity of processor nodes, one of which nodes is designated a primary node, to switches in a computer network system, the product including computer-readable code, which is read by the primary node, causing the primary node to select one of the nodes to serve as a destination node, and to command a plurality of the nodes, other than the destination node, to convey data through the respective adapters to the destination node and to detect an error in the data conveyed from one of the nodes so as to identify a fault in the adapter of that one of the nodes.