1. Field of the Invention
The present invention relates generally to computer software and network applications. Specifically, the present invention relates to computer network testing software for detecting faults in network devices.
2. Discussion of Related Art
In a computer network, messages are typically sent from a source to a receiver using a unicast message routing protocol. Unicast message delivery involves sending a message from a source having a distinct IP (Internet Protocol) address to a single receiver also having a distinct IP address. Unicast routing protocols are also used when a source sends a message to multiple receivers. In this situation, a separate unicast message is sent from the source to each individual receiver, each of which has its own IP address.
More recently, message routing protocols referred to as multicast routing have come into use for routing messages in computer networks. In unicast routing, a source sends a message to only one receiver. With multicast routing, a source sends a single message to a group that includes individual receivers. The source sends the message to a group IP address, which corresponds to all the IP addresses of the individual group members. For example, group A can include five clients in a computer network, and a sender can reach each of the five members by sending one message to the group IP address (a client joins a group by subscribing to the group IP address). The message is then propagated to each individual client. Multicast routing protocols are described in more detail in “Multicast Routing in a Datagram Internetwork” by Stephen Deering, PhD Thesis, Stanford University, 1991, and “The PIM Architecture for Wide-Area Multicast Routing” by Stephen Deering et al., IEEE/ACM Transactions on Networking, April 1996, Vol. 4, No. 2, which are incorporated herein by reference. Multicast routing protocols have recently emerged from their developmental stage and are now increasingly prevalent in computer networks as a technique for routing messages. However, management tools specifically tailored for such protocols are only now being developed and are essential for the continued growth of multicast routing methods.
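The difference between the two delivery models can be illustrated with a minimal sketch. It is a toy in-memory model, not a network implementation: it only counts how many messages the source itself must transmit to reach the five clients of group A. The class name ToyNetwork, the helper compare_send_counts, the group address 239.1.1.1, and the client addresses 10.0.0.1 through 10.0.0.5 are all assumptions introduced for illustration.

```python
from collections import defaultdict


class ToyNetwork:
    """Hypothetical in-memory model that counts transmissions made by the source."""

    def __init__(self):
        self.groups = defaultdict(set)  # group address -> subscribed receiver addresses
        self.sent_by_source = 0         # messages the source had to transmit
        self.delivered = []             # (receiver address, payload) pairs

    def subscribe(self, group_addr, receiver_addr):
        # A client joins a group by subscribing to the group address.
        self.groups[group_addr].add(receiver_addr)

    def send_unicast(self, receiver_addr, payload):
        # One transmission by the source reaches exactly one receiver.
        self.sent_by_source += 1
        self.delivered.append((receiver_addr, payload))

    def send_multicast(self, group_addr, payload):
        # One transmission by the source; the network propagates it to each member.
        self.sent_by_source += 1
        for receiver_addr in sorted(self.groups[group_addr]):
            self.delivered.append((receiver_addr, payload))


def compare_send_counts(receivers, group_addr, payload):
    """Deliver the same payload to the same receivers under both models."""
    unicast_net = ToyNetwork()
    for addr in receivers:
        unicast_net.send_unicast(addr, payload)

    multicast_net = ToyNetwork()
    for addr in receivers:
        multicast_net.subscribe(group_addr, addr)
    multicast_net.send_multicast(group_addr, payload)
    return unicast_net, multicast_net


# Assumed example values: group A with five clients.
GROUP_A = "239.1.1.1"
CLIENTS = [f"10.0.0.{i}" for i in range(1, 6)]

if __name__ == "__main__":
    unicast_net, multicast_net = compare_send_counts(CLIENTS, GROUP_A, "hello")
    print("unicast transmissions by source:", unicast_net.sent_by_source)
    print("multicast transmissions by source:", multicast_net.sent_by_source)
```

In this sketch both models deliver the payload to the same five clients, but the source transmits five messages under unicast and only one under multicast.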
A problem with large multicast routing infrastructures is the near real-time detection and isolation of problems with network components or, more specifically, the detection of faults in devices such as routers and switches. Existing tools for systematically managing multicast routing infrastructures are inefficient and inconvenient, particularly across large routing systems (individual networks or domains can be connected to form a large multicast infrastructure). The most common tool for isolating faults or problems with network devices in a multicast infrastructure is MTRACE, a non-proprietary software program and technique for isolating (although not detecting) a fault. Presently, no tools exist for automated multicast fault detection. The use of MTRACE is described in more detail with reference to FIG. 1.
FIG. 1 is an illustration showing typical components in a computer network configuration. The configuration includes client terminals connected to edge routers which, in turn, are connected to transit routers for receiving and forwarding data packets. A router is one example of a packet manipulation device; it can also collect statistics on the data packets that it receives and forwards. FIG. 1 shows three client terminals 103, 105, and 107 within a single domain 101. Also shown are two neighboring domains 109 and 111, which can be linked to domain 101 to form a large multicast configuration in which domains 101, 109, and 111 are part of the network topology. Terminal 103 is connected to edge router 113. Similarly, terminal 105 is connected to edge router 115 and terminal 107 is connected to edge router 117. Located between the edge routers are transit routers 119, 121, and 123. Transit routers are used to receive and forward data packets between edge routers in a typical network configuration.
MTRACE is used to isolate faults that occur in devices, such as edge routers and transit routers, in multicast infrastructures. Typically, a network operator receives a call from a user indicating that a problem has occurred, such as receiving an incomplete message. The network operator must first determine the source of the message and the group to which the user belongs. MTRACE does not provide real-time alerting capability in a multicast infrastructure. In addition, a network operator using MTRACE to isolate a problem must be familiar with the multicast protocol. Some of the typical problems that can occur when a device is not functioning properly are: 1) a data packet is not received at all by a device or an intended receiver of a message; 2) there is an implementation bug in the software; 3) there is congestion in the network, e.g., packets are sent faster than they can be received; 4) there is a misconfiguration of the network topology; or 5) there is unnecessary duplication of data packets occurring in the devices.
MTRACE is used to determine the path of a data packet from the source to the receiver. Each device along the path maintains statistics, including packet counts and a state of the device, which are read by MTRACE. For example, if edge router 113 did not receive a data packet, MTRACE is used to check all the routers between router 113 and the source (e.g., router 117). The device is not isolated until the entire path between the receiver and the source is evaluated, so using MTRACE to locate a problem requires a significant amount of time. After the path has been traced and source router 117 is reached, the network operator examines the MTRACE output, which itself is rather cryptic, to pinpoint the device causing the fault. However, MTRACE does not perform real-time detection of faults.
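The receiver-to-source evaluation described above can be sketched as follows. This is a simplified illustration of the idea only; it is not the MTRACE program, its protocol, or its output format. The router numbers follow FIG. 1, but the hop order 113, 119, 121, 123, 117, the packet counts, and the names HopStats, trace_reverse_path, and pinpoint_faulty_router are assumptions made for the example. The fault rule used here (a router that forwards fewer packets than it receives) is likewise an assumed stand-in for the operator's manual reading of the statistics.

```python
from dataclasses import dataclass


@dataclass
class HopStats:
    """Hypothetical per-router statistics read during the trace."""
    router: int
    packets_in: int
    packets_out: int


def trace_reverse_path(upstream, stats, receiver_router, source_router):
    """Walk hop by hop from the receiver's router back to the source's router.

    `upstream` maps each router to the next router toward the source.
    Every hop on the path is visited before any diagnosis is attempted.
    """
    hops = []
    seen = set()
    current = receiver_router
    while current not in seen:
        seen.add(current)
        hops.append(stats[current])
        if current == source_router:
            return hops
        if current not in upstream:
            raise ValueError(f"no upstream router known for {current}")
        current = upstream[current]
    raise ValueError("routing loop detected while tracing")


def pinpoint_faulty_router(hops):
    """Examine the completed trace, source side first.

    Assumed rule: the first router forwarding fewer packets than it
    received is reported as the problematic device.
    """
    for hop in reversed(hops):
        if hop.packets_out < hop.packets_in:
            return hop.router
    return None


# Assumed topology: receiver side 113 <- 119 <- 121 <- 123 <- 117 source side.
UPSTREAM = {113: 119, 119: 121, 121: 123, 123: 117}

# Assumed counts: transit router 121 drops 60 of the 100 packets it receives.
STATS = {
    117: HopStats(117, 100, 100),
    123: HopStats(123, 100, 100),
    121: HopStats(121, 100, 40),
    119: HopStats(119, 40, 40),
    113: HopStats(113, 40, 40),
}

if __name__ == "__main__":
    hops = trace_reverse_path(UPSTREAM, STATS, receiver_router=113, source_router=117)
    print("hops evaluated:", [hop.router for hop in hops])
    print("suspected device:", pinpoint_faulty_router(hops))
```

With these assumed counts, all five routers on the path are evaluated before transit router 121 is reported, which reflects why the procedure is slow and why it isolates a fault only after someone has already noticed it.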
Therefore, it would be desirable to have a multicast routing management tool that allows for near real-time fault detection, i.e., a fault alarm that does not rely on customer phone calls, and that also provides a more systematic way to obtain up-to-date multicast routing status reports. In addition, it would be desirable for network operators to have a method of testing a multicast routing configuration in advance to ensure that there are no problems with devices in, for example, the paths necessary to reach a critical group of receivers.