1. Field of the Invention
The present invention relates generally to computer software and network applications. Specifically, the present invention relates to computer network testing software for detecting faults in network devices.
2. Discussion of Related Art
In a computer network, messages are typically sent from a source to a receiver using a unicast message routing protocol. Unicast message delivery involves sending a message from a source having a distinct IP (Internet Protocol) address to a single receiver also having a distinct IP address. Unicast routing protocols are also used when a source sends a message to multiple receivers. In this situation, a separate unicast message is sent from the source to each individual receiver, where each receiver has its own IP address.
More recently, message routing protocols referred to as multicast routing protocols have come into use for routing messages in computer networks. In unicast routing, a source sends a message to only a single receiver. With multicast routing, a source sends a single message to a group that includes individual receivers. The source sends the message to a group IP address, which corresponds to the IP addresses of all the individual group members. For example, group A can include five clients in a computer network, and a sender can reach each of the five members by sending one message to the group IP address (a client can belong to a group by subscribing to the group IP address). The message is then propagated to each individual client. Multicast routing protocols are described in more detail in “Multicast Routing in a Datagram Internetwork” by Stephen Deering, PhD Thesis, Stanford University, 1991, and “The PIM Architecture for Wide-Area Multicast Routing” by Stephen Deering, et al., IEEE/ACM Transactions on Networking, April 1996, Vol. 4, No. 2, which are incorporated herein by reference. Multicast routing protocols have recently emerged from their developmental stage and are now increasingly prevalent in computer networks as a technique for routing messages. However, management tools specifically tailored for such protocols are only now being developed and are essential for the continued growth of multicast routing methods.
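By way of illustration only, the difference in sender effort between the two delivery styles can be sketched with a toy in-memory model in Python. The class name `Network`, its methods, and the addresses used are hypothetical and chosen for this sketch; it does not model any actual routing protocol.

```python
from collections import defaultdict


class Network:
    """Toy in-memory model of unicast and multicast delivery (illustrative only)."""

    def __init__(self):
        self.groups = defaultdict(set)   # group address -> subscribed client addresses
        self.inbox = defaultdict(list)   # client address -> messages delivered to it
        self.sent_by_source = 0          # number of messages the source itself emitted

    def subscribe(self, client, group):
        """A client joins a group by subscribing to the group address."""
        self.groups[group].add(client)

    def send_unicast(self, receivers, message):
        """Unicast: the source emits one message per receiver."""
        for receiver in receivers:
            self.sent_by_source += 1
            self.inbox[receiver].append(message)

    def send_multicast(self, group, message):
        """Multicast: the source emits one message and the network fans it out."""
        self.sent_by_source += 1
        for receiver in self.groups[group]:
            self.inbox[receiver].append(message)
```

In this model, reaching five subscribed clients costs the source five emissions with `send_unicast` and a single emission with `send_multicast`.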
A problem with large multicast routing infrastructures is the near real-time detection and isolation of problems with network components, or more specifically, the detection of faults in devices such as routers and switches. Existing tools for systematically managing multicast routing infrastructures are inefficient and inconvenient, particularly across large routing systems (individual networks or domains can be connected to form a large multicast infrastructure). Presently, no tools exist for automated multicast fault detection. The most common tool for isolating faults in a multicast infrastructure is MTRACE, a non-proprietary software program and technique for isolating (although not detecting) a fault in a network device. Its use is described in more detail with reference to FIG. 1.
FIG. 1 is an illustration showing typical components in a computer network configuration. It includes client terminals connected to edge routers which, in turn, are connected to transit routers for receiving and forwarding data packets. A router is one example of a packet manipulation device. It can also collect statistics on data packets that it receives and forwards. FIG. 1 shows three client terminals 103, 105, and 107 within a single domain 101. Also shown are two neighboring domains 109 and 111 which can be linked to domain 101 to form a large multicast configuration, in which domains 101, 109 and 111 are part of the network topology. Terminal 103 is connected to an edge router 113. Similarly, terminal 105 is connected to edge router 115 and terminal 107 is connected to edge router 117. Located between the edge routers are transit routers 119, 121, and 123. Transit routers are used to receive and forward data packets between edge routers in a typical network configuration.
MTRACE is used to isolate faults that occur in devices, such as edge routers and transit routers, in multicast infrastructures. Typically, a network operator receives a call from a user indicating that a problem has occurred, such as receiving an incomplete message. The network operator must first determine the source of the message and the group to which the user belongs. MTRACE does not provide real-time alerting capability in a multicast infrastructure. In addition, a network operator using MTRACE to isolate a problem must be familiar with the multicast protocol. Some of the typical problems that can occur when a device is not functioning properly are: 1) a data packet is not received at all by a device or an intended receiver of a message, 2) there is an implementation bug in the software, 3) there is congestion in the network, e.g., packets are sent faster than they can be received, 4) there is a misconfiguration of the network topology, or 5) there is unnecessary duplication of data packets occurring in the devices.
MTRACE is used to determine the path of a data packet from the source to the receiver. For example, if edge router 113 did not receive a data packet, MTRACE is used to check all the routers between router 113 and the source (e.g., router 117). Each device maintains statistics which are read by MTRACE. The statistics include packet counts and a state of the device. The faulty device is not isolated until the entire path between the receiver and the source is evaluated. Once the source router 117 is reached and the path has been traced, the network operator examines the MTRACE output, which itself is rather cryptic, to pinpoint the device causing the fault. Using MTRACE to locate a problem therefore requires a significant amount of time. Moreover, MTRACE does not perform real-time detection of faults.
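The path evaluation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual MTRACE program or its output format: the `HopStats` record, its field names, and the rule that a device is suspect when it forwards fewer packets than it receives are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class HopStats:
    """Per-device statistics; field names are assumptions, not MTRACE's format."""
    device: str
    packets_in: int
    packets_out: int
    state: str = "up"


def isolate_fault(path_receiver_to_source: List[HopStats]) -> Optional[str]:
    """Return the device nearest the source that appears to lose packets, if any.

    The caller supplies the complete path, collected from the receiver back to
    the source, since nothing is pinpointed until the source has been reached.
    """
    # Examine hops in forwarding order: source first, receiver last.
    for hop in reversed(path_receiver_to_source):
        if hop.state != "up" or hop.packets_out < hop.packets_in:
            return hop.device
    return None
```

For a path listed from edge router 113 back to source router 117, a transit router that received 100 packets and forwarded none would be the device returned.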
Therefore, it would be desirable to have a multicast routing management tool that allows for near real-time fault detection, i.e., a fault alarm that does not rely on customer phone calls, and that can also provide a more systematic way to obtain up-to-date multicast routing status reports. In addition, it would be desirable for network operators to have a method of testing a multicast routing configuration in advance to ensure that there are no problems with devices in, for example, the paths necessary to reach a critical group of receivers.
To achieve the foregoing, and in accordance with the purpose of the present invention, methods, systems, and computer-readable media for detecting faults in data packet routing devices in a computer network capable of routing messages using a multicast protocol are described. In a preferred embodiment of one aspect of the invention, a method of detecting faults in a multicast routing infrastructure in near real-time includes configuring a device to be a sender or source of test data packets and one or more other devices to receive test data packets. The test packet sender transmits test data packets to a test group of test packet receivers where the test group has a group identifier. The test receivers prepare data or fault reports describing errors regarding missing or duplicated data packets. These fault reports are prepared soon after the errors are detected.
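One way a test receiver might derive such a fault report is sketched below. The sketch assumes that each test data packet carries a consecutive sequence number, which is an assumption made for illustration and is not stated in this section; the function name `build_fault_report` and the report layout are likewise hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List


def build_fault_report(received_seqs: Iterable[int],
                       first_seq: int,
                       last_seq: int) -> Dict[str, List[int]]:
    """Summarize missing and duplicated test packets by sequence number.

    received_seqs: sequence numbers observed by the test receiver.
    first_seq, last_seq: inclusive range the test sender is assumed to have sent.
    """
    counts = Counter(received_seqs)
    # A sequence number never seen in the expected range counts as missing.
    missing = [s for s in range(first_seq, last_seq + 1) if counts[s] == 0]
    # A sequence number seen more than once counts as duplicated.
    duplicated = sorted(s for s, n in counts.items() if n > 1)
    return {"missing": missing, "duplicated": duplicated}
```

A receiver that observed sequence numbers 1, 2, 2, 4, and 5 out of an expected range of 1 through 5 would report packet 3 as missing and packet 2 as duplicated.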
In another preferred embodiment, a device in the network is configured to be a multicast routing monitor that initiates desired multicast routing tests. Fault reports are sent from the test receivers to the multicast routing monitor in a time-dependent manner to prevent overloading the network and the routing monitor with fault report traffic. In yet another preferred embodiment, the test sender is configured by sending it a source configuration request and a test receiver is configured by sending it a receiver configuration request. A receiver configuration request includes a test group identifier indicating that the test packet receiver belongs to a particular test group.
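The time-dependent sending of fault reports could, for example, be realized by spreading the reports over a reporting window. The following sketch assumes a uniformly random delay per receiver, which is only one possible scheme and is not specified in this section; the function name and parameters are hypothetical.

```python
import random
from typing import Dict, Iterable, Optional


def schedule_report_delays(receivers: Iterable[str],
                           window_seconds: float,
                           rng: Optional[random.Random] = None) -> Dict[str, float]:
    """Assign each test receiver a delay, in seconds, before it sends its report.

    Spreading the delays across the window keeps the reports from arriving at
    the multicast routing monitor all at once.
    """
    rng = rng or random.Random()
    return {receiver: rng.uniform(0.0, window_seconds) for receiver in receivers}
```

Passing a seeded `random.Random` instance makes the schedule reproducible, which is convenient when testing the monitor.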
In a preferred embodiment of another aspect of the invention, a system for detecting faults in near real-time in a multicast routing infrastructure includes a test monitoring device, one or more test source devices, and multiple test receiving devices. The test monitoring device transmits test configuration data packets to potential test senders and test receivers, and collects fault information from test receivers. The test senders are configured to transmit test data packets to multiple test receivers. The test receivers are configured to receive test packets from the test senders and transmit fault data to the test monitoring device.
In a preferred embodiment, the system includes test source request packets and test receiver request packets created and transmitted by the test monitoring device and containing an identifier corresponding to the test monitoring device. In yet another preferred embodiment, the test receiver request packets and the test source request packets contain a test group identifier for identifying a group of test packet receiving devices. In yet another preferred embodiment, a test receiver request packet contains data relating to time intervals in which the test packet receiving device transmits fault data to the test monitoring device. In yet another preferred embodiment, a test receiver request packet contains criteria on when a fault in the multicast routing infrastructure has occurred.
In another aspect of the invention, a format of a test sender request message for configuring a network device to transmit test data packets using a multicast routing protocol is described. In a preferred embodiment, a test sender request message includes an originator identifier field, a target identifier field, and a test group identifier field. The originator identifier field contains an identifier, such as an Internet Protocol address, corresponding to a multicast routing manager device. The target identifier field contains an identifier corresponding to a test sender device. The test group identifier field contains an identifier corresponding to a group of test receivers that includes one or more test receivers. In another preferred embodiment, the test sender request message includes a packet delay field used in the emission of data packets from the sender data packet routing device.
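For illustration, the fields of a test sender request message might be represented and serialized as follows. The field order, the use of IPv4 addresses for all three identifiers, the 32-bit packet delay in milliseconds, and the names `SenderRequest`, `pack`, and `unpack` are assumptions of this sketch rather than the format defined by the invention.

```python
import ipaddress
import struct
from dataclasses import dataclass

# Assumed wire layout: three IPv4 addresses followed by a 32-bit delay (network byte order).
_LAYOUT = struct.Struct("!4s4s4sI")


@dataclass(frozen=True)
class SenderRequest:
    originator: str           # identifier of the multicast routing manager device
    target: str               # identifier of the test sender device
    test_group: str           # identifier of the group of test receivers
    packet_delay_ms: int = 0  # hypothetical encoding of the packet delay field

    def pack(self) -> bytes:
        """Serialize the request into the assumed 16-byte layout."""
        return _LAYOUT.pack(
            ipaddress.IPv4Address(self.originator).packed,
            ipaddress.IPv4Address(self.target).packed,
            ipaddress.IPv4Address(self.test_group).packed,
            self.packet_delay_ms,
        )

    @classmethod
    def unpack(cls, data: bytes) -> "SenderRequest":
        """Rebuild a request from bytes produced by pack()."""
        originator, target, group, delay = _LAYOUT.unpack(data)
        return cls(
            str(ipaddress.IPv4Address(originator)),
            str(ipaddress.IPv4Address(target)),
            str(ipaddress.IPv4Address(group)),
            delay,
        )
```

A fixed-size layout is used here only to keep the round trip between `pack` and `unpack` easy to follow.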
In another aspect of the invention, a format of a test receiver request message for configuring a network device to receive test data packets using a multicast routing protocol is described. In a preferred embodiment, a test receiver request message includes an originator identifier field, a test group identifier field, and a test sender identifier field. The originator identifier field contains an identifier, such as an Internet Protocol address, corresponding to a multicast routing manager device. The test group identifier field contains an identifier corresponding to a group of test receivers. The test sender identifier field contains an identifier corresponding to a test sender that will be sending the test receiver test data packets. In another preferred embodiment, the format of a test receiver request message includes one or more fault data transmission fields containing data relating to when fault data should be transmitted to the multicast routing manager device.
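How a test receiver might use the contents of such a request can be sketched as follows. The class name `ReceiverRequest`, the single `report_interval_s` field standing in for the fault data transmission fields, and both helper methods are hypothetical and serve only to illustrate the role of each field.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReceiverRequest:
    originator: str                  # identifier of the multicast routing manager device
    test_group: str                  # identifier of the group of test receivers
    test_sender: str                 # identifier of the test sender to listen for
    report_interval_s: float = 10.0  # hypothetical fault data transmission field

    def matches(self, packet_source: str, packet_group: str) -> bool:
        """True if a packet belongs to the test configured by this request."""
        return packet_source == self.test_sender and packet_group == self.test_group

    def report_due(self, last_report_time: float, now: float) -> bool:
        """True once the configured interval has elapsed since the last report."""
        return now - last_report_time >= self.report_interval_s
```

Fault data that comes due would be addressed to `originator`, since that field identifies the device that issued the request.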