1. Field of the Invention
This invention relates generally to computer networks, and, more particularly, to network reachability problems.
2. Background Information
Data communication in a computer network involves the exchange of data between two or more entities interconnected by communication links, segments and sub-networks. These entities are typically software processes executing on hardware computer platforms, such as end nodes and intermediate nodes. Communication software executing on the end nodes correlates and manages data communication with other end nodes. The nodes typically communicate by exchanging discrete frames or packets of data according to predefined protocols, such as the Transmission Control Protocol/Internet Protocol (TCP/IP).
An intermediate node, such as a router, may interconnect the sub-networks to extend the effective “size” of the computer network. The router executes routing protocols used to direct the transmission of data traffic between the end nodes, such as hosts or users. Typically, the router directs network traffic based on destination address prefixes contained in the packets, i.e., the portions of destination addresses used by the routing protocol to render routing (“next hop”) decisions. Examples of such destination addresses include Internet Protocol (IP) version 4 and version 6 addresses. A prefix implies a combination of an IP address and a mask that cooperate to describe an area or range (of addresses) of the network that a router can reach, whereas a route implies a combination of a set of path attributes and a prefix. Much of the following detailed discussion refers to routers that are specific examples of “nodes,” but the present invention applies wherever routing occurs.
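As an illustration of the prefix concept described above, the following minimal sketch uses the Python standard library `ipaddress` module to represent a prefix as an address plus a mask and to select a next hop for a destination address. The routing table contents and the names `ROUTES` and `next_hop` are hypothetical, and the selection rule shown (the most specific matching prefix wins) is the conventional longest-prefix match, which is assumed here rather than taken from the text.

```python
import ipaddress

# A prefix combines an address and a mask to describe a range of addresses.
prefix = ipaddress.ip_network("192.168.4.0/22")
first, last = prefix[0], prefix[-1]  # first and last address in the range

# Hypothetical routing table: prefix -> next hop (illustrative values only).
ROUTES = {
    ipaddress.ip_network("10.0.0.0/8"): "R1",
    ipaddress.ip_network("10.1.0.0/16"): "R2",
    ipaddress.ip_network("0.0.0.0/0"): "default",
}

def next_hop(destination, routes=ROUTES):
    """Return the next hop of the most specific prefix covering destination."""
    addr = ipaddress.ip_address(destination)
    matches = [net for net in routes if addr in net]
    if not matches:
        return None  # no prefix covers this destination
    best = max(matches, key=lambda net: net.prefixlen)  # longest prefix wins
    return routes[best]
```

In this sketch, a destination of 10.1.2.3 is covered by all three prefixes, and the /16 entry is chosen because it is the most specific.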
The infrastructure of a router typically comprises functional components organized as a control plane and a data plane. The control plane includes the components needed to manage the traffic forwarding features of the router. These components include routing protocols, configuration information and other similar functions that determine the destinations of data packets based on information other than that contained within the packets. The data plane, on the other hand, includes functional components needed to perform forwarding operations for the packets.
For a single processor router, the control and data planes are typically implemented on the single processor. However, for some high performance routers, these planes are implemented within separate devices of the intermediate node. For example, the control plane may be implemented in a supervisor processor, whereas the data plane may be implemented within a hardware-assist device, such as a co-processor or a forwarding processor. In other words, the data plane is typically implemented in hardware that is separate from the hardware that implements the control plane.
FIG. 1 is a prior art model of two autonomous systems AS1 and AS2 communicating with each other via routers R1 and R2 using Border Gateway Protocol (BGP). BGP is an exterior gateway protocol (EGP). R1 and R2 are “peer” routers, sometimes referred to as “edge” or “border” routers, since they communicate with each other between the autonomous systems. Within their autonomous networks, AS1 and AS2 may connect to many intermediate network nodes, e.g., end users, hosts, etc., and have many other internal intermediate routers (2, 2′, 4, 4′) connecting to networks and sub-networks, hosts, etc. Within the autonomous systems, the border routers may communicate via Interior Gateway Protocols (IGPs) as known in the art; R1 and R2 will run both BGP and an IGP.
Autonomous systems are network structures, usually complex, that all fall under one administrative authority, such as a company, a branch of government, or an academic institution. In this way the administrative authority can guarantee that the internal routes remain consistent and viable. Typically, as known in the art, the administrative authority will designate one router, a border router, to advertise or inform the outside world of the autonomous system's “reachability”—those destination network addresses entirely within the autonomous system.
It is axiomatic that the reachability information carried by the protocols through the network reflects the current state of the last hop router, that is, the router that directly connects to a destination address, and that the reachability information is propagated in such a way that the receiving routers will know how to reach the last hop router. In the discussion below, a 32-bit IP version 4 address is used, but other addressing schemes may be used with the present invention.
Unfortunately, the routing information held by the routing nodes may not be correct. This can be due to hardware, software or firmware problems along the network path to and including the last hop router. It can also be due to human error and to incorrect or bad policies along the network path.
Hardware/software problems are straightforward. Examples are a fiber link that is only marginally conducting light, an integrated circuit or connector malfunction, corrupted software, operating timeouts, looping, excessive delays, insufficient memory, etc. One very real practical example is a Denial of Service (DoS) attack. In such an attack, the routing information and the links operate properly, but packets cannot be forwarded, or are forwarded with significant delays, because the link is saturated with traffic. Packets are routinely dropped although the system is functioning.
An example of human error is an administrator or other user redistributing into a routing protocol a static route that points to a non-existent destination.
Policy problems include, for example, an Internet Service Provider (ISP) that limits or otherwise restricts inbound traffic to its customers. For example, an ISP may only allow a subscriber one hour of access to some information, and when the limit is reached the ISP will stop routing the traffic to the subscriber. In this case the routing and network paths are intact but the information is not reaching its intended destination. In other cases, an entity may restrict access to prefix areas containing confidential or important data.
In the present application the term “brown-out” refers to a sub-set of prefixes advertised by a router that are not reliably reachable. A “black-out” is the limiting case where all of the prefixes are unreachable.
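The two definitions above can be expressed as a small classification over a router's advertised prefixes. The following is a minimal sketch only; the function name `classify_outage`, the input format (a mapping from prefix to a reliably-reachable flag) and the returned labels are assumptions made for illustration and are not taken from the present application.

```python
def classify_outage(reachable_by_prefix):
    """Classify a router's advertised prefixes as ok, brown-out or black-out.

    reachable_by_prefix maps each advertised prefix (a string) to True when
    the prefix is reliably reachable and False otherwise (hypothetical input).
    Returns a (label, unreachable_prefixes) pair.
    """
    if not reachable_by_prefix:
        return "none-advertised", []
    unreachable = sorted(p for p, ok in reachable_by_prefix.items() if not ok)
    if not unreachable:
        return "ok", unreachable
    if len(unreachable) == len(reachable_by_prefix):
        return "black-out", unreachable  # limiting case: every prefix fails
    return "brown-out", unreachable      # only a sub-set of prefixes fails
```

For example, if a router advertises 10.1.0.0/16 and 10.2.0.0/16 and only the second is unreachable, the sketch reports a brown-out affecting 10.2.0.0/16.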
When a “brown-out” occurs, a customer will often call his ISP and complain. A service person at the ISP will then, at a console, manually try to find some alternative route and/or identify the suspect addresses and the type of problem. Typically, the person “pings” or otherwise interrogates specific addresses to see if they are reachable until he finds those that are unreachable. “Pinging” might denote trying to establish a TCP session with the suspect addresses. Service providers would prefer to avoid such a scenario. First, the customer is not happy, and, second, the manual detection and correction may take some time.
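The manual interrogation described above, in the variant where “pinging” means attempting a TCP session, can be sketched with the Python standard library `socket` module. This is an illustration of the prior manual procedure under stated assumptions, not the method of the present invention; the names `tcp_probe` and `find_unreachable`, the choice of port and the timeout value are all hypothetical.

```python
import socket

def tcp_probe(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable networks.
        return False

def find_unreachable(targets, probe=tcp_probe):
    """Return the (host, port) targets for which the probe fails."""
    return [target for target in targets if not probe(*target)]
```

Passing the probe as a parameter keeps the sketch independent of the interrogation method, so an ICMP-based check could be substituted for the TCP attempt.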
The present invention is directed to automatically detecting brown-outs and notifying an administrator or some controlling entity of a specific problem at specific addresses. The administrator can then take corrective action without manual intervention to detect the problem locations. In the best case, the problem is corrected before any customer complains. The efficiency of the system and public relations are improved, since the customer may not be aware of any problem.