1. Field
The present invention relates to network failure detection.
2. Description of Related Art
In an Internet Protocol (IP) network and the like, a so-called broadcast (hereinafter, referred to as “BC”) storm failure is an operational problem in an IT system. In a BC storm, some failure causes a large number of broadcast frames to be generated, and as a result communication becomes impossible across the entire network.
For example, as shown in FIG. 1, a loop connection made in a layer 2 (L2) switch (hereinafter, referred to as an “SW”) in end user premises may cause a BC storm that brings down an Internet Service Provider (ISP) network or the like.
In order to restore the network to its normal state, it is necessary to quickly identify the originator of the BC storm and disconnect it from the network.
The originator of the BC storm is traced by following, in each SW, the interface (hereinafter, referred to as an “IF”) that receives a large amount of BC frames. The basic procedures are described below with reference to FIG. 2.
P1-1
A network monitoring device collects physical connection information (physical topology information) of the network. For example, the network monitoring device obtains the physical topology of the network in advance by an existing connection information collecting method, such as the Link Layer Discovery Protocol (LLDP) set forth in IEEE 802.1AB, or by manual entry by an administrator or the like. Although the following procedures P1-2 to P1-5 are periodically repeated, this procedure P1-1 only needs to be executed once unless there is a change in the physical topology.
P1-2
The network monitoring device periodically collects the transmission and reception flow rates of the BC from each SW in the network. The flow rates are counted, for each IF of the SW, based on the industry standard Management Information Base (MIB). The network monitoring device collects the counted values from each SW by the standardized Simple Network Management Protocol (SNMP).
P1-3
The network monitoring device detects each SW, and each IF thereof, that transmits/receives BC exceeding a given amount. For example, the excess is detected by comparison with a threshold set in the network monitoring device in advance.
P1-4
The network monitoring device maps the detected IF, which transmits/receives a large amount of BC, to the physical topology created in the procedure P1-1.
P1-5
The network monitoring device traces the IFs that receive a large amount of BC. In the example shown in FIG. 2, the suspected location of the originator is identified as being below the most downstream SW.
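The tracing in procedures P1-4 and P1-5 can be sketched as follows. This is a minimal illustration only, assuming a single source, at most one over-threshold receiving IF per SW, and that the MIB has been collected for every SW. The function name trace_upstream, the dictionary layout, and the three-switch chain are hypothetical and are not the network of FIG. 2.

```python
from typing import Dict, List, Optional, Set, Tuple

Port = Tuple[str, str]  # (SW name, IF name)


def trace_upstream(
    topology: Dict[Port, Port],
    rx_over: Set[Port],
    start_sw: str,
) -> Tuple[List[str], Optional[Port]]:
    """Follow over-threshold receiving IFs from start_sw toward the source.

    topology maps (SW, IF) to the adjacent (SW, IF), as in P1-1.
    rx_over holds every (SW, IF) whose received BC exceeds the threshold.
    Returns the SWs visited and the IF beyond which the source is suspected,
    or None when the trace cannot be continued.
    """
    path = [start_sw]
    visited = {start_sw}
    sw = start_sw
    while True:
        rx_ifs = sorted(intf for (s, intf) in rx_over if s == sw)
        if not rx_ifs:
            return path, None  # no receiving IF marked: the trace is stuck
        port = (sw, rx_ifs[0])
        neighbor = topology.get(port)
        if neighbor is None:
            return path, port  # IF leads outside the managed topology
        if neighbor[0] in visited:
            return path, None  # looped back without reaching an edge IF
        sw = neighbor[0]
        visited.add(sw)
        path.append(sw)


# Hypothetical chain SW-1 -- SW-2 -- SW-3, with the source beyond IF-b of SW-3.
TOPOLOGY = {
    ("SW-1", "b"): ("SW-2", "a"),
    ("SW-2", "a"): ("SW-1", "b"),
    ("SW-2", "b"): ("SW-3", "a"),
    ("SW-3", "a"): ("SW-2", "b"),
}
RX_OVER = {("SW-1", "b"), ("SW-2", "b"), ("SW-3", "b")}

path, suspect = trace_upstream(TOPOLOGY, RX_OVER, "SW-1")
print(path, suspect)
```

In this sketch, the trace ends at an over-threshold receiving IF that has no entry in the topology, which corresponds to a suspected spot under the most downstream SW.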
In the procedure P1-1, the physical topology is managed by a table such as a “physical topology management table” as shown in FIG. 3, for example, which the network monitoring device has.
In FIG. 3, the items on the table refer to the following:
SW name: A unique identifier (such as an IP address) of an SW is entered.
IF: An IF number of the above-described SW is entered.
Adjacent SW name: A unique identifier (such as an IP address) of the SW connected to the above-described IF, that is to say, adjacent thereto, is entered.
Adjacent IF: The IF number of the SW connected to the above-described IF, that is to say, adjacent thereto, is entered.
An example of management of the physical topology using the physical topology management table is shown in FIG. 4. The IF names in each SW are alphabetically named from “a” in a clockwise fashion.
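As an illustration of the four items above, the physical topology management table could be held as a list of rows with a lookup by SW name and IF. This is a sketch under assumptions: the names TopologyRow and adjacent, and the sample rows, are hypothetical and do not reproduce FIG. 3 or FIG. 4.

```python
from typing import List, NamedTuple, Optional, Tuple


class TopologyRow(NamedTuple):
    sw: str       # SW name (unique identifier such as an IP address)
    intf: str     # IF of that SW
    adj_sw: str   # adjacent SW name
    adj_if: str   # adjacent IF


# Hypothetical sample rows; each physical link appears once per direction.
TOPOLOGY_TABLE: List[TopologyRow] = [
    TopologyRow("SW-A", "a", "SW-B", "b"),
    TopologyRow("SW-B", "b", "SW-A", "a"),
    TopologyRow("SW-B", "a", "SW-C", "b"),
    TopologyRow("SW-C", "b", "SW-B", "a"),
]


def adjacent(
    table: List[TopologyRow], sw: str, intf: str
) -> Optional[Tuple[str, str]]:
    """Return (adjacent SW, adjacent IF) for the given SW and IF, if known."""
    for row in table:
        if row.sw == sw and row.intf == intf:
            return (row.adj_sw, row.adj_if)
    return None


print(adjacent(TOPOLOGY_TABLE, "SW-B", "a"))
```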
Also, in the procedure P1-3 in FIG. 2, each SW and IF thereof that transmits/receives BC exceeding the given amount is managed by a flow rate monitoring table, as shown in FIG. 5, for example, which the network monitoring device has. In FIG. 5, the items on the table refer to the following:
SW name: This is the same as that in FIG. 3.
IF: This is the same as that in FIG. 3.
Reception amount superthreshold: This is marked when the number of BC frames received by the IF is larger than the threshold.
Transmission amount superthreshold: This is marked when the number of BC frames transmitted by the IF is larger than the threshold.
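The marking of the flow rate monitoring table can be sketched as below. The sketch assumes that the per-IF BC frame counts for one polling interval are already available as plain integers; the names FlowRow and build_flow_table, the sample counts, and the threshold value are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class FlowRow:
    sw: str
    intf: str
    rx_over: bool  # reception amount superthreshold
    tx_over: bool  # transmission amount superthreshold


def build_flow_table(
    counters: Dict[Tuple[str, str], Tuple[int, int]], threshold: int
) -> List[FlowRow]:
    """Mark each (SW, IF) whose received or transmitted BC frame count
    is larger than the threshold.

    counters maps (SW, IF) to (received BC frames, transmitted BC frames).
    """
    return [
        FlowRow(sw, intf, rx > threshold, tx > threshold)
        for (sw, intf), (rx, tx) in counters.items()
    ]


# Hypothetical counts for one polling interval.
counters = {
    ("SW-1", "a"): (50, 9000),
    ("SW-1", "b"): (9000, 40),
}
flow_table = build_flow_table(counters, threshold=1000)
for row in flow_table:
    print(row)
```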
The above description is based on the assumption that the network monitoring device can collect the MIB of all the SWs under it in the procedure P1-2, so that the source is identified by tracing the IFs that receive a large amount of BC. However, when a BC storm occurs, SWs for which the flow rate information cannot be collected appear constantly or intermittently due to a CPU overload of the SW and the like. That is to say, after the network monitoring device transmits an SNMP request to each network device, it does not receive an SNMP response within a given time period, and a time-out occurs. As a result, in the procedure P1-3, some of the SWs cannot be mapped to the physical topology, and the tracing gets stuck halfway.
In order to solve this, conventionally, the source is traced while presuming the flow in each SW for which the collection of the MIB has failed. This tracing technique is illustrated below. Herein, as shown in FIG. 6, it is assumed that there are a total of two sources of BC storms, one beyond SW-1 and one beyond SW-6. The source tracing method conducted by the network monitoring device in a case in which the collection of the MIB has failed at SW-2, SW-3, SW-4 and SW-5 is shown in FIGS. 7 to 10.
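The effect of the time-out on collection can be sketched as follows. No real SNMP library is used here: fetch is a hypothetical callable standing in for one SNMP request/response exchange, assumed to raise TimeoutError when no response arrives within the given time period, and fake_fetch simulates two overloaded SWs.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Counters = Dict[str, Tuple[int, int]]  # IF name -> (rx BC frames, tx BC frames)


def poll_all(
    switches: Iterable[str], fetch: Callable[[str], Counters]
) -> Tuple[Dict[str, Counters], List[str]]:
    """Collect per-IF counters from each SW, recording the SWs that time out."""
    collected: Dict[str, Counters] = {}
    failed: List[str] = []
    for sw in switches:
        try:
            collected[sw] = fetch(sw)
        except TimeoutError:
            failed.append(sw)  # no row can be written for this SW
    return collected, failed


def fake_fetch(sw: str) -> Counters:
    """Stand-in for an SNMP exchange; SW-2 and SW-3 are assumed overloaded."""
    if sw in {"SW-2", "SW-3"}:
        raise TimeoutError(sw)
    return {"a": (10, 12)}


collected, failed = poll_all(["SW-1", "SW-2", "SW-3"], fake_fetch)
print(sorted(collected), failed)
```

The SWs in failed have no entries in the flow rate monitoring table, which is why a trace that relies only on collected rows stops when it reaches one of them.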
P2-1 (FIG. 7)
From among the SWs for which the MIB is collected, an arbitrary IF in which the transmission amount superthreshold occurs is selected. Herein, it is assumed that IF-a of SW-6 is selected. Next, an IF in which the reception amount superthreshold occurs is searched for in this SW. Herein, the search result is IF-b, and it is determined that the BC storm flows from IF-b to IF-a.
P2-2 (FIG. 8)
The adjacent SW and the adjacent IF of IF-b of SW-6 are searched for in the physical topology management table, and IF-a of SW-5 is obtained. Since SW-5 is an SW for which the collection of the MIB has failed, nothing is described for SW-5 in the flow rate monitoring table. However, since IF-b of SW-6 is in the state of the reception amount superthreshold, it can be determined that IF-a of SW-5 is in the state of the transmission amount superthreshold. Next, another IF in SW-5 is searched for, and this IF is presumed to be in the state of the reception amount superthreshold. Herein, it is assumed that IF-b is presumed to be in the state of the reception amount superthreshold. A pattern in which IF-c is presumed to be in the state of the reception amount superthreshold is described in P2-5 in FIG. 9.
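The presumption step of P2-2 can be sketched as follows, as an illustration rather than the exact related-art implementation. Given an IF in the reception amount superthreshold state, the adjacent IF is taken to be transmitting, and every other IF of the adjacent SW becomes a candidate presumed receiving IF. The function name presume_next is hypothetical, and the far-end IF name "x" on SW-3 and SW-4 is a placeholder because the text does not give it.

```python
from typing import Dict, List, Tuple

Port = Tuple[str, str]  # (SW name, IF name)


def presume_next(
    topology: Dict[Port, Port], sw: str, rx_if: str
) -> Tuple[Port, List[str]]:
    """For an IF known to be receiving over the threshold, return the
    adjacent IF (inferred to be transmitting over the threshold) and the
    other IFs of the adjacent SW (each a candidate presumed receiving IF)."""
    adj_sw, adj_if = topology[(sw, rx_if)]
    candidates = sorted(
        intf for (s, intf) in topology if s == adj_sw and intf != adj_if
    )
    return (adj_sw, adj_if), candidates


# Links around SW-5 as described in the text; "x" is a placeholder IF name.
TOPOLOGY = {
    ("SW-6", "b"): ("SW-5", "a"),
    ("SW-5", "a"): ("SW-6", "b"),
    ("SW-5", "b"): ("SW-3", "x"),
    ("SW-5", "c"): ("SW-4", "x"),
}

tx_port, rx_candidates = presume_next(TOPOLOGY, "SW-6", "b")
print(tx_port, rx_candidates)
```

Each candidate starts a separate branch of the trace, which is why the number of patterns to follow grows with the number of SWs for which collection has failed.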
P2-3 (FIG. 9)
A presumption similar to that in P2-2 is repeated until IF-b of SW-6, for which the MIB is collected, is reached. That is to say, a trace result of “SW-6→SW-5→SW-3→SW-2→SW-4→SW-5→SW-6” is obtained.
P2-4 (FIG. 9)
In P2-3, IF-a is presumed to be the reception amount superthreshold IF in SW-2. In P2-4, IF-c of SW-2 is presumed to be a reception amount superthreshold IF. The presumption is similarly repeated, and the trace result of “SW-6→SW-5→SW-3→SW-2→SW-1” is obtained.
P2-5 (FIG. 9)
In P2-3, IF-b is presumed to be the reception amount superthreshold IF in SW-5. In P2-5, IF-c of SW-5 is presumed to be a reception amount superthreshold IF. The presumption is similarly repeated, and the trace result of “SW-6→SW-5→SW-4→SW-2→SW-3→SW-5→SW-6” is obtained.
P2-6 (FIG. 9)
In P2-5, IF-b is presumed to be the reception amount superthreshold IF in SW-2. In P2-6, IF-c of SW-2 is presumed to be a reception amount superthreshold IF. The presumption is similarly repeated, and the trace result of “SW-6→SW-5→SW-4→SW-2→SW-1” is obtained.
P2-7 (FIG. 9)
In P2-1 to P2-6, the tracing is started from SW-6. In P2-7, tracing similar to that of P2-1 to P2-6 is performed starting from SW-1. P2-8 to P2-12 are similar to P2-2 to P2-6, and thus the description thereof is not repeated.
P2-13 (FIG. 10)
The BC storms from the two sources flow through the group of SWs for which the collection of the MIB has failed in one of the combinations of the flows extracted in P2-1 to P2-6 and P2-7 to P2-12, that is to say, in one of patterns 1 to 4 shown in FIG. 10. The network monitoring device selects any one of them and continues the tracing until it reaches the sources.
In addition to the above-described related art, a technique is disclosed in, for example, the patent document Japanese Laid-Open Patent Application Publication No. 2003-318985. The object of that related art is to remove branching and merging configurations from packet route information by short-circuiting them. A control frame based on an individual protocol is communicated between the SWs, and a portion closed by branching and merging is detected and represented as one link.
However, in the related art described with reference to FIGS. 7 to 10, the transmission and reception states of the IFs are presumed while tracing. Consequently, all the patterns of combinations of flows that may pass through the group of SWs for which the collection has failed are extracted, and the sources are traced for each of them. For example, in FIGS. 7 to 10, 4×2=8 flows are extracted and traced as follows:
4=the number of combinations of branchings in the group of SWs for which the collection of the flow rate information has failed (2×2).
2=the number of BC flows actually flowing (=the number of sources).
(BC Flow 1 and BC Flow 2 in FIG. 6)
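The count of 4×2=8 can be reproduced with a short enumeration. The sketch assumes two branch points (SW-5 and SW-2) with two candidate receiving IFs each, and two BC flows, as in the example above. The function name enumerate_flows and the candidate labels are hypothetical.

```python
from itertools import product
from typing import Dict, List, Tuple


def enumerate_flows(
    branch_points: Dict[str, List[str]], sources: List[str]
) -> List[Tuple[str, Dict[str, str]]]:
    """List every (source, branch choice per SW) combination to be traced."""
    sws = list(branch_points)
    flows = []
    for src in sources:
        for combo in product(*(branch_points[sw] for sw in sws)):
            flows.append((src, dict(zip(sws, combo))))
    return flows


# Two candidate receiving IFs at each of two branch points (2 x 2 = 4).
branch_points = {
    "SW-5": ["candidate 1", "candidate 2"],
    "SW-2": ["candidate 1", "candidate 2"],
}
sources = ["BC Flow 1", "BC Flow 2"]

flows = enumerate_flows(branch_points, sources)
print(len(flows))  # 4 combinations x 2 flows
```

Under the same assumption of two candidates per branch point, k branch points and s sources give s × 2^k flows, so each additional branch point doubles the number of flows to be traced.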
Presuming how the BC flows through the group of SWs for which the collection has failed is only one step in the tracing process; the original object is to quickly identify the source. Since an actual network topology is much more complicated than that described with reference to FIGS. 7 to 10, a method that traces all the flows, as in the above-described related art, requires an enormous amount of calculation, which increases the calculation time and the processing load on the network monitoring device.
Moreover, the related art disclosed in the above-described patent document is a technique for removing a closed section by short-circuiting it, in which the collection and the simplification of the topology information are performed simultaneously. Therefore, the individual protocol is required.