A system in which a plurality of communication apparatuses are connected to constitute a communication network is known.
FIG. 1 shows a block diagram of the system in which a plurality of communication apparatuses are connected to constitute a communication network.
The communication network of FIG. 1 includes: an upper communication apparatus 200; communication apparatuses 201 to 203 which are connected to the communication apparatus 200; communication terminals 204 and 205 which are connected to the communication apparatuses 201 to 203; and a network management system 206 which is connected to the communication apparatus 200.
The upper communication apparatus 200 relays data to/from an external network.
The communication apparatuses 201 to 203 are each connected to any one of the communication terminals 204 and 205, and are controlled in operation by the communication apparatus 200. FIG. 1 shows the case where the communication apparatuses 201 and 203 are connected to the communication terminals 204 and 205, respectively.
The communication terminals 204 and 205 are each connected to any one of the communication apparatuses 201 to 203 through a communication medium.
The network management system 206 is connected to the communication apparatus 200, and manages the operation status of the communication network.
Take a mobile communication system as a concrete example. A base station control apparatus corresponds to the communication apparatus 200. Wireless base stations correspond to the communication apparatuses 201 to 203. Mobile stations correspond to the communication terminals 204 and 205.
If a fault occurs between mutually-opposed communication apparatuses in the network of FIG. 1, such as between the communication apparatuses 200 and 201, a fault notification message is transmitted from the communication apparatus 200 or 201 to the network management system 206.
The network management system 206 is monitored by a maintenance person. When the network management system 206 receives the fault notification message, the maintenance person analyzes the message and takes specific measures for recovery based on the result of analysis.
Patent Document 1 describes an example of a system that analyzes such a fault notification message to estimate the cause of a failure occurring in a communication network.
The failure cause estimation system described in Patent Document 1 analyzes the pattern of occurrence of the fault notification message, estimates the failure cause according to predetermined estimation rules, and automatically takes countermeasures.
With the recent sophistication of communication apparatuses, however, it has become difficult to provide in advance an exhaustive set of such fault notification messages for all faults that can occur in a communication network.
Consequently, if a fault occurs for which no notification was anticipated, or if a fault occurs in the fault-notifying function itself, the fault goes undetected and the failure of the communication network tends to persist for a long time.
In such cases where a fault notification message is not appropriately output despite the presence of a serious communication failure such as quality degradation in the communication network, a method is used to analyze process logs retained in the communication apparatuses to detect the communication failure and identify the failure cause.
Since the process logs contain more detailed information on the internal processing of the apparatuses than fault notification messages do, it is sometimes possible to detect a communication failure that is not detectable by means of the fault notification messages and estimate the cause of the communication failure.
An example of the process logs retained in the communication apparatuses is described in Patent Document 2.
The process log described in Patent Document 2 is generally referred to as a call processing alert log, which contains information such as the location of the processing at which an abnormal disconnection occurs in the middle of call processing inside a communication apparatus and the reason for the abnormal disconnection.
Examples of the reason of occurrence of an abnormal disconnection include a timeout in standby processing, the occurrence of congestion, the occurrence of call admission control, an insufficient communication band, and loss of a terminal.
Generally, the call processing alert log is accumulated in a recording apparatus provided in the communication apparatus 200 or the network management system 206 as a time-series log that is accompanied with such information as the date and time of occurrence and communication nodes involved in an abnormal disconnection.
Such process logs in the communication apparatuses may be output to an external network management system in advance in preparation for the occurrence of a failure. Usually, however, the process logs are not output externally but are acquired as necessary when a failure occurs.
Patent Document 3 describes an example of a system that detects a failure in a communication network by analyzing logs that record abnormal processes, like a call processing alert log, among such process logs retained in communication apparatuses.
FIG. 2 is a block diagram showing the configuration of a failure detection system of a communication network that is described in Patent Document 3.
The following description assumes, as an example, that the failure detection system 207 shown in FIG. 2 is connected to the network management system 206. The failure detection system 207 includes a log collecting unit 100, an observation amount extracting unit 101, a failure feature extracting unit 102, a failure feature appearance intensity calculating unit 103, an appearance intensity probability distribution calculating unit 104, a network characteristic DB (database) 105, an abnormality calculating unit 106, a failure detecting unit 107, a result display unit 108, and an input unit 109.
The log collecting unit 100 collects process logs that are accumulated in the network management system 206.
The observation amount extracting unit 101 extracts observation amount necessary for monitoring the network status from the collected logs.
The failure feature extracting unit 102 extracts failure features from the observation amount that is extracted by the observation amount extracting unit 101.
The failure feature appearance intensity calculating unit 103 calculates the appearance intensities of the failure features from the observation amount of the observation amount extracting unit 101.
The appearance intensity probability distribution calculating unit 104 calculates the probability distribution of the appearance intensities at normal time from the appearance intensities calculated by the failure feature appearance intensity calculating unit 103.
The network characteristic DB 105 stores the probability distribution at normal time calculated by the appearance intensity probability distribution calculating unit 104 and the failure features extracted by the failure feature extracting unit 102.
The abnormality calculating unit 106 compares the appearance intensities calculated by the failure feature appearance intensity calculating unit 103 with the probability distributions of the appearance intensities of the failure features at normal time stored in the network characteristic DB 105, and thereby calculates the degree to which each appearance intensity is abnormal (its abnormality).
The abnormality calculating unit 106 also integrates the abnormalities of a plurality of failure features to calculate the abnormality of a communication node.
The failure detecting unit 107 compares the abnormality of the communication node and an abnormality threshold stored in the network characteristic DB 105, thereby judging the state of the communication node to detect a failure.
The result display unit 108 displays the result of failure detection on a display device such as a CRT (Cathode Ray Tube).
The observation amount that the observation amount extracting unit 101 extracts from the logs collected by the log collecting unit 100 is a set of multidimensional vectors. The observation amount extracting unit 101 extracts the processes pertaining to a certain communication node from the logs, and uses the number of occurrences per unit time of each type of extracted process as the respective vector elements.
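As an illustration only, the counting described above can be sketched as follows. The record layout (dictionaries with `time`, `node`, and `process` keys), the function name `observation_vector`, and the process type names are hypothetical and are not taken from Patent Document 3.

```python
from collections import Counter

def observation_vector(log_records, node, window_start, window_end, process_types):
    """Count, for one node, how often each process type occurs in
    the unit-time window [window_start, window_end)."""
    counts = Counter(
        rec["process"]
        for rec in log_records
        if rec["node"] == node and window_start <= rec["time"] < window_end
    )
    # One vector element per process type, in a fixed order.
    return [counts.get(p, 0) for p in process_types]

# Hypothetical log records (times in seconds).
logs = [
    {"time": 0, "node": "201", "process": "timeout"},
    {"time": 10, "node": "201", "process": "congestion"},
    {"time": 20, "node": "201", "process": "timeout"},
    {"time": 30, "node": "202", "process": "timeout"},
    {"time": 70, "node": "201", "process": "timeout"},
]
types = ["timeout", "congestion", "terminal_loss"]
vec = observation_vector(logs, "201", 0, 60, types)
print(vec)
```

In this sketch the record at time 70 falls outside the window and the record for node 202 is ignored, so only three records contribute to the vector for node 201.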
The failure features that the failure feature extracting unit 102 extracts from the observation amount are multidimensional vectors. The multidimensional vectors are statistically or empirically extracted from the observation amount, and include variation components that are statistically uncorrelated, variation components that are statistically independent, and variation components that are statistically neither fully uncorrelated nor independent but are empirically known to be related to failure causes.
Examples of the failure causes include the appearance of an interference signal, a temporary sharp increase in the number of communication users, the interruption of a communication channel, and a breakdown of a communication apparatus.
FIG. 3 is a configuration diagram showing the configuration of information that is stored in the network characteristic DB 105.
The network characteristic DB 105 contains parameters that indicate the characteristics of each of communication nodes 1 to J (J is a natural number) to be monitored.
The characteristic parameters of a communication node include: failure features 1 to N (N is a natural number) extracted from the logs (statistical features of the logs upon the occurrence of a failure); the probability distributions of the appearance intensities of the failure features at normal time; and an abnormality threshold intended for failure detection.
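One possible in-memory layout for this information is sketched below. The class and field names are hypothetical, and representing each normal-time distribution as a `statistics.NormalDist` is an assumption of this sketch; the description above does not specify the form of the distributions.

```python
from dataclasses import dataclass, field
from statistics import NormalDist
from typing import Dict, List

@dataclass
class NodeCharacteristics:
    # Failure features 1..N, each a multidimensional vector.
    failure_features: List[List[float]]
    # Normal-time distribution of the appearance intensity of each feature
    # (assumed Gaussian here for illustration only).
    normal_distributions: List[NormalDist]
    # Threshold used for failure detection.
    abnormality_threshold: float

@dataclass
class NetworkCharacteristicDB:
    # Characteristics keyed by communication node (nodes 1..J).
    nodes: Dict[str, NodeCharacteristics] = field(default_factory=dict)

    def is_constructed(self, node_id: str) -> bool:
        return node_id in self.nodes

db = NetworkCharacteristicDB()
db.nodes["201"] = NodeCharacteristics(
    failure_features=[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    normal_distributions=[NormalDist(10.0, 2.0), NormalDist(5.0, 1.0)],
    abnormality_threshold=1e-4,
)
```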
Next, the operation of the failure detection system of a communication network described in Patent Document 3 will be described in detail with reference to FIGS. 4 and 5.
FIG. 4 is a flowchart for explaining the operation of the failure detection system of a communication network that is described in Patent Document 3.
In FIG. 4, the operation is started at step S300. The observation amount extracting unit 101 then extracts, from the logs collected by the log collecting unit 100, the numbers of occurrences per unit time of the processes occurring in the communication nodes to be monitored. Multidimensional vectors that contain those values as elements are taken as the observation amount (step S301).
Here, the communication nodes to be monitored for a failure and the time range are specified by a user through the input unit 109.
Now, if the network characteristic DB 105 is not constructed yet, a determination to update the network characteristic DB 105 is made at step S302, so that the network characteristic DB 105 is constructed at step S303 prior to the monitoring of the communication network for a failure.
FIG. 5 is a flowchart for explaining the operation of the processing for constructing the network characteristic DB 105 at step S303.
Initially, the construction (update) of the network characteristic DB 105 is started at step S400. At step S401, a set of samples is created to include both normal samples and failure samples, with the observation amount obtained from the communication nodes to be monitored (communication nodes 1 to J) in each unit time as the samples.
Next, at step S402, statistical features of failures are extracted from the set of samples and stored in the network characteristic DB 105.
Then, at step S403, samples of observation amount that are obtained when the communication nodes 1 to J to be monitored are in a normal state are extracted from the set of samples.
At step S404, the appearance intensities of the failure features are calculated from the respective samples extracted at step S403.
Then, at step S405, the probability distributions of the appearance intensities are calculated from the set of appearance intensities of the failure features calculated at step S404, and stored in the network characteristic DB 105.
At step S406, samples of observation amount that are obtained when the communication nodes 1 to J to be monitored are in a failure state are extracted from the set of samples.
At step S407, the appearance intensities of the failure features are calculated from the respective samples extracted at step S406.
Then, at step S408, the abnormalities of the appearance intensities of the failure features are integrated to determine the abnormalities of the communication nodes 1 to J.
At step S409, an abnormality threshold which is determined based on the distribution of the abnormalities of the communication nodes 1 to J at failure time or based on operation policy is stored in the network characteristic DB 105.
In this way, the network characteristic DB 105 can be updated by the processing of constructing a network characteristic DB according to the flowchart shown in FIG. 5.
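The later part of this construction flow (steps S403 to S409) can be sketched roughly as follows, assuming the appearance intensities of the failure features have already been calculated for each sample. The Gaussian fit, the use of upper probabilities, and the rule of taking the largest abnormality among the failure samples as the threshold are assumptions of this sketch; the description only states that the threshold is based on the distribution of abnormalities at failure time or on operation policy.

```python
from math import prod
from statistics import NormalDist

def fit_normal_distributions(normal_intensities):
    """Fit one distribution per failure feature from normal-time samples.
    normal_intensities: list of samples, each a list of N intensities."""
    columns = zip(*normal_intensities)
    return [NormalDist.from_samples(col) for col in columns]

def node_abnormality(intensities, dists):
    """Product of the upper probabilities of the intensities.
    In this sketch a smaller value means a rarer, more abnormal state."""
    return prod(1.0 - d.cdf(x) for x, d in zip(intensities, dists))

def choose_threshold(failure_intensities, dists):
    """Largest abnormality value seen among the known failure samples."""
    return max(node_abnormality(s, dists) for s in failure_intensities)

# Hypothetical intensities for two failure features.
normal = [[8.0, 3.0], [10.0, 5.0], [12.0, 7.0], [10.0, 5.0]]
failure = [[15.0, 9.0], [14.0, 10.0]]

dists = fit_normal_distributions(normal)
threshold = choose_threshold(failure, dists)
```

With this choice, every known failure sample has an abnormality value at or below the threshold, while a sample sitting at the normal-time mean of both features has a value of 0.25.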
Returning to step S303 of FIG. 4, the failure detection system of a communication network described in Patent Document 3 detects a failure of a communication node by using the network characteristic DB 105 constructed as described above.
Specifically, at step S304, the appearance intensities of the failure features stored in the network characteristic DB 105 are calculated from the observation amount.
At step S305, the abnormalities of the communication nodes are determined from the probability distributions stored in the network characteristic DB 105.
At step S306, the abnormalities of the communication nodes to be monitored and the threshold stored in the network characteristic DB 105 are compared to judge the presence or absence of a failure.
In the foregoing operation, the abnormalities of the appearance intensities of the failure features are expressed as any of the upper probabilities, lower probabilities, and two-sided probabilities of the appearance intensities, which are determined from the probability distributions stored in the network characteristic DB 105. The abnormality of each communication node is determined as the product of the abnormalities of the appearance intensities determined for the respective failure features.
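A minimal sketch of this calculation for the upper-probability case is given below. It assumes Gaussian normal-time distributions and assumes that a failure is judged when the product falls below the threshold; neither the distribution family nor the direction of the comparison is stated in the description above, and all names and values are hypothetical.

```python
from math import prod
from statistics import NormalDist

def upper_probability(intensity, normal_dist):
    """Probability, at normal time, of an intensity at least this large."""
    return 1.0 - normal_dist.cdf(intensity)

def node_abnormality(intensities, normal_dists):
    """Product of the per-feature abnormalities for one communication node."""
    return prod(upper_probability(x, d) for x, d in zip(intensities, normal_dists))

def is_failure(abnormality, threshold):
    # Assumed convention: a smaller probability means a more abnormal state.
    return abnormality < threshold

# Hypothetical normal-time distributions for two failure features.
dists = [NormalDist(mu=10.0, sigma=2.0), NormalDist(mu=5.0, sigma=1.0)]

typical = node_abnormality([10.0, 5.0], dists)  # both intensities at their means
extreme = node_abnormality([16.0, 8.0], dists)  # both three sigma above the mean
```

Here `typical` is 0.5 × 0.5 = 0.25, while `extreme` is several orders of magnitude smaller, so a threshold such as 1e-4 separates the two cases.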
The failure detection system of a communication network described in Patent Document 3 thereby achieves the detection of failures in the communication network, using the process logs retained in the apparatuses.
Patent Document 1: JP-A-2004-80297
Patent Document 2: JP-A-11-261471
Patent Document 3: JP-A-2007-020115
Non-Patent Document 1: Aapo Hyvarinen et al., with two translators, “Independent component analysis”, Tokyo Denki University Press, Feb. 10, 2005, pp. 164-217
Non-Patent Document 2: Richard O. Duda et al., with a supervisor-translator, “Pattern classification”, New Technology Communications, Jul. 3, 2001, pp. 32-36, pp. 528-529