Distributed systems are widely used in existing networks. For example, a content delivery network (CDN) is a typical distributed system. A distributed system consists of many network nodes. In practice, each network node in the distributed system needs to be monitored so that nodes in an abnormal state, namely abnormal nodes, are discovered promptly. Common causes of an abnormal node state include abnormal network quality and abnormal node progress status. Currently, there are two common methods for detecting abnormal nodes: one is based on network quality, and the other is based on node progress status. These are described below.
When data is transmitted over a network medium using a network protocol such as TCP/IP, an excessive amount of information produces excess network traffic, which reduces the processing speed of network node devices and thereby causes network delay. Therefore, network delay is a typical index for evaluating network quality.
Currently, a typical method for detecting abnormal nodes in a distributed system based on network quality is to send an Internet Control Message Protocol (ICMP) data packet to a tested network node, evaluate the network quality of the node from the information returned for that packet, such as the network delay, and thereby discover an abnormal node.
Specifically, a fixed network delay threshold is currently preset. The network delay of a tested node is obtained by sending an ICMP data packet to the tested node, and the network delay is compared with the preset threshold. If the network delay is smaller than the threshold, the tested node is determined as a normal node; otherwise, the tested node is determined as an abnormal node.
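The comparison described above can be illustrated with a minimal sketch. It assumes the network delays have already been measured (the ICMP probing itself is not shown), and the names `find_abnormal_nodes`, `delays_s`, and `threshold_s`, as well as the sample node names and delay values, are hypothetical and chosen only for illustration.

```python
from typing import Dict, List


def find_abnormal_nodes(delays_s: Dict[str, float], threshold_s: float) -> List[str]:
    """Return the nodes whose measured delay is NOT smaller than the fixed threshold.

    A node is treated as normal only if its delay is strictly smaller than
    the threshold; every other node is treated as abnormal.
    """
    return [node for node, delay in delays_s.items() if not delay < threshold_s]


# Hypothetical measured delays, in seconds.
measured = {"node-a": 0.010, "node-b": 0.250, "node-c": 1.200}

abnormal = find_abnormal_nodes(measured, threshold_s=1.0)
print(abnormal)
```

Note that the threshold is a constant passed in by the caller, so the decision for one node never depends on the delays of the other nodes.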
The above method, which detects abnormal nodes in a distributed system by sending an ICMP data packet and comparing the result against a preset fixed network delay threshold, has the following disadvantages:
In the prior art, the preset threshold is fixed and cannot adapt to network changes. Therefore, when the network delay increases due to factors unrelated to individual nodes, such as excessive data traffic on the entire network, the accuracy of abnormal node detection decreases.
For example, suppose there are 1000 nodes, and at a specific point in time the network delay of 999 nodes is 10 ms while the network delay of the last node X is 1 s. If the preset detection threshold is 1 s, node X is reported as abnormal. In this case, the detection result is correct.
However, if the network delay of the 999 nodes rises to 1 s for some reason, such as excessive data traffic on the entire network, and the network delay of node X rises to 10 s, an alarm is reported indicating that all 1000 nodes are abnormal. Clearly, this result fails to single out node X, the node whose network quality is actually much poorer than that of the others.
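The two scenarios in this example can be reproduced with a short sketch. The function name `flag_abnormal` and the node labels are hypothetical; the delays and the 1 s threshold are the figures given in the example, and a node is flagged when its delay is not smaller than the threshold.

```python
def flag_abnormal(delays_s, threshold_s):
    """Return the set of nodes whose delay is not smaller than the fixed threshold."""
    return {node for node, delay in delays_s.items() if delay >= threshold_s}


THRESHOLD_S = 1.0  # preset fixed detection threshold: 1 s

# Scenario 1: 999 nodes at 10 ms, node X at 1 s.
quiet = {f"node-{i}": 0.010 for i in range(999)}
quiet["node-X"] = 1.0

# Scenario 2: network-wide congestion, 999 nodes at 1 s, node X at 10 s.
congested = {f"node-{i}": 1.0 for i in range(999)}
congested["node-X"] = 10.0

quiet_flags = flag_abnormal(quiet, THRESHOLD_S)
congested_flags = flag_abnormal(congested, THRESHOLD_S)

print(len(quiet_flags))      # 1: only node X is flagged
print(len(congested_flags))  # 1000: every node is flagged
```

In the second scenario node X is still flagged, but so are the other 999 nodes, so the alarm carries no information about which node is actually worse than the rest.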
Therefore, the method for detecting abnormal nodes by setting a fixed threshold cannot determine whether a tested node is abnormal relative to the other network nodes under the current network conditions, and the accuracy of abnormal node detection is low.
Moreover, data distribution paths in a distributed system are often restricted. For example, during data transmission from a source node A to a destination node C, although multiple paths exist from source node A to destination node C, data transmitted from source node A can reach destination node C only over a dedicated path. Under the current protocol, however, an ICMP data packet is not subject to this path restriction; that is, the ICMP data packet may reach destination node C over a path different from the data transmission path. Because the two paths may differ, the network delay obtained by sending an ICMP data packet does not necessarily reflect the actual data transmission delay on the network. As a result, the result of abnormal node detection is inaccurate.