In a communications system, when a fault occurs on a device, a method is needed to troubleshoot the fault, so that a fault that remains unresolved for a long time does not severely affect performance of the communications system.
Troubleshooting may be performed manually. However, manually detecting a fault and then troubleshooting the fault usually incurs relatively high time and labor costs. Therefore, the industry increasingly expects a device in a communications system to automatically troubleshoot a fault in the communications system, so as to improve troubleshooting efficiency and reduce labor costs.
In a troubleshooting method in the prior art, whether a device is faulty is mainly determined according to a heartbeat message of the device. A monitoring device may periodically send a heartbeat message to a monitored device, and after receiving the heartbeat message, the monitored device may return a response message to the monitoring device. If the monitoring device does not receive, within a specified time after sending the heartbeat message, the response message returned by the monitored device, the monitoring device determines that the monitored device is faulty. To troubleshoot the fault, either the entire monitored device is reset or a function carried by the monitored device is switched to another device.
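The heartbeat-based determination described above can be sketched as follows. This is a minimal illustration only, not an implementation taken from any particular system: the names `heartbeat_ok` and `prior_art_action`, the use of an in-process queue as the response channel, and the timeout value are all assumptions introduced for the example.

```python
import queue
from typing import Callable


def heartbeat_ok(send_heartbeat: Callable[[], None],
                 responses: "queue.Queue[str]",
                 timeout_s: float) -> bool:
    """Send one heartbeat and wait up to timeout_s for a response.

    `send_heartbeat` and `responses` are hypothetical stand-ins for the
    link between the monitoring device and the monitored device.
    """
    send_heartbeat()
    try:
        responses.get(timeout=timeout_s)
        return True
    except queue.Empty:
        # No response within the specified time: the monitored device
        # is judged faulty, whatever the actual cause.
        return False


def prior_art_action(ok: bool) -> str:
    """Coarse recovery: reset the whole device or switch its function over."""
    return "none" if ok else "reset_or_switch_over"


if __name__ == "__main__":
    channel: "queue.Queue[str]" = queue.Queue()
    # A monitored device that answers every heartbeat.
    print(prior_art_action(heartbeat_ok(lambda: channel.put("response"), channel, 0.05)))
    # A monitored device whose response never arrives.
    print(prior_art_action(heartbeat_ok(lambda: None, channel, 0.05)))
```

In this sketch the only input to the recovery decision is whether a response arrived in time, so every missed response leads to the same coarse action.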
However, there may be multiple reasons why the monitoring device does not receive the response message within the specified time. For example, an interface unit used by the monitored device to send the response message may be faulty. In this case, another interface unit of the monitored device may be invoked to replace the faulty interface unit, without resetting the entire monitored device or switching its function to another device. Resetting the entire monitored device or switching its function carries relatively high risks and affects a relatively large quantity of services.
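The ambiguity described above can be illustrated with a small model. The sketch below is hypothetical throughout: the `MonitoredDevice` class, its interface unit names `if0` and `if1`, and the helper functions are assumptions made for illustration. It shows that a device that is down and a device with only a faulty interface unit look identical to a heartbeat check, although the second case can be recovered by replacing the interface unit.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class MonitoredDevice:
    alive: bool = True
    # Hypothetical model: interface unit name -> True if the unit works.
    interfaces: Dict[str, bool] = field(
        default_factory=lambda: {"if0": True, "if1": True})
    active: str = "if0"

    def respond(self) -> Optional[str]:
        """Return a response, or None if it cannot leave the device."""
        if not self.alive or not self.interfaces[self.active]:
            return None
        return "response"


def heartbeat_verdict(device: MonitoredDevice) -> str:
    """What the monitoring device can conclude from the heartbeat alone."""
    return "ok" if device.respond() is not None else "faulty"


def switch_interface(device: MonitoredDevice) -> bool:
    """Finer-grained recovery: use another working interface unit."""
    for name, working in device.interfaces.items():
        if working and name != device.active:
            device.active = name
            return True
    return False


if __name__ == "__main__":
    down = MonitoredDevice(alive=False)
    bad_interface = MonitoredDevice(interfaces={"if0": False, "if1": True})
    # Both causes produce the same heartbeat verdict.
    print(heartbeat_verdict(down), heartbeat_verdict(bad_interface))
    # Only the interface fault is cleared by switching interface units.
    switch_interface(bad_interface)
    print(heartbeat_verdict(bad_interface))
```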
In conclusion, in the troubleshooting method in the prior art, a fault is analyzed and troubleshot according to a heartbeat message of a device, which results in relatively low precision in fault locating.