When a regular error occurs on a computer system, the system generates a system management interrupt (SMI), collects error data, and then processes the error data to detect the fault.
On many devices, a hardware fault (for example, a memory chip fault or a memory data cable fault) may generate correctable errors. A correctable error is an error that can be corrected, so the system can continue running when one occurs. However, if the correctable errors are caused by a hardware fault, a continuous correctable error storm occurs until the hardware fault is cleared. Although the system can continue running, it runs in an unhealthy state: system performance deteriorates, and the probability of a critical error is significantly higher. In this case, an alarm identifying the faulty module should be generated immediately, and the faulty module should be replaced as soon as possible. That is, error data about the correctable error storm needs to be collected so that the hardware fault can be detected.
However, during a continuous correctable error storm, if error data is collected using SMIs, the system may fall into an SMI trap, the symptom of which is that the system hangs or crashes. How to handle a correctable error storm effectively has therefore become an urgent technical problem to be resolved.
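The SMI trap described above can be illustrated with a toy timing model. The sketch below is not taken from any real firmware. It assumes that errors arrive at a fixed interval, that every error raises one SMI, that SMIs do not nest, and that normal code runs only when no SMI is pending. The function name `normal_time_us` and all numeric values are hypothetical.

```python
def normal_time_us(error_interval_us: float,
                   smi_handler_us: float,
                   duration_us: float) -> float:
    """Return the time (in microseconds) left for normal code in a toy model.

    Assumptions (illustrative only): errors arrive every error_interval_us,
    each error raises one SMI that takes smi_handler_us to service, and
    normal code is stalled while an SMI is being serviced or is pending.
    """
    if error_interval_us <= 0 or smi_handler_us < 0 or duration_us < 0:
        raise ValueError("interval must be positive; other times non-negative")

    t = 0.0                          # current simulated time
    next_error = error_interval_us   # arrival time of the next unserviced error
    normal = 0.0                     # accumulated time spent in normal code

    while t < duration_us:
        if next_error <= t:
            # An error is pending: the SMI handler preempts normal code.
            t += smi_handler_us
            next_error += error_interval_us
        else:
            # No error pending: normal code runs until the next error arrives
            # or the observation window ends.
            run_until = min(next_error, duration_us)
            normal += run_until - t
            t = run_until
    return normal


# Hypothetical numbers: occasional errors versus an error storm.
occasional = normal_time_us(error_interval_us=1000, smi_handler_us=100, duration_us=10_000)
storm = normal_time_us(error_interval_us=50, smi_handler_us=100, duration_us=10_000)
```

With these illustrative values, the occasional-error case leaves 9100 of the 10000 microseconds to normal code, whereas the storm case leaves only 50: once errors arrive faster than the handler can service them, an SMI is always pending, and the system appears suspended.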