1. Field of the Invention
The present invention generally relates to a method for collecting information related to system errors which occur in a computer system and, more particularly, to a method of collecting error correction code event-related information while the computer system is executing system management mode handler code.
2. Description of Related Art
An error occurs whenever a computer system such as a personal computer (or "PC") produces an incorrect result. The cause of computer errors varies and may include malfunctions of physical components and coding errors in software. As a result, nearly all computer systems are equipped with error detecting and/or error correcting capabilities. Error detecting code commonly uses parity checks to determine if there is an error in a received message. More specifically, a parity bit is added to the end of a message block. Upon receipt, the message block is checked to see if it contains the correct number of ones. If there are not the correct number of ones in the message block, the message contains an odd number of errors. Upon detection of an error, the receiver can request that the message be retransmitted. In most cases, the retransmission will result in an error free message.
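By way of illustration only, the parity check described above may be sketched as follows. The sketch assumes even parity and represents a message block as a list of bits; the function names `add_even_parity` and `parity_ok` are hypothetical and do not come from any particular system.

```python
def add_even_parity(bits):
    """Append one parity bit so that the block holds an even number of ones."""
    return bits + [sum(bits) % 2]


def parity_ok(block):
    """Return True if the block (message plus parity bit) has an even number of ones."""
    return sum(block) % 2 == 0


message = [1, 0, 1, 1, 0, 0, 1]
block = add_even_parity(message)

# A single flipped bit changes the count of ones from even to odd.
one_error = list(block)
one_error[2] ^= 1

# A second flipped bit makes the count even again, so the check passes.
two_errors = list(one_error)
two_errors[5] ^= 1

print(parity_ok(block), parity_ok(one_error), parity_ok(two_errors))
```

As the last case shows, a single parity bit reveals only an odd number of errors, and it does not indicate which bit is in error.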
Error detection techniques are generally disfavored because of the additional time consumed by retransmissions of those messages containing errors. Additionally, error detection requires a two-way exchange in order to request and process the retransmission of a message. The aforementioned deficiencies associated with error detection techniques are overcome by the use of techniques which incorporate error correcting code (or "ECC"). Generally speaking, ECC techniques assign a first parity check to those positions in a code word whose position number, written in binary, has a one in the rightmost place, a second parity check to those positions whose position number has a one in the second place from the right, etc. When a single error occurs, exactly those parity checks will fail for which the binary expansion of the position of the error has ones.
Thus, the pattern of the parity-check failures points directly to the position of the error. Once identified, the erroneous bit may be changed to its opposite value, thereby correcting the error. Further details regarding error detecting and error correcting codes may be found by reference to Ralston et al., ed., Encyclopedia of Computer Science, 3rd ed., pgs. 531-532 (1992).
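The position-based parity scheme described above may be sketched, by way of example only, as a small Hamming-style code. The sketch assumes even parity, positions numbered from one, and parity bits stored at the positions that are powers of two; the function names are hypothetical, and no guarantee is made for blocks containing more than one error.

```python
def hamming_encode(data_bits):
    """Place data bits at non-power-of-two positions and fill in the parity bits."""
    m = len(data_bits)
    r = 0
    while (1 << r) < m + r + 1:  # smallest r with enough parity bits
        r += 1
    n = m + r
    code = [0] * (n + 1)  # index 0 is unused so that positions are 1-based
    remaining = iter(data_bits)
    for pos in range(1, n + 1):
        if pos & (pos - 1):  # true when pos is not a power of two
            code[pos] = next(remaining)
    for k in range(r):
        p = 1 << k
        # Parity check k covers every position whose binary form has bit k set.
        code[p] = sum(code[pos] for pos in range(1, n + 1) if pos & p) % 2
    return code[1:]


def error_position(codeword):
    """Combine the failing parity checks into a position (0 means no check failed)."""
    n = len(codeword)
    position = 0
    p = 1
    while p <= n:
        if sum(codeword[pos - 1] for pos in range(1, n + 1) if pos & p) % 2:
            position |= p  # this check failed, so the error position has this bit set
        p <<= 1
    return position


def correct_single_error(codeword):
    """Flip the bit indicated by the failing checks; return (corrected, position)."""
    position = error_position(codeword)
    fixed = list(codeword)
    if 1 <= position <= len(fixed):
        fixed[position - 1] ^= 1
    return fixed, position


sent = hamming_encode([1, 0, 1, 1])
received = list(sent)
received[4] ^= 1  # corrupt position 5 (binary 101): checks 1 and 4 fail
corrected, where = correct_single_error(received)
print(where, corrected == sent)
```

Note that `correct_single_error` returns the error position in addition to the corrected word. A corrector that returned only the corrected word would leave no trace that an error had occurred.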
While message errors are often isolated and/or random occurrences, they may be symptomatic of a greater problem in need of rectification. Unfortunately, ECC techniques are self-correcting. Thus, after a message error has been corrected, no record of the error remains. It is clear, therefore, that information regarding ECC errors would be quite useful in the diagnosis and/or repair of a computer system, particularly if such information could be collected with minimal impact on the operations of the computer system.
As described in the Intel386™ Microprocessor SuperSet Programmer's Reference Manual, the Intel family of microprocessors (386 series or higher), as well as clones of these microprocessors, are provided with a hidden set of instructions commonly referred to as system management mode (or "SMM") code. SMM code was developed in order to provide a means of transparently managing system operations for a computer system. Generally, a computer system equipped with SMM code will operate normally until the occurrence of an SMM event, i.e., an event which will initiate execution of the SMM code. For example, a low battery condition in a laptop computer is an SMM event which will initiate execution of the SMM code. When an SMM event occurs, a system management interrupt (or "SMI") request is transmitted to the central processing unit (or "CPU"). Upon receipt of an SMI request, the highest priority non-maskable interrupt, the CPU saves its state in a specially allocated portion of main memory, exits the normal operating mode and enters SMM mode where the CPU begins to execute the SMM code. The CPU will stay in SMM mode until the SMI request is handled. At that time, the CPU will return to the normal operating mode.
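The sequence described above (save state, enter SMM mode, execute the SMM code, return to the normal operating mode) may be pictured with the following toy model. It is a simplified sketch only and does not reproduce the actual processor mechanism; the class `ToyCPU`, its register names, and the handler `low_battery_handler` are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToyCPU:
    """Simplified stand-in for a CPU with a normal mode and an SMM mode."""
    mode: str = "normal"
    registers: dict = field(default_factory=lambda: {"ip": 0x1000, "ax": 7})
    saved_state: Optional[dict] = None  # stands in for the reserved memory area

    def raise_smi(self, handler):
        # Save the CPU state before leaving the normal operating mode.
        self.saved_state = dict(self.registers)
        self.mode = "smm"
        # Execute the SMM code; it may use the registers freely.
        handler(self)
        # Restore the saved state and return to the normal operating mode.
        self.registers = self.saved_state
        self.saved_state = None
        self.mode = "normal"


modes_seen = []


def low_battery_handler(cpu):
    """Hypothetical SMM handler that uses a register as scratch space."""
    modes_seen.append(cpu.mode)
    cpu.registers["ax"] = 0


cpu = ToyCPU()
before = dict(cpu.registers)
cpu.raise_smi(low_battery_handler)
print(cpu.mode, cpu.registers == before, modes_seen)
```

In this model the interrupted program finds its registers unchanged after the handler returns, which is the sense in which the handling is transparent.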
Broadly speaking, SMM mode provides a process to transparently handle SMM events, a varied collection of types of events which includes the aforementioned ECC events. Once an event is handled and the CPU returns to normal operating mode, however, information related to the SMM event is not retained. Thus, the user is deprived of useful information which could later be reviewed, for example, while running a separate application or diagnostics on the computer system.
It can be readily seen from the foregoing that it would be desirable to provide a method of operating a computer system such that information related to its operation is collected whenever the computer enters SMM mode in response to the occurrence of an event, for example, a system error, for which information is to be collected. It is, therefore, the object of this invention to provide such a method of operating a computer system.