Error detection, correction, and recovery are important features of computers and computer systems. Machine check events, including machine check abort (MCA) events, arise in a processor when it encounters an error condition that requires corrective action. These events stem from a variety of hardware and software errors, including system bus errors, memory errors, and cache errors.
Machine check events include both local and global events. A local event occurs in a processor that encounters an internal error or platform error, and it is not broadcast to other processors. By contrast, a global event results in a system-wide broadcast that notifies the other processors of the error condition. In response to the broadcast, all the processors in the domain enter an error handling mode and process the error event.
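The distinction between local and global events can be illustrated with a small simulation. The following is only a sketch under stated assumptions: the names `Scope`, `Processor`, and `raise_machine_check` are hypothetical, and the model stands in for hardware signaling rather than reproducing any real processor's machine check architecture.

```python
from dataclasses import dataclass, field
from enum import Enum


class Scope(Enum):
    LOCAL = "local"    # handled only by the processor that saw the error
    GLOBAL = "global"  # broadcast to every processor in the domain


@dataclass
class Processor:
    cpu_id: int
    in_error_handling: bool = False
    handled: list = field(default_factory=list)

    def handle(self, description):
        # Enter error handling mode and record the event that was processed.
        self.in_error_handling = True
        self.handled.append(description)


def raise_machine_check(domain, source_id, scope, description):
    """Deliver a machine check event; return the ids of the processors that handled it."""
    if scope is Scope.GLOBAL:
        targets = domain  # system-wide broadcast
    else:
        targets = [p for p in domain if p.cpu_id == source_id]  # no broadcast
    for p in targets:
        p.handle(description)
    return [p.cpu_id for p in targets]


# Hypothetical four-processor domain.
domain = [Processor(i) for i in range(4)]
local_ids = raise_machine_check(domain, 2, Scope.LOCAL, "cache error")
global_ids = raise_machine_check(domain, 0, Scope.GLOBAL, "system bus error")
```

In this model the local event reaches only processor 2, while the global event is processed by all four processors in the domain.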
Machine check events can be quite harmful and can affect the entire hard partition. If the event is not corrected, it can cause the system to perform a crash dump and reboot. In other words, these errors are not confined to one portion of the partition; they adversely affect the entire hard partition and its operating system. In addition, the system incurs downtime for failure analysis and correction, and often requires servicing.
As computers and computer systems become faster and more complex, addressing hardware and software errors, such as machine check events, becomes increasingly important. To help ensure the integrity of such computer systems, the adverse effects of these errors should be minimized or eliminated.