Significant progress has been made in recent years toward accurately monitoring the operation of computer and digital systems that experience faults or errors. Today's digital systems (e.g., computer processors, microprocessors) employ fault management techniques that involve detecting, isolating, and correcting faults or errors. For instance, computer processors generally include a combination of hardware, software, and/or firmware for detecting faults in real time and logging those faults according to a fault logging specification that defines how faults are recognized and prioritized. Generally, the fault logging specification mandates logging the most severe or detrimental fault so that it may be addressed or corrected in a timely manner. For example, if a computer processor accesses a data cache (e.g., SRAM) to carry out a read request, and the ten bits of data returned in response contain a first error involving one incorrect bit and a second error involving two incorrect bits, the fault logging specification implemented within the processor may mandate logging the second, more severe error in the processor's real-time fault log. Where multiple faults are of equal severity, the fault logging specification may mandate logging the first such fault seen by the real-time fault log.
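The prioritization rule described above can be sketched as a small selection function. This is a minimal illustration, not any processor's actual logging logic: the `Fault` record, its `severity` and `cycle` fields, and `select_fault_to_log` are hypothetical names, and severity is assumed to be an integer rank where a larger value means a more severe fault (for example, the number of incorrect bits in the read-request example).

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Fault:
    """Hypothetical fault record used only for illustration."""
    name: str
    severity: int  # assumption: larger value = more severe (e.g. count of incorrect bits)
    cycle: int     # assumption: order in which the real-time fault log saw the fault


def select_fault_to_log(faults: Sequence[Fault]) -> Optional[Fault]:
    """Return the fault a logger following the rule should record, or None if no faults."""
    if not faults:
        return None
    # Key (-severity, cycle): the most severe fault wins; among equally
    # severe faults, the one seen earliest (lowest cycle) wins.
    return min(faults, key=lambda f: (-f.severity, f.cycle))


# Read-request example: a one-bit error seen first, then a two-bit error.
faults = [Fault("single-bit error", severity=1, cycle=0),
          Fault("double-bit error", severity=2, cycle=1)]
print(select_fault_to_log(faults).name)  # double-bit error

# Tie on severity: the earlier fault is logged.
ties = [Fault("error A", severity=2, cycle=5), Fault("error B", severity=2, cycle=3)]
print(select_fault_to_log(ties).name)  # error B
```

Encoding the rule as a single sort key keeps the severity ordering and the first-seen tie-break in one place, which makes it easy to compare against a written specification.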
Fault log modeling is often used to validate a digital system's fault detection and logging logic and its real-time fault log. That is, fault log modeling is used to verify that the system's fault detection and logging logic correctly implements the fault logging specification, such that the system consistently selects the correct or “right” fault for recording in the real-time fault log. Returning to the computer processor example discussed above, one or more faults may be purposefully injected into a data path of the processor. The fault detection logic within the processor may detect the faults and pass them to the fault logging logic, which may apply the fault logging specification to select one of the faults for logging in the processor's real-time fault log. In parallel, an independent fault log modeler may separately analyze the injected faults to determine which of them the processor should select for logging if its fault logging is correctly implemented according to the fault logging specification. The fault so determined is represented in a fault log model. The processor's real-time fault log may then be compared against the fault log model. If the real-time fault log matches the fault log model, the processor's fault detection and logging logic is deemed to correctly implement the fault logging specification for the injected faults.
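The comparison between the real-time fault log and the fault log model can likewise be sketched in a few lines. This minimal sketch assumes a single logged entry per test and the same most-severe, earliest-seen rule; `model_expected_log` plays the role of the independent fault log modeler, and `buggy_device_log` is a hypothetical stand-in for a processor whose logging logic ignores severity. In practice the observed entry would be read from the processor or its simulation rather than computed by a Python function.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Fault:
    """Hypothetical fault record: larger severity = more severe; cycle = order seen."""
    name: str
    severity: int
    cycle: int


def model_expected_log(injected: Sequence[Fault]) -> Optional[Fault]:
    """Independent fault log model: the entry a correct implementation should log."""
    if not injected:
        return None
    return min(injected, key=lambda f: (-f.severity, f.cycle))


def check_fault_log(injected: Sequence[Fault], observed: Optional[Fault]) -> bool:
    """Compare the device's real-time log entry with the model's prediction."""
    return observed == model_expected_log(injected)


def buggy_device_log(injected: Sequence[Fault]) -> Optional[Fault]:
    """Hypothetical defective logger: records the earliest fault, ignoring severity."""
    if not injected:
        return None
    return min(injected, key=lambda f: f.cycle)


injected = [Fault("single-bit error", 1, 0), Fault("double-bit error", 2, 1)]
print(check_fault_log(injected, buggy_device_log(injected)))       # False
print(check_fault_log(injected, Fault("double-bit error", 2, 1)))  # True
```

Keeping the model independent of the device under test is what gives the comparison its value: if both were derived from the same implementation, a shared defect would go unnoticed.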
While existing fault log modeling techniques provide a mechanism for testing a digital system's real-time fault log, they lack the sophistication to stress the system's ability to correctly recognize, prioritize, and log multiple faults occurring in close temporal proximity. That is, existing fault log modeling techniques rely on a simplistic cause-and-effect relationship between fault injection and fault logging, which makes them useful only for predicting simple fault log behaviors under isolated circumstances in which the injected faults are known. This simplicity prevents accurate fault log modeling of complex, randomized sequences of faults that involve numerous different types of faults or errors, numerous faults occurring within a limited period of time, or both.
Hence, there remains a need for methods and systems that provide more accurate fault log modeling for testing and validating a digital system's fault detection and logging logic and real-time fault log under a variety of system conditions, including circumstances in which multiple faults and/or various types of faults occur in close temporal proximity.