The present invention relates in general to transient event recording and in particular to capturing the traces of execution cycles in a computer preceding an error condition or failure.
Transient event recorders are a broad class of systems that record, for later analysis, the signals or events that precede an error or failure condition in logic, electronic, and electro-mechanical systems. Analog transient recorders have existed for years in the form of storage oscilloscopes and strip chart recorders. With the advent of low-cost, high-speed digital systems and the availability of high-speed memory, it became possible to record digitized analog signals or digital signals in a non-volatile digital memory. Two problems that have always existed in these transient event recording systems are the speed of data acquisition and the quality of the connection to the signals being recorded. Transient event recording systems had to have circuits and recording means that were faster than the signals to be recorded, and the signal interconnection could not cause distortion of, or significant interference with, the desired signals.
Digital transient event recording systems have been particularly useful in storing and displaying multiple signal channels where only timing or state information was important and many such transient event recording systems exist commercially. With the advent of very large scale integrated circuits (VLSI), operating at high speeds, it has become very difficult to employ transient event recording techniques using external instrumentation. The signals to be recorded or stored could not be contacted with an external connection without a degradation in performance. To overcome this problem, trace arrays have been integrated on the VLSI chip, along with functional circuits, to facilitate the recording of signals relevant to occurring failures. Another problem that occurs when trying to use transient event recording techniques for VLSI circuits is that the trigger event, which actually began a process leading to a particular failure, sometimes manifests itself many cycles ahead of the observable failure event.
For hardware debugging of a logic unit in a VLSI microprocessor, a suitable set of control and/or data signals may be selected from the logic unit and put on a bus called the unit debug bus. The contents of this bus at successive cycles may be saved in a trace array. Since the trace array is usually small, it can save only a few cycles' worth of data from the debug bus. Events are defined to indicate when to start and when to stop storing information in the trace array. For example, an event trigger signal may be defined to occur when the debug bus content matches a predetermined bit string “A”. Bit string “A” may indicate that a cache write to a given address took place, and this may be used to start tracing (storing data in the trace array). Another bit string, “B”, may be used to stop storing in the trace array when it matches the content of the debug bus.
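The start and stop behavior described above can be illustrated with a minimal software sketch. It is not the circuit of the invention; the class name `StartStopTrace`, the method `clock`, and the choice to record the cycles on which the start and stop patterns appear are all assumptions made for illustration. The sketch also assumes that recording ends when the small array fills, even if the stop pattern has not been seen.

```python
from typing import List


class StartStopTrace:
    """Toy model of a small trace array gated by start and stop events."""

    def __init__(self, depth: int, start_pattern: int, stop_pattern: int) -> None:
        self.depth = depth                  # number of entries the array can hold
        self.start_pattern = start_pattern  # bit string "A" (assumed encoding: int)
        self.stop_pattern = stop_pattern    # bit string "B" (assumed encoding: int)
        self.entries: List[int] = []
        self.recording = False
        self.done = False

    def clock(self, bus_value: int) -> None:
        """Present one cycle of debug bus content to the recorder."""
        if self.done:
            return
        # Start tracing when the bus content matches the start pattern.
        if not self.recording and bus_value == self.start_pattern:
            self.recording = True
        if self.recording:
            self.entries.append(bus_value)
            # Stop on the stop pattern, or when the array is full (assumption).
            if bus_value == self.stop_pattern or len(self.entries) == self.depth:
                self.recording = False
                self.done = True


trace = StartStopTrace(depth=4, start_pattern=0xA, stop_pattern=0xB)
for value in [0x1, 0x2, 0xA, 0x3, 0xB, 0x4]:
    trace.clock(value)
print(trace.entries)  # [10, 3, 11]
```

The sketch shows the limitation noted in the text: only cycles between a known start event and a known stop event are captured, so nothing that happened before the start pattern is available for analysis.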
In some cases, the fault in the VLSI chip manifests itself at the last few occurrences of an event (for example, during one of the last times that a cache write takes place to a given address location, the cache gets corrupted). It may not be known exactly which of these last few occurrences of the event manifested the actual error, but it may be known (or suspected) that the error was due to one of the last occurrences. Sometimes there is no convenient start and stop event for storing in the trace array. Because of this, it is difficult to capture the trace that shows the desired control and data signals for the cycles immediately before the last few occurrences of the events. This may be especially true if system or VLSI behavior changes from one program run to the next.
The performance of VLSI chips is difficult to analyze, and transient failures with a low repetition rate are particularly hard to analyze and correct. Analyzing and correcting design problems that manifest themselves as transient failures is further complicated by the fact that the event that triggers a particular failure may occur many cycles before the transient failure itself. There is, therefore, a need for a method and system for recording those signals that were instrumental in causing the actual transient VLSI chip failure.
A trace array is integrated onto a VLSI chip for storing and playing back a sequence of digital events that occurred prior to an error condition. The trace array is partitioned into N sub-arrays, each having storage for M entries. The trace array is combined with circuits that enable signals to be directed to a particular sub-array in response to logic states that are predetermined to be suspect in causing a later actual fault or error. Signals are directed to a sub-array, and that sub-array records in a wrapping mode (old data is overwritten with new data) until a predetermined suspect event signal occurs, at which time recording in that sub-array is stopped. Recording is then switched to another sub-array, which continues recording in the same wrapping mode until a suspect event signal or an actual error signal occurs. The P sub-arrays that have been written into at the time of an error contain trace data preceding each of the corresponding P event signals that occurred prior to the actual error condition. If P exceeds N, indicating that an error has not occurred since the preceding N event signals, then logic directs the recording of signal states back to the first of the N sub-arrays, where recording began. In this manner, the states of input signals for the N events preceding an actual error, and their M corresponding entries, are saved for analysis.
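The partitioned recording scheme can be modeled in software as a set of N circular buffers of M entries each. The following is a minimal sketch under stated assumptions, not the disclosed circuit: the names `PartitionedTraceArray`, `clock`, and `read_sub` are hypothetical, the cycle on which an event or error signal is asserted is assumed to be recorded before recording stops or switches, and a sub-array that is about to be reused is assumed to be cleared so that a readout never mixes old and new data.

```python
from typing import List, Optional


class PartitionedTraceArray:
    """Toy model: N sub-arrays of M entries, each recording in wrapping mode."""

    def __init__(self, n_sub: int, m_entries: int) -> None:
        self.n = n_sub
        self.m = m_entries
        self.data: List[List[Optional[int]]] = [[None] * m_entries for _ in range(n_sub)]
        self.write_ptr = [0] * n_sub   # next write position in each sub-array
        self.count = [0] * n_sub       # number of valid entries in each sub-array
        self.current = 0               # sub-array currently being written
        self.stopped = False           # set once the error signal is seen

    def clock(self, bus_value: int, event: bool = False, error: bool = False) -> None:
        """Record one cycle; 'event' and 'error' model the control signals."""
        if self.stopped:
            return
        sub = self.current
        # Wrapping write: the oldest entry is overwritten once the sub-array is full.
        self.data[sub][self.write_ptr[sub]] = bus_value
        self.write_ptr[sub] = (self.write_ptr[sub] + 1) % self.m
        self.count[sub] = min(self.count[sub] + 1, self.m)
        if error:
            self.stopped = True        # freeze all sub-arrays for analysis
        elif event:
            # Stop this sub-array and move on, returning to the first after N.
            self.current = (sub + 1) % self.n
            self.write_ptr[self.current] = 0   # clear before reuse (assumption)
            self.count[self.current] = 0

    def read_sub(self, sub: int) -> List[Optional[int]]:
        """Return the valid entries of one sub-array, oldest first."""
        c = self.count[sub]
        start = (self.write_ptr[sub] - c) % self.m
        return [self.data[sub][(start + i) % self.m] for i in range(c)]


trace = PartitionedTraceArray(n_sub=3, m_entries=3)
for cycle in range(1, 10):
    trace.clock(cycle, event=cycle in (4, 6), error=(cycle == 9))
print([trace.read_sub(i) for i in range(3)])  # [[2, 3, 4], [5, 6], [7, 8, 9]]
```

In this run the event signals occur on cycles 4 and 6 and the error on cycle 9, so each sub-array holds up to M = 3 cycles leading up to its own event or to the error. With more than N events before the error, the oldest sub-array is reused, so only the most recent N captures survive.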
Another embodiment of the present invention uses an analog-to-digital (A/D) converter to convert an analog signal to a digitized (A/D) signal. The A/D signal is stored along with selected logic signals in a stand-alone trace array (not in a VLSI chip) for debugging an electro-mechanical system. The digitized analog signal and the logic signals are stored in a partitioned trace array according to embodiments of the present invention. Multiple A/D converters may be used if multiple analog signals are to be used in the debugging process.
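One way to picture storing a digitized analog signal alongside logic signals is to pack both into a single trace entry. The sketch below is illustrative only: the functions `quantize` and `pack_entry`, the ideal unipolar converter model, the 8-bit resolution, the reference voltage, and the bit layout of the entry are all assumptions and are not taken from the disclosure.

```python
def quantize(voltage: float, v_ref: float = 4.0, bits: int = 8) -> int:
    """Ideal unipolar A/D conversion: map 0..v_ref onto an integer code."""
    levels = 1 << bits
    code = int(voltage / v_ref * levels)
    # Clamp so that out-of-range inputs saturate instead of overflowing.
    return max(0, min(levels - 1, code))


def pack_entry(adc_code: int, logic_signals: int, logic_width: int = 4) -> int:
    """Place the A/D code in the high bits and the logic signals in the low bits."""
    mask = (1 << logic_width) - 1
    return (adc_code << logic_width) | (logic_signals & mask)


# One trace entry: a 1.0 V sample plus four logic signals (assumed widths).
entry = pack_entry(quantize(1.0), 0b1010)
print(hex(entry))  # 0x40a
```

Each packed entry could then be written to a sub-array in the same way as a purely digital debug bus value. With several A/D converters, the entry would simply carry one code field per analog channel.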
The foregoing has outlined rather broadly the features and technical advantages of the present invention in order that the detailed description of the invention that follows may be better understood. Additional features and advantages of the invention will be described hereinafter which form the subject of the claims of the invention.