The present invention relates in general to transient event recording, and in particular, to capturing the traces of execution cycles in a computer preceding an error condition or failure.
Transient event recorders are a broad class of systems that record, for eventual analysis, signals or events that precede an error or failure condition in logic, electronic, and electromechanical systems. Analog transient recorders have existed for years in the form of storage oscilloscopes and strip chart recorders. With the advent of low-cost, high-speed digital systems and the availability of high-speed memory, it became possible to record digitized analog signals or digital signals in a non-volatile digital memory. Two problems that have always existed in these transient event recording systems are the speed of data acquisition and the quality of connection to signals being recorded. Transient event recording systems had to have circuits and recording means that were faster than the signals that were to be recorded, and the signal interconnection could not cause distortion or significant interference with desired signals.
Digital transient event recording systems have been particularly useful in storing and displaying multiple signal channels where only timing or state information was important, and many such transient event recording systems exist commercially. With the advent of very large scale integrated (VLSI) circuits operating at high speeds, it became very difficult to employ transient event recording techniques using external instrumentation. The signals to be recorded or stored could not be contacted with an external connection without a degradation in performance. To overcome the problems of some prior trace event recorders, trace arrays have been integrated onto VLSI chips along with other functional circuits. Another problem that occurs when trying to use transient event recording techniques for VLSI circuits is that the trigger event, which actually began a process leading to a particular failure, sometimes occurs on the VLSI chip many cycles ahead of the observable failure event.
For hardware debugging of a logic unit in a VLSI microprocessor, a suitable set of control and/or data signals may be selected from the logic unit and put on a bus called the unit debug bus. The contents of this bus at successive cycles may be saved in a trace array. Since the size of the trace array is usually small, it can save only a few cycles of data from the debug bus. Events are defined to indicate when to start and when to stop storing information in the trace array. For example, an event trigger signal may be defined when a debug bus content matches a predetermined bit string “A”. A debug bus is the name for a bus used to direct signals to a trace array. For example, bit string “A” may indicate that a cache write to a given address took place and this indication may be used to start tracing (storing data in the trace array). Another content, for example bit string “B”, may be used to stop storing in the trace array when it matches a content of the debug bus.
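The start and stop events described above can be illustrated with a minimal software sketch. It is not a description of any particular hardware. The function name `capture_trace` and its parameters are hypothetical, and two behaviors are assumptions made only for illustration: the cycle that matches the start pattern is itself stored, and once the small trace array is full, later cycles are simply dropped.

```python
from typing import Iterable, List


def capture_trace(bus_values: Iterable[int], start_pattern: int,
                  stop_pattern: int, depth: int) -> List[int]:
    """Store debug bus contents between a start match and a stop match.

    bus_values    -- debug bus content at successive cycles
    start_pattern -- bit string "A": a match starts tracing
    stop_pattern  -- bit string "B": a match stops tracing
    depth         -- number of entries in the (small) trace array
    """
    trace: List[int] = []
    recording = False
    for value in bus_values:
        if not recording and value == start_pattern:
            recording = True          # start event seen
        elif recording and value == stop_pattern:
            break                     # stop event seen; stop cycle not stored
        # Assumption: entries beyond the array depth are dropped.
        if recording and len(trace) < depth:
            trace.append(value)
    return trace


bus = [0x10, 0x0A, 0x11, 0x12, 0x0B, 0x13]
print(capture_trace(bus, start_pattern=0x0A, stop_pattern=0x0B, depth=8))
```

With a depth of only two entries, the same bus sequence yields just the first two stored cycles, which illustrates why a small trace array limits how much of the run can be observed with simple start and stop events.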
In some cases, the fault in the VLSI chip manifests itself at the last few occurrences of an event (for example, during one of the last times that a cache write takes place to a given address location, the cache gets corrupted). It may not be known exactly which of these last few occurrences of the event manifested the actual error, but it may be known (or suspected) that the error was due to one of the last occurrences. Sometimes there is no convenient start and stop event for storing in the trace array. Because of this, it is difficult to capture the trace that shows the desired control and data signals for the cycles immediately before the last few occurrences of the events. This may be especially true if system or VLSI behavior changes from one program run to the next.
The performance of VLSI chips is difficult to analyze, and failures that are transient, with a low repetition rate, are particularly hard to analyze and correct. Analyzing and correcting design problems that manifest themselves as transient failures is further complicated by the fact that the event that triggers a particular failure may occur many cycles before the actual transient failure itself. There is, therefore, a need for a method and system for recording those signals that were instrumental in causing the actual transient VLSI chip failure.
A trace array is integrated onto a VLSI chip for storing and playing back a sequence of trace signal states that occurred prior to an event condition. Embodiments of the present invention allow the trace array to be simply partitioned into sub-arrays or Banks. The trace array is combined with circuits that enable trace signals to be selectively recorded to enable system debug or analysis. Addresses for the trace array are generated by combining the outputs of an event counter and a cycle clock counter. The cycle clock is selected to generate the low order bits of the trace array address and the event counter is selected to generate the high order bits of the trace array address. Program signals are used to determine how many bits of the total address are from the event counter and the cycle clock counter. The trace array may be partitioned into Banks in this manner. Each time an event signal is counted the trace array address indexes to an address in another Bank determined by the outputs of the event counter and the cycle clock counter. The cycle clock counter cycles through addresses in a selected Bank until another event signal is received. Each time the event signal occurrence causes the trace address to jump to a new Bank address, the event address is stored and a start code is recorded in the trace array at the new Bank address. Trace signals may be “masked” or selected so all trace signals or a predetermined sub-set of trace signals are monitored to determine state changes between cycle clock times. If no selected trace signal changes its state between cycle clock times, then a compression code is recorded in the trace array and the cycle clock counter is not incremented. Instead, the number of cycle clocks in which none of the trace signals change states is recorded as a time stamp in the trace array. A trace may be stopped for readout on the receipt of a stop signal which may be the result of another event such as an error condition.
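The address generation described above, with the event counter supplying the high order bits and the cycle clock counter supplying the low order bits, may be sketched in software as follows. This is an illustrative model only. The names `trace_address`, `address_bits`, and `bank_bits` are hypothetical, with `bank_bits` standing in for the program signals that select how many address bits come from the event counter. Wrapping each counter modulo its field width is an assumption of this sketch.

```python
def trace_address(event_count: int, cycle_count: int,
                  address_bits: int, bank_bits: int) -> int:
    """Combine two counters into one trace array address.

    The top `bank_bits` bits come from the event counter (Bank select);
    the remaining low order bits come from the cycle clock counter.
    """
    if not 0 <= bank_bits <= address_bits:
        raise ValueError("bank_bits must be between 0 and address_bits")
    offset_bits = address_bits - bank_bits
    # Assumption: each counter wraps within its share of the address.
    bank = event_count % (1 << bank_bits)
    offset = cycle_count % (1 << offset_bits)
    return (bank << offset_bits) | offset


# A 64-entry array split into 4 Banks of 16 entries each.
for event, cycle in [(0, 14), (0, 15), (1, 16), (1, 17), (2, 18)]:
    addr = trace_address(event, cycle, address_bits=6, bank_bits=2)
    print(f"event={event} cycle={cycle} -> address {addr:06b}")
```

Because the cycle clock counter is not reset when an event is counted, the first entry written in a new Bank generally does not fall at the start of that Bank. In this model that is the reason the event address and a start code are needed to locate where each Bank's recording begins.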
The trace array address corresponding to the stop signal is saved because the counters are no longer incremented. Each Bank within the trace array has a logic true Bank valid bit when the Bank has valid trace signal data stored. An output processor (processing function, either hardware or software) reads out the trace array and reconstructs the trace signal sequences using the event addresses, start codes, compression codes, time stamps and the implicit stop address. The output processor resets the Bank valid bits on readout. Using embodiments of the present invention, it is not necessary to know where trace recording starts in the trace array nor is it necessary to reset the address counters to enable reconstruction of the original received trace signal sequences.
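The compression of unchanged trace signals and the later reconstruction by an output processor may be illustrated with the following simplified sketch. It deliberately omits Banks, event addresses, start codes, the stop address, and address wraparound, and models only the masking, compression code, and time stamp. The entry encoding (`"DATA"` and `"IDLE"` tuples) and the function names `record` and `reconstruct` are hypothetical and chosen only for illustration.

```python
from typing import Iterable, List, Optional, Tuple

# Hypothetical entry encoding:
#   ("DATA", signals)  -- masked trace signal states for one cycle
#   ("IDLE", count)    -- compression code plus time stamp: the selected
#                         signals stayed unchanged for `count` cycles
Entry = Tuple[str, int]


def record(samples: Iterable[int], mask: int) -> List[Entry]:
    """Store masked samples, compressing runs with no state change."""
    entries: List[Entry] = []
    previous: Optional[int] = None
    for sample in samples:
        selected = sample & mask          # monitor only the selected signals
        if previous is not None and selected == previous:
            if entries[-1][0] == "IDLE":
                # Extend the time stamp instead of using a new entry.
                entries[-1] = ("IDLE", entries[-1][1] + 1)
            else:
                entries.append(("IDLE", 1))
        else:
            entries.append(("DATA", selected))
            previous = selected
    return entries


def reconstruct(entries: List[Entry]) -> List[int]:
    """Expand recorded entries back into a cycle-by-cycle sequence."""
    signals: List[int] = []
    for kind, value in entries:
        if kind == "DATA":
            signals.append(value)
        else:
            # `record` always emits a DATA entry before any IDLE entry.
            signals.extend([signals[-1]] * value)
    return signals


samples = [0b1010, 0b1010, 0b1011, 0b1110, 0b1110, 0b1110, 0b0110]
entries = record(samples, mask=0b1110)
print(entries)
print(reconstruct(entries))
```

In this example the least significant bit is masked out, so the change from `0b1010` to `0b1011` does not count as a state change and is folded into the preceding idle run. Seven cycles are stored in five entries, and the masked sequence is recovered exactly on readout.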
The foregoing has outlined rather broadly the features and technical advantages of the present invention in order that the detailed description of the invention that follows may be better understood. Additional features and advantages of the invention will be described hereinafter which form the subject of the claims of the invention.