1. Field of Invention
This invention relates to microprocessor design, and more particularly, to the implementation of on-chip debug capability in a microprocessor.
2. Description of Related Art
Modern microprocessors offer unprecedented performance. For a variety of digital integrated circuits (ICs), speed, level of integration (i.e., transistors per square centimeter) and capabilities have improved. Moreover, in many cases, these performance improvements have been accompanied by reductions in size, power consumption and cost of the devices. However, these benefits require greater complexity in digital logic design. Because of this complexity, the investment of time and resources by the manufacturer to design and fabricate a digital logic device has increased. For this same reason, a mistake or oversight on the part of the designer has become more likely.
An architectural feature common to most high performance microprocessors is the instruction pipeline. A microprocessor typically processes each instruction in a sequence of operations. For example, fetching the instruction from memory is often followed by a decoding operation, to determine what operands are needed and where they are located. Once the operands are available, the instruction may be executed, following which results are saved back to memory. Rather than performing the entire sequence of operations on one instruction prior to fetching the next, an improvement in throughput can be obtained by performing the operations concurrently on consecutive instructions. The pipeline can be likened to an assembly line, where a series of operations is performed on a product in stages, as it moves down the line. Ideally, if each pipeline stage performs its associated operation in a single clock cycle, the average processor execution rate can be as high as one instruction every clock cycle.
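The throughput benefit described above can be illustrated with a simple cycle count. The following is a minimal sketch, assuming an idealized pipeline in which every stage takes exactly one clock cycle and no stalls occur; the function names and the four-stage example are hypothetical and are not taken from any particular microprocessor.

```python
def sequential_cycles(num_instructions: int, num_stages: int) -> int:
    """Cycles needed when each instruction completes all stages
    before the next instruction is fetched."""
    return num_instructions * num_stages


def pipelined_cycles(num_instructions: int, num_stages: int) -> int:
    """Cycles needed when consecutive instructions overlap in the pipeline.

    The first instruction needs num_stages cycles to fill the pipeline;
    each later instruction then completes one cycle after its predecessor.
    """
    if num_instructions == 0:
        return 0
    return num_stages + num_instructions - 1


if __name__ == "__main__":
    n, stages = 100, 4  # illustrative values only
    print(sequential_cycles(n, stages), pipelined_cycles(n, stages))
```

As the number of instructions grows, the pipelined count approaches one cycle per instruction, which is the ideal rate mentioned above.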
In practice, however, since the performance benefits of an instruction pipeline depend on keeping the pipeline full, maximum throughput is generally not possible on a consistent basis. A complication arises when a data transfer cannot be performed quickly enough to sustain pipeline throughput. For example, if the instruction currently making its way through the pipeline requires data to be fetched from memory and, for whatever reason, the memory cannot be accessed in the allotted time, the pipeline must be halted for at least one clock cycle so that the correct data can be fetched. This failure to access the data needed by the instruction propagating through the pipeline is often called a data “load miss” (or “read miss”), and the extra clock cycle is referred to as a “fix-up cycle.”
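The cost of fix-up cycles can be sketched in the same idealized terms. The example below assumes one clock cycle per stage and a fixed number of fix-up cycles per load miss (one by default, the minimum stated above); the function name, parameters and sample values are hypothetical.

```python
def cycles_with_fixups(load_miss_flags, num_stages, fixup_cycles_per_miss=1):
    """Total cycles for a sequence of instructions in an idealized pipeline.

    load_miss_flags: one boolean per instruction, True if that instruction
                     suffers a data load miss.
    Each load miss halts the whole pipeline for fixup_cycles_per_miss cycles.
    """
    num_instructions = len(load_miss_flags)
    if num_instructions == 0:
        return 0
    ideal = num_stages + num_instructions - 1
    stall = sum(fixup_cycles_per_miss for missed in load_miss_flags if missed)
    return ideal + stall


if __name__ == "__main__":
    # Four instructions, the second of which misses on its load.
    print(cycles_with_fixups([False, True, False, False], num_stages=4))
```

Every load miss adds directly to the total cycle count, which is why a full pipeline cannot be sustained on a consistent basis.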
The use of fix-up cycles to handle load misses can lead to a problem for the diagnostic circuitry in a pipeline-equipped microprocessor. The problem arises when an exception (i.e., an interrupt resulting from some condition internal to the microprocessor) occurs during the fix-up cycle inserted to handle a load miss associated with an instruction in a “branch delay slot.” A branch delay slot is the instruction position immediately following a branch instruction. A complex break state machine associated with the diagnostic circuitry monitors addresses and data values present on the microprocessor buses. The state machine updates its internal state in response to trigger events, which correspond to specified addresses and data values. If a prescribed combination of trigger events and previous internal states occurs (i.e., a complex breakpoint), the state machine halts the microprocessor, permitting its internal status to be examined.
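The behavior of such a state machine can be sketched in software. The following is a simplified model, assuming that the complex breakpoint is an ordered sequence of trigger events and that the internal state is simply the index of the next expected trigger; the class names, the sequential-trigger scheme and the sample addresses are all hypothetical and do not describe any actual diagnostic circuit.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Trigger:
    """A trigger event: a specified address and, optionally, a data value."""
    address: int
    data: Optional[int] = None  # None means "match any data value"

    def matches(self, address: int, data: int) -> bool:
        return address == self.address and (self.data is None or data == self.data)


class ComplexBreakStateMachine:
    """Advances one state per matching trigger; halts after the last one."""

    def __init__(self, triggers):
        self.triggers = list(triggers)
        self.state = 0  # index of the next trigger to look for

    @property
    def halted(self) -> bool:
        return self.state >= len(self.triggers)

    def observe(self, address: int, data: int) -> bool:
        """Feed one bus observation; return True once the breakpoint fires."""
        if not self.halted and self.triggers[self.state].matches(address, data):
            self.state += 1
        return self.halted


if __name__ == "__main__":
    sm = ComplexBreakStateMachine([Trigger(0x100), Trigger(0x200, data=7)])
    for addr, value in [(0x100, 0), (0x200, 3), (0x200, 7)]:
        print(hex(addr), value, sm.observe(addr, value))
```

In this model, the breakpoint fires only when the second trigger is observed after the first, which reflects the dependence on both trigger events and previous internal states.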
When an exception occurs, the microprocessor temporarily suspends the current program sequence to enter a special program segment, known as an exception handler, designed to deal with the exception. Following execution of the exception handler, the normal program sequence is resumed. When the exception occurs during execution of an instruction in a branch delay slot, normal program execution typically resumes by re-executing the branch instruction. This can result in the complex break state machine being erroneously updated twice for the same branch instruction. One apparent solution to this problem would be to lengthen the pipeline (i.e., to add more stages). It would then be possible to compensate for a data load miss without inserting a fix-up cycle. However, considerable additional circuitry would be required to extend the pipeline, making this an expensive and impractical solution.
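The double update described above can be illustrated by counting how often a trigger on the branch address would be observed. This is a simplified sketch, assuming that the state machine is updated once each time the branch address appears in the instruction stream; the addresses and trace contents are hypothetical, and the exception handler's own instructions are omitted from the trace.

```python
def branch_trigger_count(trace, branch_address):
    """Number of times a trigger on branch_address would update the state machine."""
    return sum(1 for address in trace if address == branch_address)


# Hypothetical addresses for a branch, its delay slot and the branch target.
BRANCH, DELAY_SLOT, TARGET = 0x1000, 0x1004, 0x2000

# No exception: the branch is seen once.
normal_trace = [BRANCH, DELAY_SLOT, TARGET]

# Exception taken while the delay-slot instruction is in progress: after the
# handler returns, execution resumes by re-executing the branch.
exception_trace = [BRANCH, DELAY_SLOT, BRANCH, DELAY_SLOT, TARGET]

if __name__ == "__main__":
    print(branch_trigger_count(normal_trace, BRANCH))
    print(branch_trigger_count(exception_trace, BRANCH))
```

The program executes the branch only once in a logical sense, yet the second trace presents it twice, so a state machine that counts or sequences on that branch would advance one step too far.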
In view of this problem, it would be desirable to have a means of avoiding spurious updates of the complex break state machine associated with the diagnostic circuitry of a high-performance microprocessor. Ideally, the solution should be inexpensive and should not compromise the performance of the microprocessor or the rest of the diagnostic circuitry.