Today's microcontrollers have a highly integrated and complex architecture. It is common or even necessary to provide on-chip debug logic allowing a user or software developer to debug the program code of an application that is currently under development on the original application board. Most microcontrollers comprise a debug unit or debug logic that is coupled through an interface according to the widespread JTAG standard. This additional logic must be implemented on chip in order to provide the functionality needed. A common and basic feature for debugging is a code breakpoint that stops the execution of an application if a predefined instruction is reached. Typically, the instruction is identified by its memory address in an instruction register, which is often referred to as the program counter (PC for short). This debug functionality may be realized by simply monitoring the address lines of an instruction fetch unit of the microcontroller, e.g. with the help of a bus comparator. However, more complex breakpoints may be desirable. These breakpoints may consider not only the instruction address but also the data that is transferred by the instruction. Even more complex breakpoints may consider additional criteria.
A stopping breakpoint is one of the simple debug actions which halts application processing upon fulfilling the breakpoint condition. Other debug actions may also take place, e.g. a trace transaction or a debug interrupt.
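The breakpoint conditions described above may be illustrated by a minimal software sketch. It models a plain address breakpoint (the bus comparator case) and a more complex breakpoint that additionally considers the transferred data. The names `Breakpoint` and `breakpoint_hit` are hypothetical and chosen for illustration only; an actual debug unit implements such a comparison in hardware.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Breakpoint:
    """Hypothetical breakpoint description (illustrative names only)."""
    address: int                 # instruction address to match
    data: Optional[int] = None   # optional data value; None means "don't care"


def breakpoint_hit(bp: Breakpoint, fetch_address: int,
                   data: Optional[int] = None) -> bool:
    """Return True if the breakpoint condition is fulfilled.

    The address comparison mimics a bus comparator on the fetch address
    lines; the optional data comparison mimics a more complex breakpoint.
    """
    if fetch_address != bp.address:
        return False
    return bp.data is None or bp.data == data
```

In this sketch, a stopping breakpoint would halt the application whenever `breakpoint_hit` returns true; other debug actions, such as a trace transaction, could be triggered by the same condition.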
It is a general goal during debugging to correlate the instruction with the data which is transferred due to the execution of the instruction. This might be a challenging task since the fetch of the instruction and the correlated data transfer do not take place at the same time. For a non-pipelined processor (CPU) having a combined instruction and data bus (in the following also referred to as a memory bus), the instructions are executed in sequence and a correlation between the instruction and the resulting data transfer may easily be established.
FIG. 6 is a schematic illustration of the bus activity for a number of subsequent clock cycles within such a system. Exemplarily, a first instruction In1 is fetched and a read operation is performed due to this instruction. Accordingly, during the first two clock cycles, the instruction fetch of the first instruction In1 and the respective read operation (Read OP) may be monitored at the memory bus. Further, this first instruction In1 causes a write operation (Write OP) to memory address Rx0. This action may be monitored at the CPU register. An exemplary second and third instruction In2 and In3 perform similar operations and write data to memory addresses Rx1 and Rx2.
However, modern processors often have a pipelined architecture. According to this processor architecture, the execution of an instruction is separated into a plurality of sub-actions which are executed by successive stages of a processor pipeline. At a given point in time, several different instructions may be executed by the different pipeline stages. This technique, which is known as pipelining, increases the overall performance of the processor.
FIG. 7 exemplarily illustrates five stages of a classical RISC machine. The different stages are: instruction fetch (IF), instruction decode (ID), execute (EX), memory access (MEM) and register write back (WB). An instruction passes through the pipeline stages of the processor pipeline during subsequent clock cycles, as indicated by the horizontal rows. During the exemplary clock cycle 4, the first instruction performs a memory access (MEM), the second instruction is executed (EX), a third instruction is decoded (ID) and a fourth instruction is fetched (IF). During clock cycle 4, the fifth stage of the pipeline, namely the register write back stage, is idle.
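The stage occupancy described for clock cycle 4 may be reproduced with a short sketch. It assumes an idealized pipeline in which instruction n is fetched in clock cycle n and no stalls occur; the function names `stage_of` and `occupancy` are hypothetical.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def stage_of(instruction: int, cycle: int):
    """Stage occupied by a (1-based) instruction in a (1-based) clock cycle.

    Assumption: instruction n enters IF in cycle n and advances one stage
    per cycle without stalls. Returns None if it is not in the pipeline.
    """
    index = cycle - instruction
    if 0 <= index < len(STAGES):
        return STAGES[index]
    return None


def occupancy(cycle: int, n_instructions: int) -> dict:
    """Map each stage to the instruction it holds in the given cycle."""
    result = {stage: None for stage in STAGES}
    for instruction in range(1, n_instructions + 1):
        stage = stage_of(instruction, cycle)
        if stage is not None:
            result[stage] = instruction
    return result


# Clock cycle 4 with four instructions issued so far:
# {'IF': 4, 'ID': 3, 'EX': 2, 'MEM': 1, 'WB': None}
print(occupancy(4, 4))
```

The entry `None` for WB corresponds to the idle register write back stage during clock cycle 4.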
Pipelined processors are organized such that the pipeline stages can work semi-independently on separate jobs. The stages are linked in a chain, i.e. the pipeline, and each stage's output is fed to a subsequent stage until the job is done. The overall processing time is significantly reduced. However, it is not possible to observe all of the activity associated with the execution of an instruction by observing the memory interface alone.
This problem is illustrated in FIG. 8, which is a schematic view of the bus transactions of a pipelined processor having a combined instruction and data bus. During a first clock cycle, an exemplary first instruction In1 is fetched. In a second clock cycle, the memory bus is idle and during a third clock cycle, a second instruction In2 is fetched. Further, in a fourth clock cycle, the first instruction In1 performs a read operation while in a fifth clock cycle a third instruction In3 is fetched. By simply monitoring the bus activity, it is not possible to correlate an instruction and a data transfer caused by this instruction.
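The correlation problem may be made concrete with a small sketch of the bus activity described for FIG. 8. The heuristic used here, namely attributing a data transfer to the most recently fetched instruction, is an assumption introduced purely for illustration; it stands for any attempt to correlate by bus observation alone. The names `BUS_TRACE` and `naive_correlate` are hypothetical.

```python
# Bus activity per clock cycle as described for FIG. 8.
# The bus shows that a read occurs, but not which instruction caused it.
BUS_TRACE = [
    ("fetch", "In1"),  # cycle 1
    ("idle", None),    # cycle 2
    ("fetch", "In2"),  # cycle 3
    ("read", None),    # cycle 4: actually caused by In1
    ("fetch", "In3"),  # cycle 5
]


def naive_correlate(trace):
    """Attribute each data transfer to the most recently fetched instruction.

    This is an illustrative heuristic, not a method from the text.
    Returns a list of (cycle, transfer kind, guessed instruction).
    """
    last_fetched = None
    guesses = []
    for cycle, (kind, instruction) in enumerate(trace, start=1):
        if kind == "fetch":
            last_fetched = instruction
        elif kind in ("read", "write"):
            guesses.append((cycle, kind, last_fetched))
    return guesses


# The read in cycle 4 is wrongly attributed to In2 instead of In1.
print(naive_correlate(BUS_TRACE))  # [(4, 'read', 'In2')]
```

In a non-pipelined processor the same heuristic would give the correct answer, since each instruction completes its data transfer before the next fetch.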
A countermeasure to this problem consists in providing a pipeline flattener (also known as a flattener circuit) that buffers or delays the pipeline signals. In principle, a pipeline flattener is a first-in-first-out (FIFO) circuit. Different signals from different pipeline stages are delayed by different amounts of time, and the pipeline flattener outputs all information of a given instruction together, even though this information was gathered at different points in time during execution of the instruction. A pipeline flattener tracks all actions of an instruction through the pipeline. When the instruction exits the pipeline of the processor, connected debug logic may reconstruct the instruction. In its simplest version, the instruction is identified by its address, which is tracked through the pipeline stages together with the instruction.
FIG. 9 illustrates this for an exemplary pipeline having a depth of five stages. In a first stage, a first instruction In1 is fetched from the instruction register. The instruction is identified by its instruction address IAddr. This identifier is fed through every stage of the pipeline. Consequently, data transactions which are due to the execution of this instruction may be correlated to the respective instruction. Exemplarily, in FIG. 9, the fifth instruction In5 performs a register write back and the debug logic may correlate this action with the instruction with the help of the address IAddr(In5).
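The principle of a pipeline flattener may be sketched in software as a FIFO that carries the instruction address alongside each instruction and collects the actions observed at the individual stages. This is a behavioral sketch only: the class name `PipelineFlattener`, the method `clock`, the event strings and the stage indices (0 for the first stage up to 4 for the fifth) are assumptions made for illustration, and stalls are not modeled.

```python
from collections import deque


class PipelineFlattener:
    """Behavioral sketch of a flattener for a pipeline of the given depth."""

    def __init__(self, depth: int = 5):
        self.depth = depth
        self.in_flight = deque()  # oldest instruction on the left

    def clock(self, fetched_iaddr=None, events=None):
        """Advance one clock cycle.

        fetched_iaddr: address of the instruction entering stage 0, or None.
        events: dict mapping a stage index to the action seen there now.
        Returns (iaddr, actions) for an instruction leaving the pipeline,
        or None if no instruction leaves in this cycle.
        """
        events = events or {}
        if fetched_iaddr is not None:
            self.in_flight.append(
                {"iaddr": fetched_iaddr, "stage": 0, "events": []})
        # Attach each observed action to the instruction in that stage.
        for record in self.in_flight:
            if record["stage"] in events:
                record["events"].append(events[record["stage"]])
        # Move every instruction to the next stage.
        for record in self.in_flight:
            record["stage"] += 1
        # The oldest instruction exits once it has passed all stages.
        if self.in_flight and self.in_flight[0]["stage"] == self.depth:
            done = self.in_flight.popleft()
            return done["iaddr"], done["events"]
        return None


flattener = PipelineFlattener(depth=5)
flattener.clock(0x100)                        # cycle 1: In1 fetched
flattener.clock(0x104)                        # cycle 2: In2 fetched
flattener.clock(0x108)                        # cycle 3: In3 fetched
flattener.clock(None, {3: "read"})            # cycle 4: read in stage 3
result = flattener.clock(None, {4: "write"})  # cycle 5: write in stage 4
print(result)  # (256, ['read', 'write'])
```

Because the address travels with the instruction, both actions are reported together with IAddr 0x100 when the first instruction exits the pipeline, although they were observed in different clock cycles.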
However, a pipeline flattener has a high gate count that is nearly equal to the number of gates which are used for the processor pipeline itself. Considering a 32-bit address and a five-stage pipeline, 32 × 5 = 160 flops are necessary for tracking the instruction address in the debug logic. Typically, a pipeline flattener tracks not only the instruction address but also additional status signals. This leads to a number of necessary flops that is significantly higher than the above-estimated value. Especially for cost-sensitive and power-sensitive applications, these extensive debug solutions are undesirable due to their high gate count and high power consumption.
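The above estimate may be written as a simple calculation. The parameter `status_bits` is a hypothetical placeholder for the additional status signals tracked per stage; the text gives no concrete figure for it, so the value used in the second call is an arbitrary assumption.

```python
def tracking_flops(address_bits: int = 32, pipeline_depth: int = 5,
                   status_bits: int = 0) -> int:
    """Flops needed to carry address and status bits through every stage."""
    return (address_bits + status_bits) * pipeline_depth


print(tracking_flops())               # 160, address tracking only
print(tracking_flops(status_bits=8))  # 200, with 8 assumed status bits
```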