1. Field of the Invention
The present invention is directed to the development of an integrated circuit containing multiple processing cores on a single chip (i.e., a system on a chip). More particularly, the present invention is directed towards tracing and debugging logic and techniques for simultaneously ascertaining and displaying the real-time state of any number of the processing cores on the integrated circuit as they operate.
2. Background of the Related Art
The system on a chip (SoC) field has arisen as the amount of digital logic that can be placed on a single semiconductor chip has substantially exceeded the amount of digital logic required by a single processing core. (Throughout this specification, the term ‘processing core’ is used generically to refer to any on-chip logic device, such as a microprocessor, microcontroller, memory management unit, arithmetic logic unit, audio-video controller, etc., that extracts stored instructions from memory, decodes them, and executes them using a program counter or the like.) SoC technology uses that additional capacity to create separate processing cores on the silicon. These processing cores can now be quite complex and do substantial amounts of work without predetermined cycle-by-cycle interaction with other cores on the SoC. These processing cores can also simultaneously run different software programs, some of which may interact with each other as well as with devices off the SoC. Simultaneously ascertaining the current state of these processing cores as they operate is of primary importance for debugging the SoC.
Traditionally, processing cores have been manufactured each on their own chip, with all their input and output (IO) signals connected to the exterior of the packaged chip. Because of this, it has always been possible to observe the operation of a processing core by attaching test equipment to its external IO signals and monitoring them. The information gathered by monitoring these external IO signals is called a trace. The trace is useful when analyzing the behavior, or misbehavior, of the processing core. The trace can show problems in the programming of the processing core and point to errors in the processing core hardware. The trace can be thought of as an external recording of the activity of the processing core that a user can play back with software tools in order to understand what internal operations the processing core took and why.
Because of the complex nature of modern processing cores, the trace of external IO signals is often augmented with other data to give a user additional visibility into the processing core's internal operation. This augmentation is accomplished by bringing selected internal signals of the processing core to the outside of the packaged chip as additional output signals. Often, a processing core will be packaged in two versions. One version will be for general use and will not have the additional output signals connected outside the package. The other, special version, specifically designed for debugging, will include the additional output signals. This special version is generally referred to as an In-Circuit Emulation (ICE) processing core design.
There are numerous factors in the design of modern multiple processing core SoCs that make the above strategies increasingly insufficient.
First, the speed at which internal logic can operate on a chip is becoming significantly faster than the speed at which IO signals can be driven off the chip. Modern processing cores run at internal speeds exceeding 400 MHz, while the speed of signals routed off of the chip is much lower. This is a practical necessity, since handling high-speed signals outside the chip is much more difficult than handling them inside the chip. Some processing core IO signals, for example those used for memory access, can be slowed down. Unfortunately, the signals that convey trace data of a processing core off of a chip cannot be slowed down without also slowing down the internal speed of the processing core, since those trace data signals reflect the real-time, internal state of the processing core. To provide useful information, trace data must run at the internal rate of the processing core. Toggling external IO pins at the internal processing core speed can be either prohibitively expensive or impossible.
A second reason that traditional ICE processing core designs are no longer sufficient is that chip packages are becoming much larger. As chip sizes increase, the number of transistors on a chip increases much faster than the possible number of IO signals off the chip. This relationship is often referred to as Rent's Rule. In many modern chip designs the chip is said to be pad limited or IO limited, which means that, given the size of the chip, there is not sufficient room for all the IO signals that the designers would like, or need, to have routed off the chip. In such environments, adding IO signals for the sole purpose of software debugging can seem unnecessarily expensive, if not impossible.
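For reference, Rent's Rule is commonly written as the empirical power law sketched below. The symbols T, t, g, and p are the conventional ones from the general literature and are not defined elsewhere in this specification; no particular value of the exponent is assumed here beyond its being less than one.

```latex
T = t \cdot g^{p}, \qquad 0 < p < 1
```

Here T is the number of external signal terminals, g is the number of logic gates, t is the average number of terminals per gate, and p is the Rent exponent. Because p is less than one, doubling g multiplies T by only 2^p, which is less than two, so the available IO grows more slowly than the logic it must serve.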
Another problem facing ICE design solutions is that instead of being manufactured on individual chips, processing cores are increasingly being combined together as part of a much larger embedded system, or SoC, on a single chip. The processing cores on an SoC may not be connected to the SoC's external IO signals at all (i.e., those IO signals routed off-chip). Instead, they may be completely embedded within the SoC, with their own IO signals connected only to other devices within the SoC. In such a situation, it can be nearly impossible to directly observe the operation of the embedded processing core, because there are no IO signals external to the SoC coming from that processing core. Rent's Rule exacerbates this problem because each of the embedded SoC processing cores generates as much information in one clock cycle as a stand-alone, single-chip processing core would have generated. Consequently, the problem of operational observability is even more difficult for multiple processing cores embedded within an SoC.
Co-pending U.S. patent application Ser. No. 09/680,126 to Newlin et al. dramatically advanced the state of the art of debugging SoCs. As disclosed therein, JTAG devices on an SoC can be serially connected together and communicate off-chip using the IEEE 1149.1 JTAG specification. The IEEE 1149.1 JTAG specification defines a communication format that uses five signals to control a series of devices called TAP controllers. This specification is attractive for low-performance communication with devices on an SoC because of the relatively small number of signals that it uses. Having individual units with TAP controllers on an SoC allows debugging tools to retrieve JTAG information from the SoC. However, even with these advancements, challenges remain. The JTAG interface is relatively slow and does not provide for real-time tracing. The JTAG chain does not handle simultaneous trace output from multiple processing cores on an SoC. Finally, the JTAG chain is not designed to handle the amount of data necessary to produce a simultaneous real-time trace of multiple cores on an SoC.
Another problem facing simultaneous real-time multiple processing core SoC debugging is accurately reconstructing a traced processing core's internal register values during trace playback. Trace streams often record the execution address of the processing core (the program counter, or PC) but not the register values. This limits the usefulness of the trace. Consider the following C code:
typedef void (*DRIVER_FUNC)(void *);
DRIVER_FUNC driver_table[10] = { . . . };
void call_through( int I, void *arg )
{
    driver_table[I]( arg );
}
A trace of the execution stream of a processing core executing this code will show the one statement in the function being executed, and it will show the address called through the driver_table, but it will not show the value of I. If this statement is incorrect, a trace of the execution stream will show that this is the problem statement but will not give insight as to why the statement is incorrect. In particular, there are two reasons this statement could produce the wrong behavior: the contents of driver_table could be corrupted, or I could be invalid. Without knowing the value of I, distinguishing between these two problems can be difficult. Knowing the value of I requires the tracing of register data in addition to the tracing of the PC.
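The distinction drawn above can be illustrated with a minimal sketch. It assumes a hypothetical trace record that may or may not carry the register value of I, and a known-good copy of the table to compare against. The names TraceEntry, Diagnosis, diagnose_call, expected_table, driver_a, and driver_b are illustrative only and are not part of the disclosed design.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*DRIVER_FUNC)(void *);

#define DRIVER_COUNT 10

/* Hypothetical trace record: the PC, plus the value of I when captured. */
typedef struct {
    unsigned long pc;  /* execution address of the traced statement */
    int has_reg;       /* nonzero if the register holding I was traced */
    int reg_I;         /* value of I; meaningful only when has_reg != 0 */
} TraceEntry;

typedef enum {
    DIAG_UNKNOWN,          /* PC-only trace: the two causes look the same */
    DIAG_BAD_INDEX,        /* I was outside the bounds of the table */
    DIAG_BAD_TABLE_ENTRY,  /* I was valid, so the table entry was wrong */
    DIAG_OK
} Diagnosis;

/* Two illustrative drivers with distinct behavior. */
void driver_a(void *arg) { *(int *)arg = 1; }
void driver_b(void *arg) { *(int *)arg = 2; }

/* Assumed known-good table contents, e.g. taken from the program image. */
const DRIVER_FUNC expected_table[DRIVER_COUNT] = { driver_a, driver_b };

/* Classify a traced call given the address that was actually called. */
Diagnosis diagnose_call(const TraceEntry *entry,
                        const DRIVER_FUNC *expected,
                        DRIVER_FUNC called)
{
    assert(entry != NULL && expected != NULL);
    if (!entry->has_reg)
        return DIAG_UNKNOWN;
    if (entry->reg_I < 0 || entry->reg_I >= DRIVER_COUNT)
        return DIAG_BAD_INDEX;
    if (expected[entry->reg_I] != called)
        return DIAG_BAD_TABLE_ENTRY;
    return DIAG_OK;
}
```

With a PC-only entry, diagnose_call can only report DIAG_UNKNOWN. Once the value of I is present in the entry, the same call address is enough to separate an invalid index from a corrupted table entry.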
Yet another problem facing simultaneous real-time multiple processing core SoC debugging is accurately reconstructing a traced processing core's internal register values during lengthy loops. To illustrate this shortcoming, consider a traced processing core running the following C function:
int sum( int val, int rep )
{
    int j = 0;
    int sum = 0;
    while( j < rep )
    {
        sum = sum + val;
        j = j + 1;
    }
    return sum;
}
This simple function is really just a loop that multiplies the integer val by the integer rep. While the function may seem trivial, it illustrates an important challenge to debugging this SoC. The function, when compiled by a C compiler, generates the following assembly code.
    entry   sp, 32
    mov.n   a5, a2
    movi.n  a2, 0
    mov.n   a4, a2
    bge     a2, a3, .L2
.L3:
    add.n   a2, a2, a5
    addi.n  a4, a4, 1
    blt     a4, a3, .L3
.L2:
    retw.n
The body of the loop in this code is between the .L3: label and the blt instruction. Blt is a branch instruction and is responsible for transferring control to the top of the loop. Note that the a5 register is referenced, but not written to, in the loop body. The loop consists of three instructions, and each iteration will require at least three trace entries if the state of the processing core is to be tracked accurately and in real time. The memory available for storing the trace is of a fixed size, and after some number of iterations through the loop, the trace entries tracking the instructions immediately prior to the loop will be lost, or overwritten. It is in these pre-loop instructions that the value of a5 was written. Upon losing the pre-loop instructions, the value of a5 will no longer be available to the user. In general, if a write to a register is not captured and maintained within a trace sample, that information is missing for any subsequent debugging and analysis.
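The loss of the pre-loop write to a5 can be reproduced with a small simulation. This is a minimal sketch that assumes a circular trace memory holding one entry per executed instruction and a deliberately tiny capacity of eight entries. TRACE_CAPACITY, TraceBuffer, and the function names are hypothetical, and the pc values are simple instruction indices rather than real addresses.

```c
#include <assert.h>
#include <stddef.h>

#define TRACE_CAPACITY 8  /* hypothetical, deliberately small */

typedef struct {
    unsigned pc;   /* instruction index standing in for the address */
    int dest_reg;  /* number of the register written, or -1 for none */
} TraceEntry;

typedef struct {
    TraceEntry entries[TRACE_CAPACITY];
    size_t next;   /* slot that the next entry will overwrite */
    size_t count;  /* number of valid entries, at most TRACE_CAPACITY */
} TraceBuffer;

void trace_init(TraceBuffer *tb)
{
    assert(tb != NULL);
    tb->next = 0;
    tb->count = 0;
}

/* Store one entry, overwriting the oldest once the buffer is full. */
void trace_record(TraceBuffer *tb, unsigned pc, int dest_reg)
{
    assert(tb != NULL);
    tb->entries[tb->next].pc = pc;
    tb->entries[tb->next].dest_reg = dest_reg;
    tb->next = (tb->next + 1) % TRACE_CAPACITY;
    if (tb->count < TRACE_CAPACITY)
        tb->count++;
}

/* Return 1 if any retained entry records a write to register reg. */
int trace_has_write_to(const TraceBuffer *tb, int reg)
{
    assert(tb != NULL);
    for (size_t i = 0; i < tb->count; i++)
        if (tb->entries[i].dest_reg == reg)
            return 1;
    return 0;
}

/* Trace the pre-loop instructions and rep loop iterations (rep > 0). */
void trace_sum_loop(TraceBuffer *tb, int rep)
{
    trace_record(tb, 1, 5);   /* mov.n  a5, a2  writes a5 */
    trace_record(tb, 2, 2);   /* movi.n a2, 0   writes a2 */
    trace_record(tb, 3, 4);   /* mov.n  a4, a2  writes a4 */
    trace_record(tb, 4, -1);  /* bge            writes nothing */
    for (int j = 0; j < rep; j++) {
        trace_record(tb, 5, 2);   /* add.n  writes a2 */
        trace_record(tb, 6, 4);   /* addi.n writes a4 */
        trace_record(tb, 7, -1);  /* blt    writes nothing */
    }
}
```

Under these assumptions, one iteration produces seven entries and the write to a5 is still retained. A second iteration produces ten entries, the two oldest are overwritten, and trace_has_write_to(&tb, 5) no longer finds the write. A realistic trace memory is far larger, but the same loss occurs after proportionally more iterations.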