Many SOCs (Systems on Chip) today are able to trace the instructions executed on their main core units. Such traceability lets programmers review and analyze what code the respective cores or CPUs have executed, a valuable instrument in cases where code failed to execute properly. This debugging is, however, not only of interest for “post-mortem” analysis, but also for enhancing the performance or reducing the power consumption of the SOCs. Some SOCs also support tracing the data that the one or more CPUs read and/or write. The result of a tracing operation is called a trace, for example an instruction trace or a data trace. A data trace differs from an instruction trace in that the former generally includes the results or output of certain instructions executed by the CPU. An example of an SOC supporting the generation of an instruction trace is the Exynos 5250 from Samsung, based on an ARM® Cortex®-A15 processor. Other SOCs from this vendor, like the Exynos 5420 including a Cortex®-A7 processor, also support data tracing. Most cores from ARM Ltd support tracing instructions (e.g. ARM Cortex®-A15) and some also support tracing data (e.g. ARM Cortex®-A7). Chapter 7 of the document Embedded Trace Macrocell Architecture Specification, ARM Limited, reference number ARM IHI 0014Q, published under the URL infocenter.arm.com/help/index.jsp, provides more information about the tracing procedure and the access to the hardware to obtain such a trace.
The traces, both the instruction trace and, where available, the data trace, are received from the debug target, e.g. an SOC with one or more processor cores, and stored on a separate computer, the debug host, for later analysis. This lets the programmer determine which code has been executed and how, and what data has been read or written.
A trivial implementation of an instruction trace records the address of every executed instruction. In some implementations, such as the one mentioned above, the amount of trace information is reduced by outputting only the target addresses of indirect branches and whether conditional instructions were executed or skipped. To fully reconstruct the instruction trace, the debug host needs access to the same instructions as were executed. These can be obtained, for example, by reading the memory of the target or by loading a file containing the same binary on the debug host.
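The reconstruction described above can be illustrated with a minimal sketch. It assumes a hypothetical toy instruction set with fixed 4-byte instructions and a program image available on the debug host; the names `Insn`, `reconstruct` and `IMAGE` are illustrative only and do not correspond to any real trace format. For simplicity, conditional execution is modelled only as conditional branches.

```python
from dataclasses import dataclass
from typing import Optional

INSN_SIZE = 4  # assumption: fixed-size instructions


@dataclass(frozen=True)
class Insn:
    # kind is one of: "plain", "direct_branch", "cond_branch",
    # "indirect_branch", "halt" (hypothetical toy instruction set)
    kind: str
    target: Optional[int] = None  # statically known branch target, if any


def reconstruct(image, start, events):
    """Rebuild the list of executed addresses from a compressed trace.

    image  -- program image on the debug host: {address: Insn}
    events -- compressed trace: a bool (taken / not taken) for each
              conditional branch and an int (target address) for each
              indirect branch, in execution order
    """
    events = iter(events)
    pc = start
    executed = []
    while True:
        insn = image[pc]
        executed.append(pc)
        if insn.kind == "halt":
            break
        if insn.kind == "direct_branch":
            pc = insn.target              # known from the image, no event needed
        elif insn.kind == "cond_branch":
            taken = next(events)          # one bit from the trace
            pc = insn.target if taken else pc + INSN_SIZE
        elif insn.kind == "indirect_branch":
            pc = next(events)             # full target address from the trace
        else:
            pc += INSN_SIZE               # sequential flow, no event needed
    return executed


IMAGE = {
    0x100: Insn("plain"),
    0x104: Insn("cond_branch", target=0x110),
    0x108: Insn("plain"),
    0x10C: Insn("indirect_branch"),
    0x110: Insn("plain"),
    0x114: Insn("halt"),
}

path = reconstruct(IMAGE, 0x100, [False, 0x110])
```

In this sketch six executed instructions are recovered from only two trace events, which shows why the host needs the same binary as the target.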
While the instruction trace enables the programmer to obtain information about the instructions performed by the CPU, the full state of the CPU at any given time can be reconstructed only together with the data trace. The data trace also enables the programmer to reconstruct the state of variables in memory, except for the effect of other CPUs or hardware accelerators without data trace. This can be used to implement “reverse execution” in the debugger, where the debug host can display the state of a program as it was at any time during the trace session. An example of this is the Context Tracking System facility (CTS) in the TRACE32 software from Lauterbach Datentechnik GmbH to be found at the URL: www2.lauterbach.com/pdf/general_ref_c.pdf. The CTS can also optionally be used to fill in small holes in a trace by executing the instructions the same way as the CPU normally would.
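As a rough illustration of how a data trace lets the debug host display memory state "as it was" at an earlier point, the following sketch replays recorded accesses up to a chosen time. The record layout `Access` and the function `memory_at` are hypothetical and are not taken from TRACE32 or any real tool; accesses by other CPUs or accelerators without data trace are not covered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Access:
    time: int   # position in the trace session (hypothetical unit)
    kind: str   # "read" or "write"
    addr: int
    value: int


def memory_at(data_trace, t):
    """Return the memory contents known at time t as {address: value}."""
    known = {}
    for acc in data_trace:      # assumed to be sorted by time
        if acc.time > t:
            break
        # Both reads and writes reveal the value held at the address.
        known[acc.addr] = acc.value
    return known


TRACE = [
    Access(1, "write", 0x2000, 5),
    Access(2, "read", 0x2004, 9),
    Access(3, "write", 0x2000, 6),
]

state_at_2 = memory_at(TRACE, 2)
state_at_3 = memory_at(TRACE, 3)
```

Stepping "backwards" in the debugger then amounts to calling `memory_at` with a smaller time value.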
Traditionally, the information for the data trace as well as for the instruction trace is extracted from the debug target through an interface giving access to the pipeline in the CPU (or close to it) of the debug target, to enable precise mapping of data to the corresponding instruction. To allow the state of the CPU to be reconstructed at any given time, all read accesses the program performs are output in the trace stream. The trace may be enabled only for parts of the program. This cuts down the volume of the trace stream, but also limits visibility from the debug host.
Trace streams from different sources are normally not fully synchronized. If they are extracted separately from the debug target and then synchronized in the receiver by adding proper time stamps and the like, a synchronization uncertainty of several μs, e.g. 30 μs, can occur. If the data and instruction traces are merged on the SOC, the uncertainty decreases to about 200 ns, which can be further reduced to about 30 ns when synchronization markers are added. These uncertainties are rough guidelines and vary significantly between current implementations.
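The practical effect of the synchronization uncertainty can be sketched as follows: two time-stamped streams are merged on the host, and neighbouring events from different sources whose time stamps lie closer together than the uncertainty are flagged, since their relative order cannot be trusted. The event layout `(timestamp_ns, source, payload)` and the function name are assumptions made for this illustration.

```python
import heapq


def merge_with_ambiguity(instr_events, data_events, uncertainty_ns):
    """Merge two time-sorted streams and flag unreliable orderings.

    Each event is a tuple (timestamp_ns, source, payload).
    Returns (merged, ambiguous), where ambiguous lists adjacent pairs
    from different sources that are closer than uncertainty_ns.
    """
    merged = list(heapq.merge(instr_events, data_events, key=lambda e: e[0]))
    ambiguous = []
    for a, b in zip(merged, merged[1:]):
        if a[1] != b[1] and b[0] - a[0] < uncertainty_ns:
            ambiguous.append((a, b))
    return merged, ambiguous


INSTR = [(0, "insn", "A"), (1_000, "insn", "B")]
DATA = [(100, "data", "x"), (50_000, "data", "y")]

# Separately extracted streams, roughly 30 us uncertainty
_, loose = merge_with_ambiguity(INSTR, DATA, 30_000)
# Streams merged on the SOC with synchronization markers, roughly 30 ns
_, tight = merge_with_ambiguity(INSTR, DATA, 30)
```

With the larger uncertainty, two of the three neighbouring pairs in this example cannot be ordered reliably; with the smaller one, none are flagged.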
Although there are exceptions, modern high-performance CPUs often lack data trace capability. One reason is that the data trace is becoming hard to extract because of its high volume. In a hypothetical CPU operating at 1 GHz, executing on average one instruction every two clock cycles, with every fifth instruction being a “read” instruction of four bytes, the amount of trace data to be transferred is: (1 GHz/2)/5*(4+4) Bytes=800 MBytes/s=6.4 Gbps. Here, 4+4 Bytes means that for each read 4 bytes of trace information and 4 bytes of data are traced.
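The arithmetic above can be checked with a short calculation. The function and parameter names are chosen for this illustration only; the figures are those of the hypothetical CPU in the text.

```python
def data_trace_rate_bps(clock_hz, cycles_per_insn, insns_per_read, bytes_per_read):
    """Return the data trace rate in bits per second."""
    reads_per_second = clock_hz / cycles_per_insn / insns_per_read
    return reads_per_second * bytes_per_read * 8


# 1 GHz clock, one instruction per two cycles, one read per five
# instructions, 4 bytes of trace information plus 4 bytes of data per read
rate_bps = data_trace_rate_bps(1_000_000_000, 2, 5, 4 + 4)
rate_mbytes_per_s = rate_bps / 8 / 1_000_000   # 800 MBytes/s
rate_gbps = rate_bps / 1_000_000_000           # 6.4 Gbps
```

Changing any single parameter, for example a higher clock frequency or several cores traced at once, scales the result linearly.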
Typical interfaces available for trace transfer to the debug host are limited to about 10-20 Gbps. Examples are 10GBase-T Ethernet, and parallel and serial trace based on ARM® CoreSight™ trace technology. While there are interfaces capable of these or even faster bit rates, they typically consume too much power, require too many pins or are too hard to route on a PCB. This limitation has caused a decline in the number of SOCs capable of data tracing.