Applying conventional program flow trace schemes to a typical task-oriented software application, for example program code for controlling mechanical/electronic tasks in which approximately 5% of the executed instructions are unconditional direct branches, 10% are conditional branches and 4% are indirect branches, generates trace data at a rate of about 0.8-1.0 bits per executed instruction. Given that such an embedded application may execute approximately 0.52 instructions per cycle, a 200 MHz processor core executes roughly 104 million instructions per second and therefore generates conventional trace data at a rate of about 80-100 Mbits/s.
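By way of illustration only, the arithmetic behind the figures above may be sketched as follows. The function name `trace_rate_mbits` and its parameter names are hypothetical and are introduced solely for this example; the input values are those stated above.

```python
def trace_rate_mbits(clock_hz, instr_per_cycle, bits_per_instr):
    """Return the trace data rate in Mbits/s.

    rate = clock frequency * instructions per cycle * trace bits per instruction
    """
    instr_per_second = clock_hz * instr_per_cycle
    return instr_per_second * bits_per_instr / 1e6


# Values taken from the text: 200 MHz core, about 0.52 instructions per cycle,
# and 0.8-1.0 trace bits per executed instruction.
low = trace_rate_mbits(200e6, 0.52, 0.8)
high = trace_rate_mbits(200e6, 0.52, 1.0)
print(f"Trace data rate: {low:.1f} to {high:.1f} Mbits/s")
```

The computed bounds, approximately 83 and 104 Mbits/s, correspond to the rounded figure of about 80-100 Mbits/s.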
However, a conventional 2-pin tool interface of a computer processing system typically has a bandwidth of 40 Mbits/s, so that an additional 2-3 dedicated trace pins are required to accommodate the conventional trace data rate. Furthermore, as the computing power and clock rate of modern processors increase, the trace data rate increases correspondingly, thereby requiring still more dedicated trace pins to output the large volumes of trace data.
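The pin count stated above may be reproduced with the following sketch, which is offered as an illustration rather than as a definitive model. It assumes that interface bandwidth scales linearly with the number of pins, i.e. about 20 Mbits/s per pin for a 2-pin, 40 Mbits/s interface; that assumption, and the function name `extra_trace_pins`, are not taken from the text.

```python
import math


def extra_trace_pins(trace_rate_mbits, base_pins=2, base_bandwidth_mbits=40.0):
    """Return the number of pins needed beyond the base tool interface.

    Assumption (not stated in the text): bandwidth scales linearly with
    pin count, so each pin carries base_bandwidth_mbits / base_pins.
    """
    per_pin_mbits = base_bandwidth_mbits / base_pins
    total_pins = math.ceil(trace_rate_mbits / per_pin_mbits)
    return max(0, total_pins - base_pins)


# Nominal conventional trace data rates of about 80 and 100 Mbits/s.
for rate in (80.0, 100.0):
    print(f"{rate:.0f} Mbits/s -> {extra_trace_pins(rate)} additional pin(s)")
```

Under this assumption, a rate of 80 Mbits/s calls for 2 additional pins and a rate of 100 Mbits/s calls for 3, consistent with the range of 2-3 dedicated trace pins.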
It would be desirable to implement a system and method for retrieving trace data that allow for reconstruction, debugging and performance analysis of computer program flow without the additional cost and constraints imposed upon the system by dedicated trace pins.