In the past, most processing systems had only a single processor. Recent processors have added trace capability by which Program Counter (PC) activity can be traced. Some systems also provide data trace and timing information. The trace output stream can impose very high bandwidth requirements, and such streams of trace data can overwhelm an attempt to capture them. A VLIW DSP (very long instruction word digital signal processor), such as a TMS64xx™ processor from Texas Instruments Incorporated with eight data paths running at 600 MHz, can execute 4.8 BOPS (billion operations per second), i.e., the product of 8 instructions/clock-cycle × 600 MHz. Capturing four-byte (32-bit) PC values from even a single CPU running at 600 MHz would generate 2.4 GByte/sec of PC data (4 bytes/cycle × 600 MHz). Serial output of that data would involve a clock rate of 19.2 GHz (8 bits/byte × 2.4 GByte/sec), which would be impractical or at least uneconomical for most current systems. Even if on-chip compression were used to reduce this enormous bandwidth requirement by, e.g., a factor of 10 (the actual ratio depending upon the program activity), the resulting average trace bandwidth would still be 240 MByte/sec.
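The figures above can be reproduced with a short calculation. The following Python sketch is illustrative only: the function name `trace_bandwidth` and its parameters are hypothetical, and it assumes one PC value is captured per clock cycle and that a serial link carries one bit per clock.

```python
def trace_bandwidth(clock_hz, pc_bytes=4, compression=1):
    """Return (bytes_per_sec, serial_clock_hz) for a PC trace stream.

    Assumptions (not from any real tool): one PC value is captured per
    clock cycle, and the serial link moves one bit per serial clock.
    """
    bytes_per_sec = pc_bytes * clock_hz / compression
    serial_clock_hz = 8 * bytes_per_sec  # 8 bits per byte
    return bytes_per_sec, serial_clock_hz


CLOCK_HZ = 600_000_000  # 600 MHz CPU clock
DATA_PATHS = 8          # eight data paths, one operation each per cycle

ops_per_sec = DATA_PATHS * CLOCK_HZ                      # 4.8 billion ops/sec
raw_bytes, raw_serial = trace_bandwidth(CLOCK_HZ)        # uncompressed trace
comp_bytes, comp_serial = trace_bandwidth(CLOCK_HZ, compression=10)

print(f"operations/sec:         {ops_per_sec / 1e9:.1f} BOPS")
print(f"raw PC trace:           {raw_bytes / 1e9:.1f} GByte/sec")
print(f"raw serial clock:       {raw_serial / 1e9:.1f} GHz")
print(f"compressed (10x) trace: {comp_bytes / 1e6:.0f} MByte/sec")
```

The 10x compression ratio is the example value used in the text; an actual ratio would vary with program activity.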
Moreover, combining trace streams for multiple processors would effectively multiply the clock rate specification imposed on a serial port that carries them. Conversely, given an available serial clock, combining trace streams for multiple processors effectively divides the bandwidth available for each traced processor by the number of trace data streams. Exporting larger and larger amounts of data at higher and higher data rates becomes very expensive and indeed impractical, especially now that recent devices have four (4), eight (8), or more processors.
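Both directions of this scaling can be illustrated numerically. The Python sketch below is a rough model under stated assumptions: the helper names are hypothetical, each processor is assumed to produce an equal 240 MByte/sec compressed trace stream, the serial link is assumed to carry one bit per clock, and the 1.92 GHz port rate is an arbitrary example value.

```python
def combined_serial_clock_hz(per_cpu_bytes_per_sec, n_cpus):
    """Serial clock needed to export n_cpus equal trace streams
    over one port (assumes one bit per serial clock)."""
    return 8 * per_cpu_bytes_per_sec * n_cpus


def per_cpu_share_bytes_per_sec(serial_clock_hz, n_cpus):
    """Trace bandwidth left for each processor when n_cpus streams
    share a fixed serial clock equally."""
    return serial_clock_hz / 8 / n_cpus


PER_CPU_TRACE = 240_000_000   # 240 MByte/sec compressed trace per processor
PORT_CLOCK_HZ = 1_920_000_000 # hypothetical fixed 1.92 GHz serial port

for n in (1, 4, 8):
    need = combined_serial_clock_hz(PER_CPU_TRACE, n)
    share = per_cpu_share_bytes_per_sec(PORT_CLOCK_HZ, n)
    print(f"{n} processor(s): needs {need / 1e9:.2f} GHz serial clock; "
          f"fixed port leaves {share / 1e6:.0f} MByte/sec each")
```

In this model the required serial clock grows linearly with the number of processors, while the per-processor share of a fixed port shrinks by the same factor.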
Merely acquiring and combining independent free-running trace streams from multiple processors is likely to waste trace bandwidth. Even if the trace stream for each individual processor were somehow qualified so as to capture only data of interest, that approach would fail to address the needs of multi-processing systems. In a multi-processing system, the interaction between the processors is as important as, or even more important than, the activity of any single processor. Some architectures have been proposed in the past; see for instance:
U.S. Pat. No. 6,009,539 “Cross-Triggering CPUS for Enhanced Test Operations in a Multi-CPU Computer System,”
U.S. Pat. No. 7,332,929 “Wide-Scan On-Chip Logic Analyzer with Global Trigger and Interleaved SRAM Capture Buffers,”
U.S. Pat. No. 7,348,799 “System and Method for Generating a Trigger Signal.”
However, trace and debug circuits, methods and systems that can more effectively address the needs of multi-processing systems would be very desirable in the art.