Debugging software in embedded processors is a difficult task. To assist software development, embedded processors conventionally have some sort of debug capability. For the ARM9 family of embedded microprocessors, an add-on module called the Embedded Trace Macrocell (ETM) allows real-time debug via an external trace port. The ETM has triggering facilities and a FIFO that allow both instruction and data traces to be transferred through the trace port to external trace port analyzer hardware without stalling the microprocessor.
Referring to FIG. 1, a block diagram of a conventional apparatus 10 having multiple processors 12A–B and multiple ETMs 14A–B is shown. A very close coupling requirement causes a one-to-one relationship between the embedded processors 12A–B and the ETMs 14A–B. Each ETM 14A–B closely monitors dedicated signals (i.e., PROC_TO_ETM) presented by the associated embedded processor 12A–B to determine the instruction and data traces.
Both the ETMs 14A–B and the processors 12A–B have embedded test access port (TAP) controllers (not shown). The TAP controllers in the ETMs 14A–B and in the processors 12A–B run in parallel. In a multi-processor apparatus 10, the processors 12A–B are serially connected to a scan chain formed among the TAP controllers, with the ETMs 14A–B maintaining the parallel relationship to the processors 12A–B. The resulting scan configuration allows tools like Multi-ICE to communicate with the processors 12A–B and the ETMs 14A–B simultaneously. As a result, the processors 12A–B may be debugged simultaneously via a common JTAG interface.
Due to the close coupling of the ETMs 14A–B with the processors 12A–B and the scan chain requirements of trace port analyzer tools, sharing a single ETM 14 among multiple processors 12A–B is not practical. The primary disadvantage of having an ETM 14A–B for every processor 12A–B is gate count. Each ETM 14A–B requires 30,000 to 70,000 gates, so the total cost of the ETMs 14 grows linearly with the number of embedded processors 12 in the apparatus 10. If, for example, there are ten processors 12 in the apparatus 10, then 300,000 to 700,000 gates are required for the ten ETMs 14. As a result, the apparatus 10 is too costly to be practical.
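The gate-count arithmetic above can be illustrated with a minimal sketch. The per-ETM range of 30,000 to 70,000 gates is taken from the text; the function name `etm_gate_overhead` and the constant names are hypothetical and chosen only for illustration. The sketch assumes one ETM per processor, as in the conventional apparatus 10.

```python
# Per-ETM gate-count range quoted in the text (assumed to apply to every ETM).
ETM_GATES_MIN = 30_000
ETM_GATES_MAX = 70_000


def etm_gate_overhead(num_processors: int) -> tuple[int, int]:
    """Return the (low, high) total gate count for one ETM per processor.

    Hypothetical helper: the total scales linearly with the processor count.
    """
    if num_processors < 0:
        raise ValueError("num_processors must be non-negative")
    return num_processors * ETM_GATES_MIN, num_processors * ETM_GATES_MAX


if __name__ == "__main__":
    # Show how the overhead grows for a few processor counts.
    for n in (1, 2, 10):
        low, high = etm_gate_overhead(n)
        print(f"{n:2d} processors: {low:,} to {high:,} gates")
```

For ten processors the sketch gives a range whose upper bound is the 700,000-gate figure cited above.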