The typical architecture of a digital signal processor is based upon a sequential model of instruction execution that tracks program execution with a program counter. When an interrupt is acknowledged in this model, the normal program flow is interrupted and a branch to an interrupt handler typically occurs. After the interrupt is handled, a return from the interrupt handler occurs and the normal program flow is restarted. This sequential model must be maintained in pipelined processors even when interrupts occur that modify the normal sequential instruction flow. The sequential model of instruction execution is used in the advanced indirect very long instruction word (iVLIW) scalable ManArray processor even though multiple processor elements (PEs) operate in parallel, each executing up to five packed data instructions. The ManArray family of core processors provides multiple cores, such as 1×1, 1×2, 2×2, 2×4, 4×4, and so on, that offer different performance characteristics depending upon the number and type of PEs used in the cores.
Each PE typically contains its own register file and local PE memory, resulting in a distributed memory and distributed register file model. Each PE, if not masked off, executes instructions in synchronism and in a sequential flow as dictated by the instruction sequence fetched by a sequence processor (SP) array controller. The SP controls the fetching of the instructions that are sent to all the PEs. This sequential instruction flow must be maintained across all the PEs even when interrupts are detected in the SP that modify the instruction sequence. The sequence of operations and machine state must be the same whether an interrupt occurs or not. In addition, individual PEs can cause errors which can be detected and reported by a distributed interrupt mechanism. In a pipelined array processor, determining which instruction, which PE, and which data element in a packed data operation may have caused an exception type of interrupt is a difficult task.
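The requirement stated above, that machine state be the same whether or not an interrupt occurs, can be illustrated with a minimal sketch. It is a toy simulation only, not the ManArray implementation: the names `PE`, `run`, `interrupt_at`, and `handler`, the four-entry register file, and the `(register, value)` add-instruction format are all assumptions introduced for illustration, and the handler is assumed to leave PE state untouched.

```python
from dataclasses import dataclass, field


@dataclass
class PE:
    """One processing element with its own (distributed) register file."""
    regs: list = field(default_factory=lambda: [0, 0, 0, 0])
    masked: bool = False


def run(program, pes, interrupt_at=None, handler=()):
    """Hypothetical SP loop: fetch by program counter, broadcast to unmasked PEs.

    Each instruction is a (register, value) pair meaning "add value to register".
    If interrupt_at is given, the interrupt is acknowledged just before the
    instruction at that address: the return address is saved, the handler
    steps run, and normal flow resumes at the saved address.
    """
    trace = []
    pc = 0
    pending = interrupt_at
    while pc < len(program):
        if pending is not None and pc == pending:
            pending = None
            saved_pc = pc                      # return address
            for step in handler:               # handler assumed not to alter PE state
                trace.append(("handler", step))
            pc = saved_pc                      # return from interrupt
        reg, value = program[pc]
        for pe in pes:
            if not pe.masked:                  # masked-off PEs skip the instruction
                pe.regs[reg] += value
        trace.append(("main", pc))
        pc += 1
    return trace


program = [(0, 1), (1, 2), (0, 3)]
plain = [PE(), PE(masked=True), PE()]
interrupted = [PE(), PE(masked=True), PE()]

run(program, plain)
trace = run(program, interrupted, interrupt_at=1, handler=["service"])

# Same PE state with or without the interrupt.
same_state = [p.regs for p in plain] == [p.regs for p in interrupted]
```

In this sketch the interrupt is taken only on an instruction boundary in the SP, so every unmasked PE sees the identical instruction sequence in both runs. A real pipelined array must achieve the same effect while instructions are in flight, which is the harder problem this section describes.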
In developing complex systems and debugging complex programs, it is important to provide mechanisms that control instruction fetching, provide single-step operation, monitor for internal core and external core events, provide the ability to modify registers, instruction memory, VLIW memory (VIM), and data memory, and provide instruction address and data address eventpoints. There are two standard approaches to achieving the desired observability and controllability of hardware for debug purposes.
One approach involves the use of scan chains and clock-stepping, along with a suitable hardware interface, possibly a joint test action group (JTAG) interface, to a debug control module that supports basic debug commands. This approach allows cycle-by-cycle access to any resources included in the scan chains, usually registers and memory. It relies on the library/process technology to support scan chain insertion, and so may change with each implementation.
The second approach uses a resident debug monitor program, which may be linked with an application or reside in on-chip read-only memory (ROM). Debug interrupts may be triggered by internal or external events, and the monitor program then interacts with an external debugger to provide access to internal resources using the instruction set of the processor.
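The monitor-based approach can be sketched as follows. This is a hedged illustration under stated assumptions, not a description of any particular monitor: the function names `debug_monitor` and `run_with_eventpoint`, the command tuples (`"read"`, `"write"`, `"resume"`), and the toy `(register, value)` add-instruction format are all hypothetical. An instruction address eventpoint raises a debug interrupt, and the resident monitor then services commands from an external debugger before normal execution resumes.

```python
def debug_monitor(regs, commands):
    """Hypothetical resident monitor: entered on a debug interrupt, it services
    commands from an external debugger until it is told to resume."""
    replies = []
    for cmd in commands:
        op = cmd[0]
        if op == "read":            # ("read", reg) returns the register value
            replies.append(regs[cmd[1]])
        elif op == "write":         # ("write", reg, value) modifies a register
            regs[cmd[1]] = cmd[2]
        elif op == "resume":        # ("resume",) leaves the monitor
            break
    return replies


def run_with_eventpoint(program, regs, eventpoint, commands):
    """Execute (reg, value) add-instructions; when the program counter reaches
    the eventpoint address, enter the monitor before executing that instruction."""
    replies = []
    for pc, (reg, value) in enumerate(program):
        if pc == eventpoint:
            replies = debug_monitor(regs, commands)
        regs[reg] += value
    return replies


regs = [0, 0]
program = [(0, 5), (1, 7), (0, 1)]
session = [("read", 0), ("write", 1, 100), ("resume",)]
replies = run_with_eventpoint(program, regs, eventpoint=1, commands=session)
```

Here the monitor code is part of the program image, which reflects the cost associated with this approach: the monitor occupies space that would otherwise be available to the application.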
It is important to note that the use of scan chains is a hardware intensive approach which relies on supporting hardware external to the core processor to be available for testing and debug. In a system-on-chip (SOC) environment where processing cores from one company are mixed with other hardware functions, such as peripheral interfaces possibly from other companies, requiring specialized external hardware support for debug and development reasons is a difficult approach. In the second approach described above, requiring the supporting debug monitor program be resident with an application or in an on-chip ROM is also not desirable due to the reduction in the application program space.
Thus, it is recognized that it will be highly advantageous to have a multiple-PE synchronized interrupt control and a dynamic debug monitor mechanism provided in a scalable processor family of embedded cores based on a single architecture model that uses common tools to support software-configurable processor designs optimized for performance, power, and price across multiple types of applications using standard application-specific integrated circuit (ASIC) processes, as discussed further below.