Conventional multi-processor systems employ a micro-sequencer to reduce or eliminate interactions between the central processing unit (CPU) and the coprocessor subsystem during complex algorithms. FIG. 1 illustrates an example architecture of a customized system used for video coding. Micro-sequencer 120 includes sequencer state machine and control block 104, which handles interactions between coprocessors Cop_X 101 and Cop_Y 102 and direct memory access (DMA) 103. Coprocessors Cop_X 101 and Cop_Y 102 employ four dedicated memory blocks: command memory_1 105; command memory_2 106; quantization memory 107; and Huffman memory 108. Sequencer state machine and control block 104 executes commands read from sequencer command memory 109. CPU 100 passes sequencer commands via bus interface 110 and coprocessor bus 112 to sequencer command memory 109. Micro-sequencer 120 interacts with coprocessors Cop_X 101 and Cop_Y 102 and with DMA 103, and with their processing on shared memory A 113 and shared memory B 114. DMA 103 provides direct access to SDRAM external memory 117 via SDRAM controller 111. Sequencer state machine and control block 104 cannot interact with the other memories, namely command memory_1 105, command memory_2 106, quantization memory 107, Huffman memory 108 and bitstream buffer 115. CPU 100 retains full control of all programming that directs interactions between the coprocessors and all memories other than shared memory A 113 and shared memory B 114. The example system of FIG. 1 attempts to offload the compute-intensive processing from CPU 100 to coprocessors Cop_X 101 and Cop_Y 102.
FIG. 2 illustrates an example of a conventional multi-processor sequencer 230. The complexity of sequencer 230 depends on the number of shared memories 210 and the required interaction between processors CPU 200 and Proc_2 202, coprocessor Cop_1 211, coprocessor Cop_2 212, DMA 213 and shared memories 210. CPU 200 loads sequencer command memory 222 via path 224 with instructions for executing sequential operations. Sequencer state machine and control logic 214 coordinates the processors, allowing collision-free use of coprocessor bus 215. Sequencer state machine and control logic 214 provides enable and interrupt signals to the processor and coprocessor elements via path 228. Interrupt requests generated by each processor are passed via path 216 to interrupt controller 217. Task status registers 218 keep track of interrupt requests, cleared interrupts and flags, and generate enable and disable commands as required. Interrupts pass to the processor elements via paths 219, 225 and 228. CPU 200 and sequencer state machine and control logic 214 are master elements for bus arbitration within coprocessor bus 215. All other elements attached to coprocessor bus 215 are slaves. Command decoder 223 decodes commands from sequencer command memory 222 and passes the decoded commands to coprocessor bus 215 via path 228 and then to the destination processor. Path 229 sends a CPU_go command releasing control to CPU 200.
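The command flow described for FIG. 2 can be pictured with a minimal software sketch. This is an illustrative model only, not the hardware implementation: the function `run_sequencer`, the `(destination, operation)` command encoding, the operation labels and the stub target are all assumptions introduced here. Only the element names DMA, Cop_1, Cop_2 and the CPU_go command come from the description above.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical command encoding: (destination, operation label).
Command = Tuple[str, str]

CPU_GO = "CPU_go"  # command that releases control back to the CPU


def run_sequencer(command_memory: List[Command],
                  targets: Dict[str, Callable[[str], str]]) -> List[str]:
    """Fetch, decode and dispatch commands strictly one at a time."""
    trace: List[str] = []
    for destination, operation in command_memory:
        if destination == CPU_GO:
            trace.append("control released to CPU")
            break
        # The call blocks until the target reports completion, so no two
        # targets are ever active at the same time in this model.
        status = targets[destination](operation)
        trace.append(f"{destination} {operation}: {status}")
    return trace


def stub_target(operation: str) -> str:
    """Stand-in for a coprocessor or DMA engine (assumed behavior)."""
    return "done"


# The CPU loads the sequencer command memory, then the sequencer runs it.
program = [
    ("DMA", "load_block"),
    ("Cop_1", "transform"),
    ("Cop_2", "encode"),
    (CPU_GO, ""),
]
targets = {name: stub_target for name in ("DMA", "Cop_1", "Cop_2")}
trace = run_sequencer(program, targets)
```

In this sketch each dispatch finishes before the next command is fetched, which mirrors the sequential execution of the instructions loaded into the sequencer command memory.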
The multi-processor system illustrated in FIG. 2 improves overall computational throughput in comparison to a single-processor system, but does not make the most powerful and efficient use of the available processors and coprocessors. The sequencer is limited to single-threaded operation: operations often must remain sequential because the sequencer cannot direct a plurality of simultaneous operations.
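The cost of this limitation can be illustrated with simple arithmetic. The sketch below is hypothetical: the cycle counts are invented for illustration and do not come from the system described, and the overlapped figure is an idealized lower bound that assumes the three tasks are fully independent and never contend for the bus or a shared memory.

```python
# Hypothetical task durations in cycles (assumed values, not measured).
durations = {"DMA": 40, "Cop_1": 100, "Cop_2": 60}


def sequential_cycles(task_cycles: dict) -> int:
    """Single-threaded sequencer: each task waits for the previous one."""
    return sum(task_cycles.values())


def overlapped_cycles(task_cycles: dict) -> int:
    """Idealized bound if all tasks could be started simultaneously."""
    return max(task_cycles.values())


sequential = sequential_cycles(durations)   # 40 + 100 + 60 = 200
overlapped = overlapped_cycles(durations)   # longest single task = 100
idle_fraction = 1 - overlapped / sequential  # 0.5 for these numbers
```

With these assumed numbers, a single-threaded sequencer takes twice as long as the idealized overlapped case. Real workloads have data dependencies and shared-resource conflicts, so the achievable gain would typically be smaller.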