The high performance required for real-time processing in communications and multimedia applications stresses processor architectures in many different ways. The Single Instruction Multiple Data (SIMD) parallel-processing model, which exploits the natural parallelism of the application, is considered the most acceptable way to deliver the high performance needed for both today's and future applications. The SIMD model assumes a plurality of processing cells. Each cell may include various combinations of hardware: address generators, multipliers, arithmetic logic units (ALUs), memory, registers, and sequencers. Generally, as the investment in hardware increases, the speed of processing goes up, but so do the power requirement, the cost, and the die area required. Cost and area are always considerations in handheld devices such as personal digital assistants (PDAs), cell phones, and other wireless handset terminals.
Many different approaches have been used to address these problems. One approach, known as the very long instruction word (VLIW) architecture, increases speed by packing multiple operations into a single, very wide instruction word whose operations are then executed in parallel. However, this requires a very large register capacity and memory to store those instructions. The programming limitations of VLIW processors require engineers to use very low-level assembly programming to achieve high performance. Such programming requires specialized knowledge of the hardware architecture, which can be extremely complex.
An alternative architecture to VLIW is SIMD. This model assumes an array of processing cells, each executing the same sequence of instructions on its own local data. The key advantages of this approach are a reduction in overall hardware complexity, design regularity, the enhancement of computing resources, and a simplified path to software development.
These advantages come from the fact that only a single instruction decode and dispatch is required. An example of such an array is the reconfigurable ALU array (RAA) offered by Elixent Limited of Bristol, England, which uses an array of four-bit ALUs and register/buffer blocks. The ALUs are interwoven with adjacent crossbar switches. This results in a highly reconfigurable array, but it requires a large die area, a long time to accomplish the reconfiguring, and a large number of buses. Yet another reconfigurable approach, the XPP architecture from PACT, processes the data in parallel using processing array elements (PAEs), where each PAE uses an ALU and a few registers but is limited to ALU-type operations without multiplication.
In still another approach, using a multiple instruction multiple data (MIMD) array, each cell includes a full digital signal processor (DSP) that can range from a simple DSP up to a VLIW type; each DSP includes data address registers (DAGs), a compute block comprising at least one ALU, a shifter and a 16-bit multiplier with a register file, an instruction decoder and a sequencer. These systems are fast and versatile, but they require very high power and a very large die area. See the reconfigurable ALU array (RAA) at www.elixent.com. See also the XPP architecture at www.PACTCORP.com.
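By way of illustration only, the SIMD model described above, in which a single instruction decode and dispatch drives an array of cells that each operate on their own local data, may be sketched in software as follows. This is a minimal sketch under stated assumptions and does not represent any particular hardware: the three-operation instruction set, the four-register cell, and the names `Cell`, `SimdArray`, `dispatch`, and `run` are all hypothetical and are introduced solely for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical instruction format: (operation, destination, source A, source B).
Instruction = Tuple[str, int, int, int]

# Hypothetical three-operation instruction set, for illustration only.
OPS: Dict[str, Callable[[int, int], int]] = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
}


@dataclass
class Cell:
    """One processing cell; it holds only its own local registers."""
    regs: List[int] = field(default_factory=lambda: [0] * 4)

    def execute(self, op: Callable[[int, int], int],
                dst: int, src_a: int, src_b: int) -> None:
        # The cell applies an already-decoded operation to its local data.
        self.regs[dst] = op(self.regs[src_a], self.regs[src_b])


class SimdArray:
    """An array of cells driven by one shared decode-and-dispatch step."""

    def __init__(self, local_data: List[Tuple[int, int]]) -> None:
        self.cells: List[Cell] = []
        for a, b in local_data:
            cell = Cell()
            cell.regs[0], cell.regs[1] = a, b  # load each cell's local data
            self.cells.append(cell)
        self.decode_count = 0  # number of decodes performed by the array

    def dispatch(self, instr: Instruction) -> None:
        name, dst, src_a, src_b = instr
        op = OPS[name]          # decoded once for the whole array
        self.decode_count += 1
        for cell in self.cells:  # the same instruction goes to every cell
            cell.execute(op, dst, src_a, src_b)


def run(program: List[Instruction],
        local_data: List[Tuple[int, int]]) -> SimdArray:
    """Run every instruction of the program across the whole array."""
    array = SimdArray(local_data)
    for instr in program:
        array.dispatch(instr)
    return array


if __name__ == "__main__":
    # r2 = r0 * r1, then r3 = r2 + r0, in every cell at once.
    program = [("mul", 2, 0, 1), ("add", 3, 2, 0)]
    array = run(program, [(1, 2), (3, 4), (5, 6), (7, 8)])
    print([cell.regs[3] for cell in array.cells])  # [3, 15, 35, 63]
    print(array.decode_count)                      # 2
```

In this sketch the number of decodes equals the number of instructions in the program, regardless of how many cells are in the array, which is the property the SIMD advantages above rely on. An MIMD array, by contrast, would require each cell to carry its own decoder and sequencer.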