This invention relates generally to instruction execution in a data processing system, and more specifically to a data processing system having a scalar execution mode and a vector execution mode, where the vector execution mode includes a true vector mode for processing highly vectorizable loops and a pseudo-vector mode for processing loops that are difficult to vectorize.
Recently, much attention has been focused on designing low-cost, low-power, high-performance processors for mid-to-low-end embedded applications, such as pagers and cellular phones. Many of these embedded applications require the data processing system to perform highly repetitive functions, such as digital signal processing (DSP) functions, where a large amount of Instruction Level Parallelism (ILP) can be exploited, while also requiring the system to perform control-intensive functions.
To address these needs, some systems use dual-core solutions, where one core performs all the control-intensive functions and the other core performs the specialized DSP functions. In this approach, the processor cores communicate with each other through communication channels implemented within the system, such as a shared memory. These systems often employ dual instruction streams, one for each execution core. Such dual-core systems typically have higher hardware and development costs.
In addition, in many embedded applications, some loops are highly vectorizable, while other loops are more difficult to vectorize. Highly vectorizable loops can be efficiently processed by using the traditional vector processing paradigm, such as that described in "Cray-1 Computer System Hardware Reference Manual", Cray Research, Inc., Bloomington, Minn., publication number 2240004, 1977. This paradigm applies to vectorizable loops, but does not extend to loops that are difficult to vectorize.
For loops that are difficult to vectorize, a DSP style of processing paradigm, which focuses on optimizing loop execution, is more suitable. The SHARC product described in the ADSP-2106x SHARC User's Manual, Analog Devices Inc., 1997, is an example of a system employing loop optimization. While this approach executes difficult-to-vectorize loops efficiently, it is not as efficient for highly vectorizable loops.
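By way of illustration only, the distinction between the two kinds of loops may be sketched in C as follows. The source text gives no example loops; the function names `vec_add` and `saturating_accumulate` are hypothetical and chosen solely for this sketch. The first loop has fully independent iterations and maps naturally onto a vector paradigm. The second carries a running value from one iteration to the next and clamps it with a data-dependent branch, which is the sort of loop that is typically harder to vectorize.

```c
#include <stddef.h>

/* Highly vectorizable (illustrative): every iteration is independent,
 * so elements could be processed in parallel by vector hardware. */
void vec_add(const int *a, const int *b, int *c, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        c[i] = a[i] + b[i];
    }
}

/* Harder to vectorize (illustrative): acc depends on the previous
 * iteration, and the clamp is a data-dependent branch. */
int saturating_accumulate(const int *x, size_t n, int limit)
{
    int acc = 0;
    for (size_t i = 0; i < n; i++) {
        acc += x[i];
        if (acc > limit) {
            acc = limit; /* clamp the running sum at the limit */
        }
    }
    return acc;
}
```

In this sketch, the second loop would more plausibly benefit from loop-oriented optimization of the DSP style than from element-wise vector processing, since each iteration must wait for the result of the one before it.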
A need exists, therefore, for a low-cost data processing system that efficiently performs both control functions and repetitive loop functions. Further, a need exists for a low-cost, efficient processing system that handles both vector and DSP style processing using a single set of functional units responsive to the type of loop to be processed.