1. Technical Field
The present invention relates in general to data processing and, in particular, to a processor and method for processing vector instructions. Still more particularly, the present invention relates to a processor and data processing method in which vector instructions operating on vector elements of differing lengths are executed with significant hardware reuse.
2. Description of the Related Art
Traditionally, many computer systems capable of performing numerically-intensive applications followed one of two architectures. According to a first architecture, a computer system includes a central processing unit (CPU) for performing system-control functions and one or more numerical processing circuits, for example, Digital Signal Processors (DSPs), math co-processors, Application Specific Integrated Circuits (ASICs) or the like, for performing specialized computations. Because it combines a general-purpose CPU with specialized numerical processing circuitry, this architecture can be useful in a broad range of applications beyond numerically-intensive ones. However, the inclusion of both specialized computational circuitry and a general-purpose CPU within a computer system introduces significant complexity, in that multiple diverse instruction and data streams must be concurrently supported and significant communication is required between the specialized computational circuitry and the general-purpose CPU.
According to a second architecture, a computer system is implemented as a vector processor having tens or hundreds of identical Arithmetic Logic Units (ALUs) for processing multiple variable-length vectors in parallel. That is, each ALU processes a different one-dimensional vector in a pipelined fashion, and all ALUs operate concurrently. This second architecture, while specifically tailored to scientific computing and thus avoiding some of the complexity of the first architecture, is not optimal for performing a broad range of non-numerically intensive applications.
In addition to these architectures, a third architecture, exemplified by the PowerPC™ Reduced Instruction Set Computing (RISC) architecture, has emerged. According to the PowerPC™ RISC architecture, a single-chip general-purpose microprocessor is equipped with multiple execution units, including separate execution units for performing integer and floating point operations, that execute in parallel on a single instruction stream. This superscalar architecture has the advantage of being able to efficiently execute numerically-intensive applications, which typically contain a large percentage of floating point operations, as well as other types of applications, which tend to contain fewer floating-point operations than integer operations. The PowerPC™ RISC architecture is described in numerous publications, including PowerPC Microprocessor Family: The Programming Environments, Rev 1 (MPCFPE/AD) and PowerPC 604™ RISC Microprocessor User's Manual (MPC604UM/AD), which are incorporated herein by reference.
In accordance with the present invention, the computational capabilities of the PowerPC™ architecture have been expanded by the inclusion of an additional vector execution unit that operates concurrently with the other execution units on a single instruction stream. In contrast to the vector processing architecture described above, the vector execution unit within the PowerPC™ architecture can process all elements of one-dimensional fixed-length vector operands in parallel rather than one element at a time. The addition of vector processing capability to the general-purpose PowerPC™ architecture further accelerates its performance when executing numerically-intensive software applications.
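By way of illustration only, the difference between processing a fixed-length vector one element at a time and processing all of its elements in a single operation can be sketched in software as follows. The sketch is not the vector execution unit described herein; the element count (four), the element width (eight bits), and all function names are assumptions chosen for the example, and the packed addition uses a well-known masking technique so that carries do not cross element boundaries.

```python
# Illustrative sketch only: element count, element width, and all names
# below are assumptions for this example, not taken from the specification.
LANES = 4                          # assumed number of elements per vector
LANE_BITS = 8                      # assumed width of each element in bits
LANE_MASK = (1 << LANE_BITS) - 1   # mask selecting a single element


def pack(elements):
    """Pack LANES small integers into one word, element 0 in the low bits."""
    word = 0
    for i, value in enumerate(elements):
        word |= (value & LANE_MASK) << (i * LANE_BITS)
    return word


def unpack(word):
    """Split a packed word back into a list of LANES elements."""
    return [(word >> (i * LANE_BITS)) & LANE_MASK for i in range(LANES)]


# Constants with only the top bit (or all but the top bit) of each element set.
HIGH_BITS = pack([1 << (LANE_BITS - 1)] * LANES)
LOW_BITS = pack([LANE_MASK >> 1] * LANES)


def add_one_element_at_a_time(a, b):
    """Modular addition performed element by element."""
    return [(x + y) & LANE_MASK for x, y in zip(a, b)]


def add_all_elements_at_once(word_a, word_b):
    """Modular addition of every element using one pass over the packed words.

    The low bits of each element are added first, so no carry can leave an
    element; the top bit of each element is then folded in with XOR.
    """
    low_sum = (word_a & LOW_BITS) + (word_b & LOW_BITS)
    return low_sum ^ ((word_a ^ word_b) & HIGH_BITS)
```

For example, packing the element lists [1, 200, 255, 16] and [2, 100, 1, 240], adding the packed words with add_all_elements_at_once, and unpacking the result yields the same elements as add_one_element_at_a_time applied to the two lists, with each element wrapping modulo 256 independently of its neighbors.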