1. Field of the Invention
The invention relates generally to methods and apparatus for performing both scalar and vector operations in a digital computing system. More particularly, the invention relates to methods and apparatus for integrating a vector operation capability into the execution unit (sometimes referred to herein as an "E-Unit") portion of a computing system's central processing unit ("CPU"). Such units (CPUs and E-Units) are typically designed to support the processing of scalar instructions in a single instruction single data ("SISD") and/or multiple instruction single data ("MISD") mode of operation. Special-purpose vector processing facilities, usually in the form of a coprocessor attached to the CPU, are required to process vector instructions in a single instruction multiple data ("SIMD") mode of operation.
One specific aspect of the invention relates to an improved CPU that includes an E-Unit having a vector processing system integrated therein (in addition to normal scalar instruction processing capabilities commonly supported by such units).
According to this aspect of the invention, the vector processing capability may be integrated into an E-Unit designed to support scalar instruction processing by (a) pipelining the fixed-point and floating-point functional units in the E-Unit that are required to implement the vector instruction set; (b) adding a set of vector registers to the architected data registers contained in the E-Unit; (c) modifying the E-Unit control logic to queue and schedule both vector and scalar instructions; and (d) enhancing the bandwidth of the E-Unit's load and store units to support the transfer of the contiguous blocks of data normally associated with vector processing.
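Steps (b) and (c) above can be illustrated with a minimal behavioral sketch, assuming a toy register model. The class `IntegratedEUnit`, the register names `R0`-`R3` and `V0`-`V3`, the section size `VECTOR_LENGTH`, and the two-operation set are all hypothetical and are not taken from the invention or from any IBM architecture. The sketch shows one instruction queue feeding the same functional units for scalar operands and, element by element, for vector operands; pipeline timing is not modeled.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, Dict, List

VECTOR_LENGTH = 4  # hypothetical vector register length, chosen for illustration


@dataclass
class Instruction:
    op: str    # functional unit to use: "add" or "mul"
    dest: str  # "R<n>" names a scalar register, "V<n>" a vector register
    src1: str
    src2: str

    @property
    def is_vector(self) -> bool:
        return self.dest.startswith("V")


class IntegratedEUnit:
    """Toy E-Unit: one instruction queue, one set of functional units."""

    def __init__(self) -> None:
        # Architected scalar registers plus the added vector registers, step (b).
        self.gprs: Dict[str, float] = {f"R{i}": 0.0 for i in range(4)}
        self.vregs: Dict[str, List[float]] = {
            f"V{i}": [0.0] * VECTOR_LENGTH for i in range(4)
        }
        # A single queue holds both scalar and vector instructions, step (c).
        self.queue: Deque[Instruction] = deque()
        # Shared functional units used by both instruction types.
        self.units: Dict[str, Callable[[float, float], float]] = {
            "add": lambda a, b: a + b,
            "mul": lambda a, b: a * b,
        }

    def enqueue(self, instr: Instruction) -> None:
        self.queue.append(instr)

    def run(self) -> None:
        # Issue instructions in program order from the one queue.
        while self.queue:
            instr = self.queue.popleft()
            unit = self.units[instr.op]
            if instr.is_vector:
                # Stream each element pair through the same functional unit.
                a, b = self.vregs[instr.src1], self.vregs[instr.src2]
                self.vregs[instr.dest] = [unit(x, y) for x, y in zip(a, b)]
            else:
                self.gprs[instr.dest] = unit(self.gprs[instr.src1], self.gprs[instr.src2])


if __name__ == "__main__":
    eu = IntegratedEUnit()
    eu.gprs["R1"], eu.gprs["R2"] = 2.0, 3.0
    eu.vregs["V1"] = [1.0, 2.0, 3.0, 4.0]
    eu.vregs["V2"] = [10.0, 20.0, 30.0, 40.0]
    eu.enqueue(Instruction("add", "R3", "R1", "R2"))  # scalar add
    eu.enqueue(Instruction("mul", "V3", "V1", "V2"))  # vector multiply
    eu.run()
    print(eu.gprs["R3"])   # 5.0
    print(eu.vregs["V3"])  # [10.0, 40.0, 90.0, 160.0]
```

In a real pipelined functional unit, successive vector elements would overlap in different pipeline stages. The loop here only captures the data flow, not that overlap.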
Further aspects of the invention relate to improved methods (for use in general-purpose digital computing systems) for processing a stored program that includes both vector and scalar type instructions, and to an execution unit per se (something less than an entire CPU), built in accordance with the teachings of the invention, that supports both vector and scalar data processing using single data and instruction paths.
2. Description of the Prior Art
In commercially available computing systems, such as the ESA 9000 model 9021-900 manufactured by IBM ("IBM" is a registered trademark owned by the International Business Machines Corporation), a vector processing facility in the form of a coprocessor may be attached to the CPU to process vector instructions. The ESA 9000 is fully described in published reference documents including the "ES/9000 Reference Guide" (Manual No. G3209996), published by IBM, hereby incorporated by reference.
The E-Unit portion of the CPU in an ESA 9000 (and in most commercially available systems) is responsible for processing scalar instructions, while the vector processing facility (the coprocessor) is independently responsible for processing vector instructions using separate instruction and data paths.
Utilizing techniques that are well known to those skilled in the art, an instruction unit ("I-Unit") portion of a CPU (like the ESA 9000 CPU) fetches instructions from memory, decodes them and, whenever a vector instruction is decoded, sends the instruction (or more precisely an image of the instruction, including the op code and information concerning where in storage to get operands from) to the vector coprocessor. Scalar instruction images are passed directly to the aforementioned E-Unit for processing.
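The prior-art routing just described can be sketched as follows, under stated assumptions: the `InstructionImage` type, the `dispatch` function, and the opcode set `VECTOR_OPCODES` are hypothetical illustrations, not actual ESA 9000 mnemonics or logic. The point is that the I-Unit splits one decoded stream into two separate paths, one to the coprocessor and one to the E-Unit.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

# Hypothetical vector op codes, chosen for illustration only.
VECTOR_OPCODES = {"VADD", "VMUL", "VLOAD", "VSTORE"}


@dataclass(frozen=True)
class InstructionImage:
    opcode: str                       # decoded op code
    operand_addresses: Tuple[int, ...]  # where in storage to get operands


def dispatch(
    stream: Iterable[InstructionImage],
) -> Tuple[List[InstructionImage], List[InstructionImage]]:
    """Route decoded images: vector ones to the coprocessor, scalar ones to the E-Unit."""
    to_coprocessor: List[InstructionImage] = []
    to_eunit: List[InstructionImage] = []
    for image in stream:
        if image.opcode in VECTOR_OPCODES:
            to_coprocessor.append(image)  # separate vector instruction path
        else:
            to_eunit.append(image)        # scalar instruction path
    return to_coprocessor, to_eunit
```

Because the two lists feed physically separate facilities, each facility needs its own queuing, registers, and load/store units, which is the duplication the following paragraphs describe.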
Many of the same functions are performed by the vector coprocessor and the E-Unit portion of the CPU once an instruction and relevant data are presented for processing.
In particular, each facility has (in a data path): (a) a load unit used to move data from storage to either a data register or a functional unit such as an adder, multiplier, etc.; (b) an "architected" data register pool, that is, a set of storage elements with a prescribed use that can only be manipulated through the use of instructions (such as general purpose registers, arithmetic registers, control registers, etc., and, in the case of the vector coprocessor, an additional set of storage elements known as vector registers); (c) various arithmetic and logical functional units for either SISD or MISD operations (for E-Units), and functional units that support SIMD operations (for vector coprocessors); (d) internal working registers (within each functional unit) to provide the required working storage for the particular function performed by a given functional unit; and (e) a store unit used to move data to storage from either a data register or the functional units.
Additionally, each facility has (in an instruction path) control means for queuing instructions and scheduling instruction execution by an appropriate functional unit.
Since many of the aforementioned components in a vector coprocessor and an E-Unit are duplicated, it would be desirable to be able to integrate the vector facility into the E-Unit to reduce hardware costs and processing time, and to reduce the number of instruction and data paths required to support the processing of both vector and scalar instructions.
Prior art systems are known in which portions of the aforementioned CPU functions and coprocessor functions are combined; however, no system is known where vector and scalar instruction processing is integrated in a single E-Unit having a single instruction path and a single data path.
The prior art includes many examples of vector coprocessors per se, of improvements made to these processors to increase their throughput and the bandwidth between such processors and storage, and of the sharing of certain logic components to synchronize the operation of separate vector and scalar processors.
For example, U.S. Pat. No. 4,780,811, to Aoyama et al., entitled "Vector Processing Apparatus Providing Vector And Scalar Processor Synchronization", describes vector processor apparatus that includes a scalar processor for executing scalar instructions and a separate vector processor for processing vector instructions. This reference teaches the use of a common status register (for detecting instruction completion) that can be accessed by both processors. However, the vector and scalar processing functions themselves are not integrated; i.e., separate logic is still required for each of the processors per se.
In U.S. Pat. No. 5,053,987, to Genusov et al., entitled "Arithmetic Unit In A Vector Signal Processor Using Pipelined Computational Blocks", an arithmetic unit is taught for a vector signal processor implementing the IEEE Standard 754 for Floating-Point Arithmetic. The arithmetic unit includes three pipelined floating-point computational blocks for high computation throughput.
Although it describes a pipelined functional unit included in a vector processor (which significantly improves the performance of a vector processor), Genusov et al. does not teach, claim or even suggest the integration of such a block into an E-Unit scalar processor. In fact, the Genusov et al. unit is shown as being a coprocessor type vector processing unit, capable of being coupled (via buses 20 and 22) to a separate scalar processor.
U.S. Pat. No. 4,967,343, to Ngai et al., entitled "Pipelined Parallel Vector Processor Including Parallel Configured Element Processors For Processing Vector Elements In Parallel Fashion", describes a pipelined parallel vector processor in which the vector registers are subdivided into a plurality of smaller registers to facilitate parallel processing and greater throughput. Thus, the Ngai et al. reference is but another example of a pipelined parallel processor used as a coprocessor in association with an E-Unit for supporting scalar instruction processing.
Still other examples of a coprocessor type vector processor that may be coupled to a CPU are described in U.S. Pat. No. 5,038,312, to Kojima, entitled "Data Processing System Capable Of Performing Vector/Matrix Processing And Arithmetic Processing Unit Incorporated Therein", and in U.S. Pat. No. 5,029,969, to Izumisawa et al., entitled "Computer System For Directly Transferring Vector Elements From Register To Register Using A Single Instruction".
U.S. Pat. No. 5,008,812, to Bhandarkar et al., entitled "Context Switching Method And Apparatus For Use In A Vector Processing System", describes a data processing system that includes instruction decoding means for routing vector instructions to vector processing means and scalar instructions to separate scalar processing means. The processor described has a single instruction unit but still has two separate execution units.
U.S. Pat. No. 5,073,970, to Aoyama et al., entitled "Vector Processing Apparatus Allowing Succeeding Vector Instruction Chain Processing Upon Completion Of Decoding Of A Preceding Vector Instruction Chain", describes a vector processing apparatus that includes separate vector and scalar processing apparatus (further including separate instruction decoders). The reference does not, however, teach, claim or even suggest integrating the scalar and vector processing functions in a single E-Unit.
Another example of a pipelined functional unit, in particular a pipelined floating point adder, is described in U.S. Pat. No. 4,994,996, to Fossum et al., entitled "Pipelined Floating Point Adder For Digital Computer". The adder taught in the reference is used in present-day vector processing facilities, not in an E-Unit having an integrated vector processing capability.
Other references which may be used to exemplify the present state of the art are U.S. Pat. No. 4,949,247, to Stephenson et al., entitled "System For Transferring Multiple Vector Data Elements To And From Vector Memory In A Single Operation", which describes a vector register implementation that provides high bandwidth; and U.S. Pat. No. 4,928,238, to Sekiguchi, entitled "Scalar Data Arithmetic Control System For Vector Arithmetic Processor", which describes an improved vector processor, not an integrated scalar and vector instruction processor.
In view of the present state of the art, as exemplified by the aforementioned commercially available system and the systems described in the references set forth hereinabove, it would be desirable to be able to provide (a) methods and apparatus which facilitate the integration of a vector facility into an E-Unit; (b) methods and apparatus for integrating a vector operation capability into an E-Unit that supports the processing of scalar instructions in a single instruction single data ("SISD") and/or multiple instruction single data ("MISD") mode of operation, together with a single instruction multiple data ("SIMD") mode (vector mode) of operation, using the same hardware, a single instruction path and a single data path; (c) an improved CPU that includes an E-Unit having a vector processing system integrated therein (in addition to the normal scalar instruction processing capabilities commonly supported by such units); (d) improved methods (for use in general-purpose digital computing systems) for processing a stored program that includes both vector and scalar type instructions; and (e) an improved execution unit per se (something less than an entire CPU), built in accordance with the teachings of the invention, that supports both vector and scalar data processing using single data and instruction paths.