1. Technical Field
This invention relates to computer processing systems and, in particular, to vector multiplication operations performed by computer processing systems.
2. Related Art
Partial multiplication of an integer x bits wide with another integer y bits wide generates a result less than (x+y) bits wide, which typically represents the high-order half (or low-order half) of the full multiplication of the two integers. A vector multiplication operation may utilize partial multiplication by defining multiplication primitives that perform a partial multiplication operation on elements of partitioned source vectors to produce a resultant vector. Prior art implementations have used such partial multiplication operations. For example, the VIS™ instruction set extension to the SPARC-V9™ architecture developed by Sun Microsystems, Inc. includes a series of vis_fmul8x16 instructions that perform a partial multiplication operation on elements of partitioned source vectors to produce a resultant vector. The elements of the partitioned source vectors that are multiplied together vary based upon the particular vis_fmul8x16 instruction. A more detailed description of the series of vis_fmul8x16 vector multiplication operations in the VIS™ instruction set is set forth in "VIS™ Instruction Set User's Manual", Sun Microsystems, Inc., 1997, pp. 54-64.
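The partial multiplication described above can be illustrated with a minimal sketch. It assumes unsigned elements of equal width (x = y), and the function names are hypothetical. It does not model the rounding or operand selection of any actual VIS instruction.

```python
def split_elements(word, bits, count):
    """Partition an integer into `count` unsigned elements, least significant first."""
    mask = (1 << bits) - 1
    return [(word >> (i * bits)) & mask for i in range(count)]


def partial_mul(a, b, bits, high=True):
    """Keep only the high-order or low-order `bits` of the unsigned product a * b."""
    mask = (1 << bits) - 1
    product = a * b  # up to 2 * bits wide
    return (product >> bits) & mask if high else product & mask


def vector_partial_mul(va, vb, bits, count, high=True):
    """Apply partial_mul to corresponding elements of two partitioned source vectors."""
    pairs = zip(split_elements(va, bits, count), split_elements(vb, bits, count))
    return [partial_mul(x, y, bits, high) for x, y in pairs]
```

Each resultant element is no wider than a source element, so half of every product is discarded.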
Full multiplication of an integer x bits wide with another integer y bits wide generates a result (x+y) bits wide. A vector multiplication operation may utilize full multiplication by defining multiplication primitives that perform a full multiplication operation on elements of partitioned source vectors to produce a resultant vector. Prior art implementations that use such full multiplication operations may be placed into one of two categories: Accumulator-based and Register-pair Destination.
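As an illustrative sketch of the width claim above, assuming unsigned operands and hypothetical function names, the following checks that the full product of an x-bit and a y-bit integer always fits in (x+y) bits:

```python
def full_mul(a, b, x_bits, y_bits):
    """Unsigned full product of an x_bits-wide integer and a y_bits-wide integer."""
    assert 0 <= a < (1 << x_bits) and 0 <= b < (1 << y_bits)
    product = a * b
    # The largest product, (2**x - 1) * (2**y - 1), is below 2**(x + y).
    assert product.bit_length() <= x_bits + y_bits
    return product


def vector_full_mul(elems_a, elems_b, bits):
    """Full product of corresponding elements; each result is up to 2 * bits wide."""
    return [full_mul(a, b, bits, bits) for a, b in zip(elems_a, elems_b)]
```

Because every resultant element is twice as wide as a source element, the resultant vector needs twice the storage of either source vector, which is why the two categories of implementation differ in where that result is written.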
In an Accumulator-based implementation, the result of the full multiplication operation on the elements of the partitioned source vectors is written to a non-general purpose register, called an accumulator, that is wider than the general purpose registers. For example, the Digital Media Extension (MDMX) to the MIPS architecture developed by Silicon Graphics, Inc. includes a MULA instruction that multiplies together elements of two source vectors and writes the result to a private 192-bit Accumulator register (which cannot be directly loaded from or stored to main memory, but must be staged through an FP register file). A more detailed description of the MULA vector multiplication operation in the MDMX extension set is set forth in "MIPS Digital Media Extension", Silicon Graphics, Inc., pp. C-18.
In a Register-pair Destination implementation, the result of the full multiplication operation on the elements of the partitioned source vectors is written to a pair of general purpose registers. For example, the instruction set architecture of the broadband processor developed by MicroUnity Systems Engineering, Inc. includes a g.mult.32 instruction that multiplies together the corresponding symbols in two 64-bit registers and writes the result to two 64-bit registers. A more detailed description of the g.mult.32 instruction is set forth in "Architecture of a Broadband MediaProcessor", MicroUnity Systems Engineering, Inc., 1996, which was presented at COMPCON96, Feb. 25-29, 1996.
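The register-pair approach can be sketched as follows. This is a simplified model that assumes unsigned 32-bit elements and a layout in which the lower element's product goes to the first destination register. The function name and layout are assumptions for illustration, not the documented behavior of the g.mult.32 instruction.

```python
REG_BITS = 64
ELEM_BITS = 32
REG_MASK = (1 << REG_BITS) - 1
ELEM_MASK = (1 << ELEM_BITS) - 1


def mul_to_register_pair(ra, rb):
    """Multiply corresponding 32-bit elements of two 64-bit registers.

    Returns (low_reg, high_reg): the 128-bit result split across two
    64-bit destination registers (assumed layout).
    """
    wide = 0
    for i in range(REG_BITS // ELEM_BITS):
        x = (ra >> (i * ELEM_BITS)) & ELEM_MASK
        y = (rb >> (i * ELEM_BITS)) & ELEM_MASK
        wide |= (x * y) << (i * 2 * ELEM_BITS)  # each product occupies 64 bits
    return wide & REG_MASK, (wide >> REG_BITS) & REG_MASK
```

A single instruction in this model has two destination registers, which is the property that complicates register renaming at dispatch.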
There are significant limitations that pertain to each of the prior art implementations discussed above in performing a full multiplication operation on elements of two or more source vectors. First, implementations that perform a vector multiplication operation utilizing partial multiplication require significant computational overhead to piece together the partial multiplication results to generate the results of the full multiplication operation. Second, the Accumulator-based implementations are inflexible because there is only a very limited number (typically one or two) of non-general purpose accumulator registers that may be used to store the results of the vector multiplication operation, which restricts the number of vector multiplication operations that can be concurrently performed by the processor. Finally, Register-pair Destination implementations complicate the run-time dispatch operation (i.e., the register renaming operation) of instructions because such implementations write the results of the vector multiplication operations to two or more registers.
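The overhead of the first limitation can be seen in a short sketch, assuming unsigned elements of equal width and hypothetical helper names. Recovering one full product takes two partial multiplications plus a shift and a merge:

```python
def mul_low(a, b, bits):
    """Partial multiplication keeping the low-order `bits` of the product."""
    return (a * b) & ((1 << bits) - 1)


def mul_high(a, b, bits):
    """Partial multiplication keeping the high-order `bits` of the product."""
    return ((a * b) >> bits) & ((1 << bits) - 1)


def full_from_partials(a, b, bits):
    """Piece together a full 2 * bits-wide product from two partial results."""
    return (mul_high(a, b, bits) << bits) | mul_low(a, b, bits)
```

In a vector setting, this sequence is repeated for every element, and the high and low halves must additionally be interleaved into the correct element positions.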
Thus, there is a need in the art to provide an efficient and flexible mechanism for performing a full multiplication operation on the elements of two or more source vectors.