1. Technical Field
The present invention relates in general to consolidation of multimedia facilities and in particular to reusing existing circuitry for one multimedia instruction in place of comparable circuitry for other multimedia instructions. Still more particularly, the present invention relates to employing a crossbar within a vector permute unit for wide shifting functions required for other multimedia instructions.
2. Description of the Related Art
Multimedia applications are increasing, leading to an increased demand for multimedia facilities within processors. Processors, such as the PowerPC™ processor available from IBM Corporation of Armonk, New York, are increasingly incorporating such multimedia facilities. In the case of the PowerPC™, the multimedia facility is the vector multimedia extensions (VMX) facility.
Several of the instructions implemented by the VMX facility require a multiplexing function for at least one stage. For example, the traditional approach to implementing the vpack instruction, which compresses either 32 bits into 16 bits or 16 bits into 8 bits, would involve a multiplexer. An example is depicted in FIG. 3. A vpack instruction is received by decode logic 302, which generates selects for multiplexer 304 based on whether operand 306 is being compressed from 16 bits to 8 bits or from 32 bits to 16 bits. Multiplexer 304 selects among the possible alternatives for the top target byte 308a from the bytes of 32-bit operand 306. Saturation multiplexers 310a and 310b, under the control of saturation detection logic 312, select between source bytes from operand 306 and their saturated values 314a and 314b for target bytes 308a and 308b. Multiplexer 304, in particular, requires a significant amount of area within the multimedia facility and may incur undesirable latency in instruction execution.
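The pack-with-saturation behavior described above can be illustrated in software. The following is only a minimal sketch, not the circuit of FIG. 3: it assumes unsigned saturation, and the function names `saturate` and `pack_saturate` are hypothetical, chosen for illustration. The overflow test plays the role of the saturation detection logic, and the conditional plays the role of the saturation multiplexers choosing between the source bits and the saturated value.

```python
def saturate(value, dst_bits):
    """Return value if it fits in dst_bits, else the all-ones saturated value.

    Assumption: unsigned saturation. The source text does not say which
    saturation variant is intended.
    """
    limit = (1 << dst_bits) - 1
    overflow = value > limit          # stands in for saturation detection
    return limit if overflow else value  # stands in for the saturation mux


def pack_saturate(elements, src_bits):
    """Compress each src_bits-wide element into src_bits // 2 bits."""
    src_mask = (1 << src_bits) - 1
    dst_bits = src_bits // 2
    return [saturate(e & src_mask, dst_bits) for e in elements]
```

For example, under these assumptions `pack_saturate([0x0012, 0x01FF], 16)` keeps the first element unchanged and clamps the second to `0xFF`, because `0x01FF` does not fit in 8 bits.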
Other instructions supported by a typical multimedia facility within a processor architecture require other, specialized hardware. Shift operations, for example, are traditionally performed utilizing a barrel shifter. However, this approach becomes more expensive as the shifts become wider. The ability to perform wide shifts (64 to 128 bits) is useful in performing multimedia operations, but is very expensive in hardware implementations. The area needed for a barrel shifter performing wide shifts, and the latency incurred by such a shifter, may become unacceptable.
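To make the cost growth concrete, the sketch below models one common textbook organization of a barrel shifter, in which each stage conditionally shifts by a power of two under the control of one bit of the shift amount. This organization is an assumption for illustration; the text above does not describe any particular shifter design, and the function name `barrel_shift_left` is hypothetical. The returned stage count grows with the logarithm of the width, and every stage spans the full operand width.

```python
def barrel_shift_left(value, amount, width):
    """Shift value left by amount (0 <= amount < width) within width bits.

    Returns (result, stages), where stages is the number of
    multiplexer stages this model needed for the given width.
    """
    mask = (1 << width) - 1
    value &= mask
    stage = 0
    while (1 << stage) < width:
        # Each stage is a full-width 2:1 selection: pass the value
        # through, or shift it by 2**stage bit positions.
        if (amount >> stage) & 1:
            value = (value << (1 << stage)) & mask
        stage += 1
    return value, stage
```

In this model an 8-bit shifter needs three stages while a 128-bit shifter needs seven, each 128 bits wide, which suggests why area and latency rise as shifts become wider.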
It would be desirable, therefore, to utilize existing hardware within the multimedia facilities of a processor to perform comparable multiplexing and shifting functions for other instructions. It would further be advantageous if the resulting mechanism reduced latencies for the instructions.