The present invention relates in general to data processing systems, and in particular, to vector arithmetic operations in a data processor.
Vector processing extensions to microprocessor architectures are being implemented to enhance microprocessor performance, particularly with respect to multimedia applications. One such vector processing extension is the Vector Multimedia Extension (VMX) to the Power PC microprocessor architecture ("Power PC" is a trademark of IBM Corporation). VMX is a single instruction multiple data (SIMD) architecture. In a SIMD architecture, a single instruction operates on multiple sets of operands. For example, an instruction having thirty-two bit operands may operate on the operands in bytewise fashion as four eight-bit operands, as two sixteen-bit half-word operands, or as a single word-length operand of thirty-two bits.
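The subvector behavior described above can be illustrated with a short software model. The following Python sketch is not VMX code; the function names split_lanes, join_lanes and simd_add_modulo are hypothetical, and the thirty-two bit width simply follows the example in the preceding paragraph. It shows the same pair of operands being added as four bytes, two half-words, or one word, with each subvector wrapping independently.

```python
def split_lanes(word, lane_bits, total_bits=32):
    """Split a raw bit pattern into lanes, least significant lane first."""
    mask = (1 << lane_bits) - 1
    return [(word >> shift) & mask for shift in range(0, total_bits, lane_bits)]


def join_lanes(lanes, lane_bits):
    """Pack lanes (least significant first) back into one bit pattern."""
    mask = (1 << lane_bits) - 1
    word = 0
    for index, lane in enumerate(lanes):
        word |= (lane & mask) << (index * lane_bits)
    return word


def simd_add_modulo(a, b, lane_bits, total_bits=32):
    """Add two operands lane by lane; a carry never crosses a lane boundary."""
    mask = (1 << lane_bits) - 1
    lanes_a = split_lanes(a, lane_bits, total_bits)
    lanes_b = split_lanes(b, lane_bits, total_bits)
    sums = [(x + y) & mask for x, y in zip(lanes_a, lanes_b)]
    return join_lanes(sums, lane_bits)


a, b = 0x01FF01FF, 0x01010101
print(hex(simd_add_modulo(a, b, 8)))   # four byte lanes
print(hex(simd_add_modulo(a, b, 16)))  # two half-word lanes
print(hex(simd_add_modulo(a, b, 32)))  # one word lane
```

With byte lanes, the two 0xFF bytes wrap to 0x00 without disturbing their neighbors, whereas with half-word or word lanes the same carry propagates into the next byte.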
Integer arithmetic instructions may have both modulo, that is, wrap around, and saturating modes. The mode determines the result of the operation implemented by the instruction when the result overflows the result field, which is a byte-length field, a half-word-length field, or a word-length field, depending on the data type being operated on by the instruction. In modulo mode, a result that overflows or underflows is truncated to the length (byte, half-word, or word) and interpreted according to the type (signed or unsigned) of the operand. In saturating mode, the result is clamped to its saturated value, that is, the smallest or largest value representable in the field.
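The difference between the two modes can be seen in a minimal Python sketch. The helper names field_bounds, wrap and clamp are hypothetical and are introduced only for illustration; the value passed in is assumed to be an exact, unbounded integer result that may lie outside the field.

```python
def field_bounds(bits, signed):
    """Smallest and largest values representable in a field of the given width."""
    if signed:
        return -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return 0, (1 << bits) - 1


def wrap(value, bits, signed):
    """Modulo mode: keep only the low-order bits, then reinterpret the sign."""
    raw = value & ((1 << bits) - 1)
    if signed and raw >= 1 << (bits - 1):
        raw -= 1 << bits
    return raw


def clamp(value, bits, signed):
    """Saturating mode: pin an out-of-range value to the nearest bound."""
    lowest, highest = field_bounds(bits, signed)
    return max(lowest, min(highest, value))


# 200 + 100 does not fit in an unsigned byte.
print(wrap(300, 8, False), clamp(300, 8, False))    # 44 255
# 100 + 100 does not fit in a signed byte.
print(wrap(200, 8, True), clamp(200, 8, True))      # -56 127
# -100 + -100 underflows a signed byte.
print(wrap(-200, 8, True), clamp(-200, 8, True))    # 56 -128
```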
To implement these instructions, three tasks need to be performed. First, an intermediate result is produced, using a single adder, which may be embodied in an arithmetic unit, in accordance with the specific instruction being executed. Second, it is determined whether the intermediate result fits into the field corresponding to the length of the operand. Third, the appropriate result must be selected: the intermediate result if it fits, the truncated overflow or underflow if in modulo mode, or the saturation value if in saturating mode.
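The three tasks can be written out one after another as a software model. The Python sketch below is an assumption-laden illustration rather than a description of any particular hardware; the function name add_sequential and its parameters are hypothetical, and only addition is modeled. Its purpose is to make the data dependency visible: the fit check cannot begin until the sum exists, and the selection cannot begin until the fit check completes.

```python
def add_sequential(a, b, bits, signed, saturating):
    """Model one subvector add as three strictly ordered tasks."""
    if signed:
        lowest, highest = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    else:
        lowest, highest = 0, (1 << bits) - 1

    # Task 1: produce the intermediate result (Python integers never overflow).
    intermediate = a + b

    # Task 2: determine whether the intermediate result fits in the field.
    fits = lowest <= intermediate <= highest

    # Task 3: select the final value according to the mode.
    if fits:
        return intermediate
    if saturating:
        return highest if intermediate > highest else lowest
    wrapped = intermediate & ((1 << bits) - 1)
    if signed and wrapped > highest:
        wrapped -= 1 << bits
    return wrapped


print(add_sequential(100, 28, 8, signed=True, saturating=True))    # 127
print(add_sequential(100, 28, 8, signed=True, saturating=False))   # -128
print(add_sequential(0xFFFF, 1, 16, signed=False, saturating=True))  # 65535
```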
The task of determining if an intermediate result fits into its field, and the task of selecting the appropriate value as a final result may be complicated and time consuming. In particular, these tasks are complicated in that the instructions support different data types, that is, subvector operands having different lengths, as described hereinabove, each of which may be either signed or unsigned. Consequently, it becomes difficult to meet cycle time requirements if the three tasks are performed sequentially.
Thus, there is a need in the art for apparatus and methods for implementing vector integer arithmetic instructions which are sufficiently fast to meet cycle time requirements. In particular, there is a need in the art for performing, in parallel, the tasks of generating an intermediate result, determining if the intermediate result fits into a preselected field, and selecting a mode dependent result value.
The aforementioned needs are addressed by the present invention. Accordingly, there is provided, in a first form, a saturation detection apparatus. The saturation detection apparatus includes a saturation detection unit having first and second operand inputs operable for receiving first and second vector operands, the saturation detection unit operable for receiving an instruction signal, the saturation detection unit outputting a plurality of selection signals in response to the first and second operands and the instruction signal. Selection circuitry is also included that is operable for receiving the plurality of selection signals, a vector arithmetic result signal, and a plurality of saturation value signals, wherein the selection circuitry selects one of the result signal and the plurality of saturation value signals in response to the plurality of selection signals.
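The role of the selection circuitry can be pictured as a per-subvector multiplexer. The Python sketch below is only a behavioral model under stated assumptions: the Select enumeration, the saturation_values helper, and the choice of exactly two saturation values per data type are hypothetical and are not taken from the text. The selection signals are treated as inputs that have already been produced elsewhere.

```python
from enum import Enum


class Select(Enum):
    """Hypothetical encoding of the selection signals for one subvector."""
    RESULT = 0
    SAT_MAX = 1
    SAT_MIN = 2


def saturation_values(bits, signed):
    """Raw bit patterns of the saturation values for one data type (assumed)."""
    if signed:
        return {Select.SAT_MAX: (1 << (bits - 1)) - 1, Select.SAT_MIN: 1 << (bits - 1)}
    return {Select.SAT_MAX: (1 << bits) - 1, Select.SAT_MIN: 0}


def select_lanes(result_lanes, selects, bits, signed):
    """For each subvector, pass the adder result or a saturation value."""
    saturated = saturation_values(bits, signed)
    return [
        result if select is Select.RESULT else saturated[select]
        for result, select in zip(result_lanes, selects)
    ]


lanes = select_lanes(
    [0x38, 0xC8, 0x10],
    [Select.SAT_MIN, Select.SAT_MAX, Select.RESULT],
    bits=8,
    signed=True,
)
print([hex(lane) for lane in lanes])  # ['0x80', '0x7f', '0x10']
```

Because the multiplexer only needs the selection signals and the candidate values, nothing in this step depends on how the adder result was checked, which is what allows the detection to proceed alongside the addition.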
There is also provided, in a second form, a method of saturation detection. The method generates a set of first signals in response to an executing instruction, and generates a set of second signals in response to first and second carry-out signals and the set of first signals. The method also includes selecting for outputting one of a set of output signals including a result signal and a predetermined set of saturation value signals in response to the set of second signals, wherein the first and second carry-out signals are generated in response to a pair of subvector operands, and the result signal is generated in response to the executing instruction.
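One conventional way to realize such a method for addition is sketched below in Python. The text does not say which two carry-out signals are used; this sketch assumes they are the carry into and the carry out of the most significant bit of the subvector, which is a standard overflow test and may differ from the actual design. All function names are hypothetical, operands are raw bit patterns, and only addition is modeled.

```python
def carry_signals(a, b, bits):
    """Assumed carry pair: carry into, and carry out of, the most significant bit."""
    low_mask = (1 << (bits - 1)) - 1
    carry_into_msb = ((a & low_mask) + (b & low_mask)) >> (bits - 1)
    carry_out_of_msb = (a + b) >> bits
    return carry_into_msb, carry_out_of_msb


def instruction_signals(signed, saturating):
    """First signals: derived only from the executing instruction."""
    return {"signed": signed, "saturating": saturating}


def selection_signals(carry_in, carry_out, instr):
    """Second signals: derived from the carries and the first signals."""
    if not instr["saturating"]:
        return "result"
    if instr["signed"]:
        if carry_in == carry_out:
            return "result"
        # Differing carries mean signed overflow; carry_out set means both
        # operands were negative, so the result clamps to the minimum.
        return "sat_min" if carry_out else "sat_max"
    return "sat_max" if carry_out else "result"


def saturating_add(a, b, bits, signed, saturating):
    """Select between the adder result and the saturation values."""
    mask = (1 << bits) - 1
    result = (a + b) & mask  # adder output, not consulted by the detection logic
    instr = instruction_signals(signed, saturating)
    select = selection_signals(*carry_signals(a, b, bits), instr)
    if signed:
        outputs = {"result": result, "sat_max": mask >> 1, "sat_min": 1 << (bits - 1)}
    else:
        outputs = {"result": result, "sat_max": mask, "sat_min": 0}
    return outputs[select]


print(hex(saturating_add(0x64, 0x64, 8, signed=True, saturating=True)))   # 0x7f
print(hex(saturating_add(0x9C, 0x9C, 8, signed=True, saturating=True)))   # 0x80
print(hex(saturating_add(0x9C, 0x9C, 8, signed=False, saturating=True)))  # 0xff
```

In this model the selection signals depend on the operands and the instruction but never on the adder's sum, so in hardware the two computations could in principle proceed side by side.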
Additionally, there is provided, in a third form, a data processing system including a central processing unit (CPU), and a memory operable for communicating instructions and operand data to the CPU, in which the CPU further includes instruction decode circuitry operable for receiving the instructions, an arithmetic unit operable for receiving the operand data and outputting a result signal in response to the operand data and an instruction, and saturation detection circuitry coupled to the memory. The saturation detection circuitry is operable for receiving the operand data from the memory, the saturation detection circuitry also being operable for selecting one of a plurality of output signals, wherein the plurality of output signals includes the result signal and a preselected set of saturation signals, the saturation detection circuitry selecting the one of the plurality in response to the operand data and an instruction signal from the decode circuitry.
The foregoing has outlined rather broadly the features and technical advantages of the present invention in order that the detailed description of the invention that follows may be better understood. Additional features and advantages of the invention will be described hereinafter which form the subject of the claims of the invention.