The technical field of this invention is microprocessors and digital signal processors, especially those used for video and audio processing using diverse data sizes.
The adder units of data processors are becoming larger and larger. Most high performance data processors operate on data words of 32 bits, but data words of 64 bits are also known in the art. Such wide adder units provide increased parallelism and greater processing speed only if the data actually manipulated is of this large width. Using a 64 bit wide adder unit to operate on 8 bit data is no better than using a 32 bit wide adder unit. It is known in the art to employ single instruction multiple data (SIMD) operations using these wide adder units. The adder unit is divided into a number of sections with the carry propagation between sections interrupted. This enables the wide adder unit to perform the same operation on several smaller data elements packed into the wide data word. This processing takes advantage of the parallelism provided by wide adder units.
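The effect of interrupting carry propagation between sections can be illustrated with a short software model. The following is a minimal sketch only, assuming a 32 bit word holding four 8 bit elements; the function name packed_add and its parameters are hypothetical and do not appear in the description above.

```python
def packed_add(a, b, lane_bits=8, lanes=4):
    """Add two packed words section by section, discarding each section's carry out."""
    mask = (1 << lane_bits) - 1
    result = 0
    for i in range(lanes):
        shift = i * lane_bits
        lane_sum = ((a >> shift) & mask) + ((b >> shift) & mask)
        # The carry out of this section is dropped rather than propagated.
        result |= (lane_sum & mask) << shift
    return result

a = 0x01FF7F10
b = 0x01010101
simd_sum = packed_add(a, b)            # 0x02008011: the 0xFF section wraps to 0x00
plain_sum = (a + b) & 0xFFFFFFFF       # 0x03008011: the carry leaks into the top byte
print(hex(simd_sum), hex(plain_sum))
```

In the ordinary 32 bit addition the overflow of the third byte corrupts the neighboring element, while in the sectioned addition each element wraps independently and the overflow is lost.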
There remain some problems with these SIMD techniques. There is currently no easy way to deal with data overflow beyond the separate sections of the divided adder unit. Typical algorithms may require many additions or subtractions on the packed data with no guarantee that the result will not overflow or underflow. Currently, checking for and recovering from overflow and underflow is handled inefficiently in software. Additionally, there is no current technique for SIMD operations on packed data of mixed size.
This invention is a data processing circuit having an adder unit divided into plural sections. Each section receives a subset of the bits of the operands and generates a subset of the bits of the resultant. A carry multiplexer is disposed between each pair of adjacent sections. Each carry multiplexer selects one of a plurality of possible carry inputs to the following section. In a first embodiment, the selection made by each carry multiplexer is between: a fixed "0"; a fixed "1"; the carry output of the prior section; or the stored carry output of the same section from a prior operation. Selection from among these carry inputs enables the adder unit to manipulate packed data in a variety of sizes, even mixed sizes within the same operation.
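The four-way carry selection of the first embodiment can be modeled in software as follows. This is a minimal sketch under stated assumptions, not the circuit itself: the names ZERO, ONE, PREV, STORED and sectioned_add are hypothetical, and the least significant section is assumed to see a prior-section carry of 0.

```python
ZERO, ONE, PREV, STORED = range(4)  # hypothetical encodings of the four carry inputs

def sectioned_add(a, b, widths, selects, stored):
    """Model a sectioned adder with one carry multiplexer per section.

    widths:  section widths in bits, least significant section first
    selects: carry input selection for each section
    stored:  carry outputs saved from a prior operation, one per section
    Returns (result, carry_outs).
    """
    result, shift, prev_carry, carry_outs = 0, 0, 0, []
    for width, sel, saved in zip(widths, selects, stored):
        carry_in = {ZERO: 0, ONE: 1, PREV: prev_carry, STORED: saved}[sel]
        mask = (1 << width) - 1
        total = ((a >> shift) & mask) + ((b >> shift) & mask) + carry_in
        result |= (total & mask) << shift
        prev_carry = total >> width      # carry output of this section
        carry_outs.append(prev_carry)
        shift += width
    return result, carry_outs

# Mixed sizes in one operation: one 16 bit element in the low half (sections 0-1
# linked by PREV) and two independent 8 bit elements in the high half.
result, carries = sectioned_add(
    0x01FF00FF, 0x01010001,
    widths=[8, 8, 8, 8],
    selects=[ZERO, PREV, ZERO, ZERO],
    stored=[0, 0, 0, 0],
)
print(hex(result), carries)  # 0x2000100 [1, 0, 1, 0]
```

Selecting PREV joins a section to its lower neighbor, while selecting ZERO starts a new element, so the element boundaries are set entirely by the carry selections.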
The data processing circuit may specify the selections of the carry multiplexers in a number of ways. Firstly, the opcode of the instruction may be decoded to specify the carry multiplexer selections. Secondly, a combination of the opcode of the instruction and an opcode modification field may be decoded to specify the carry multiplexer selections. Thirdly, an immediate field of the instruction may directly specify the carry control signals controlling the selection of the carry multiplexers. Fourthly, the instruction may designate one of a plurality of carry control registers which store the carry control signals.
The carry multiplexers may also receive the stored carry outputs of each of the other sections. Proper selection of the stored carry outputs permits extended precision arithmetic on packed data of a variety of sizes and upon packed data of mixed sizes within the same operation.
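The use of stored carry outputs from other sections for extended precision arithmetic can be sketched as follows. The model is illustrative only: the function sectioned_add, the "prev" and ("stored", k) selection encodings, and the choice of four 8 bit sections are all assumptions made for this example. Two 32 bit quantities are held as two packed words, one containing the low 16 bits of each quantity and one containing the high 16 bits.

```python
def sectioned_add(a, b, selects, stored, width=8):
    """Each select is 0, 1, "prev", or ("stored", k) for the saved carry of section k."""
    mask = (1 << width) - 1
    result, prev_carry, carry_outs = 0, 0, []
    for i, sel in enumerate(selects):
        if sel == "prev":
            carry_in = prev_carry          # carry output of the prior section
        elif isinstance(sel, tuple):
            carry_in = stored[sel[1]]      # stored carry output of section k
        else:
            carry_in = sel                 # fixed 0 or fixed 1
        shift = i * width
        total = ((a >> shift) & mask) + ((b >> shift) & mask) + carry_in
        result |= (total & mask) << shift
        prev_carry = total >> width
        carry_outs.append(prev_carry)
    return result, carry_outs

# Element 0: 0x0001FFFF + 0x00000001; element 1: 0x1234FFFF + 0x00010002.
a_lo, b_lo = 0xFFFFFFFF, 0x00020001    # low 16 bits of both elements, packed
a_hi, b_hi = 0x12340001, 0x00010000    # high 16 bits of both elements, packed

# First operation: two independent 16 bit additions; carry outputs are stored.
lo, carries = sectioned_add(a_lo, b_lo, [0, "prev", 0, "prev"], [0, 0, 0, 0])

# Second operation: the lowest section of each element takes the stored carry
# output of the highest section of the same element (section 1 or section 3).
hi, _ = sectioned_add(a_hi, b_hi,
                      [("stored", 1), "prev", ("stored", 3), "prev"], carries)
print(hex(hi), hex(lo))  # 0x12360002 0x10000
```

Reassembling the halves gives 0x00020000 for element 0 and 0x12360001 for element 1, matching the full 32 bit sums. The stored carry of the same section alone would not suffice here, because each element spans two sections and the carry needed by its lowest section is produced by its highest section.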
In a first preferred embodiment, the adder unit consists of an arithmetic logic unit divided into sections of equal size. Convenient size selections include 32 total bits divided into 4 sections of 8 bits each and 64 total bits divided into 4 sections of 16 bits each. Alternatively, the arithmetic logic unit may be divided into sections of unequal length. A convenient size selection includes 64 total bits divided into 6 sections of 8 bits each and 4 sections of 4 bits each. The nibble (4 bit) sections are preferably disposed contiguously at either the least significant bits or the most significant bits.
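The unequal division can be checked with a short calculation of the section boundaries. This is a sketch only; the helper section_bounds is hypothetical, and the nibble sections are placed at the least significant bits, which is one of the two placements mentioned above.

```python
def section_bounds(widths):
    """Return (low_bit, high_bit) for each section, least significant section first."""
    bounds, low = [], 0
    for w in widths:
        bounds.append((low, low + w - 1))
        low += w
    return bounds

# Four nibble sections at the least significant end, followed by six byte sections.
layout = [4] * 4 + [8] * 6
bounds = section_bounds(layout)
total_bits = sum(layout)   # 4*4 + 6*8 = 64
print(total_bits, bounds[0], bounds[3], bounds[4], bounds[-1])
```

With the nibbles contiguous, adjacent nibble sections can be linked through their carry multiplexers to form 8 bit or 16 bit elements, so the fine granularity is available without adding multiplexers across the whole word.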