The present invention relates to a data processing system and method for performing an arithmetic operation on a plurality of signed data values, and in particular to techniques which avoid the requirement to apply the arithmetic operation individually to each signed data value.
To enable arithmetic operations to be applied simultaneously to a number of data values, Single Instruction Multiple Data (SIMD) instructions have been developed, where a single instruction is applied to a composite data value consisting of a number of fields, with each field containing a separate data value.
To support SIMD instructions, it is necessary to provide specific SIMD hardware to ensure that the data values in each field of the composite data value do not interact with each other during the application of the SIMD operation to the composite data value. For example, SIMD extensions like Intel Corporation's MMX hardware and the SA-1500 coprocessor generally allow a wide register to be split into independent fixed-size sub-fields. Such registers may, for instance, be 64 bits wide, and hence may contain eight 8-bit values, four 16-bit values or two 32-bit values. SIMD instructions operating on these registers are usually 3-operand instructions of the type A=B op C, but they operate on each of the sub-fields in parallel.
For example, an addition of two 64-bit SIMD registers containing four 16-bit values results in four additions being performed on the four sub-fields of each register. The 64-bit result contains four 16-bit values which are the sum of the corresponding fields of the inputs.
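By way of illustration only, the lane-wise behaviour described above can be modelled in software as follows. This is a minimal sketch of what dedicated SIMD hardware achieves, not a description of any particular instruction set; the function name simd_add16x4 and the choice of wraparound on overflow within a lane are assumptions made for the example.

```python
MASK16 = 0xFFFF

def simd_add16x4(b, c):
    """Model a SIMD add of two 64-bit values holding four 16-bit lanes each.

    Hypothetical helper: each lane is added independently and any carry out
    of a lane is discarded, so neighbouring lanes never interact.
    """
    result = 0
    for lane in range(4):
        shift = 16 * lane
        lane_sum = ((b >> shift) + (c >> shift)) & MASK16  # keep only this lane
        result |= lane_sum << shift
    return result

# Lane 0 overflows (0xFFFF + 1) but the carry does not reach lane 1.
total = simd_add16x4(0x0001_0002_0003_FFFF, 0x0001_0001_0001_0001)
print(hex(total))  # 0x2000300040000
```

The masking step inside the loop stands in for the special hardware that keeps the sub-fields decoupled.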
The SIMD approach is especially useful in cases where a set of operations on data, for example 8 to 16-bit data, must be performed many times on large quantities of data. Considering the example where 64-bit registers are used, by employing SIMD instructions, up to eight independent 8-bit data sets can be processed in parallel, achieving significant processing speed improvements. One particular area where such SIMD instructions are useful is the area of JPEG or MPEG compression and/or decompression, where many Discrete Cosine Transformation (DCT) operations need to be performed. Each DCT consists of a series of additions and multiplications performed on signed data, and it has been found that the use of SIMD instructions can significantly improve processing speed.
However, there are a number of disadvantages resulting from the use of SIMD instructions. Firstly, as mentioned earlier, to enable SIMD instructions to operate correctly, special hardware is required to ensure that the data values in the various subfields of the register remain decoupled from each other as the SIMD operation is applied. Further, new SIMD instructions that use this hardware need to be defined. This tends to lead to an increase in the instruction bit space required to identify instructions, which is undesirable.
Viewed from a first aspect, the present invention provides a method of operating a data processing system to perform an arithmetic operation on a plurality ‘p’ of signed ‘n-bit’ data values, comprising the steps of: encoding the plurality of signed n-bit data values as a composite value comprising p n-bit fields by performing an encoding operation equivalent to aligning each signed data value with a respective n-bit field, sign extending each signed data value to the most significant bit of the composite value, and adding the aligned and sign extended data values to form the composite value; applying the arithmetic operation to the composite value to produce an encoded result comprising p n-bit fields; and decoding the encoded result to produce p final results by applying a decoding operation equivalent to extracting the data from each n-bit field of the encoded result and correcting for any effect caused by the addition of an adjacent sign extended data value during the encoding step; whereby each final result represents the application of the arithmetic operation to a corresponding signed n-bit data value.
In accordance with the present invention, two or more signed data values are encoded into a composite value, and the composite value is then processed using standard instructions as if they were SIMD instructions. This is made possible by use of a particular encoding of the signed data values which avoids the problems with overflow and underflow between two adjacent data values. Since this encoding enables standard instructions to be used, many of the benefits of using SIMD extensions can be achieved without the cost of the extra hardware or new instructions which SIMD techniques necessitate.
In accordance with the present invention, a plurality p of signed n-bit data values are encoded as a composite value comprising p n-bit fields by performing an encoding operation equivalent to aligning each signed data value with a respective n-bit field, sign extending each signed data value to the most significant bit of the composite value, and adding the aligned and sign extended data values to form the composite value. This encoding allows many operations, for example addition, subtraction, multiplication by a constant and left shifting operations, to be used as long as each of the packed values remains within the minimum/maximum range (for example a value in a 16-bit field must remain within the range −32768 to +32767).
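The encoding and the operations it permits may be sketched as follows. This is an illustrative model only, assuming two 16-bit values packed into a 32-bit composite; the function name encode and the sample values are hypothetical, and the first list element is assumed to occupy the lowest field.

```python
def encode(values, n):
    """Pack signed n-bit values into one composite; values[0] is the lowest field."""
    width_mask = (1 << (n * len(values))) - 1
    # Python integers are signed and unbounded, so `v << (n * i)` both aligns
    # the value with field i and preserves its sign extension above that field.
    # Summing and masking then models the fixed-width composite register.
    return sum(v << (n * i) for i, v in enumerate(values)) & width_mask

N = 16
MASK32 = 0xFFFFFFFF
x = encode([-7, 100], N)
y = encode([20, -300], N)

# Ordinary full-width arithmetic on the composites (modulo 2**32) ...
added = (x + y) & MASK32
subtracted = (x - y) & MASK32
scaled = (x * 5) & MASK32
shifted = (x << 2) & MASK32

# ... matches the encoding of the field-by-field results.
print(added == encode([13, -200], N))       # True
print(subtracted == encode([-27, 400], N))  # True
print(scaled == encode([-35, 500], N))      # True
print(shifted == encode([-28, 400], N))     # True
```

The equalities hold because the encoding is a plain weighted sum, so any operation that is linear in the composite distributes over the fields; the range condition stated above is what allows the individual results to be recovered afterwards.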
In accordance with the present invention, the encoding is applied to signed data values. Since the encoding of the present invention involves applying an operation equivalent to sign extending each signed data value to the most significant bit of the composite value, it is clear that the addition of the various signed data values aligned with their respective n-bit fields of the composite value will potentially result in some interaction between the various data values. For example, in preferred embodiments, by sign extending a particular signed data value, there will be no effect on the other data values if that sign extended data value is positive, but if that sign extended data value is negative, then this will have the effect of subtracting from the composite value the value 1 aligned with the adjacent data value representing the next n higher significant bits of the composite value.
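The interaction just described can be observed directly by inspecting the raw bit fields of a composite value. The following is a small illustrative sketch, assuming 16-bit fields in a 32-bit composite; the helper raw_fields and the sample values 2, 3 and -3 are hypothetical.

```python
def raw_fields(composite, n, p):
    """Split a composite into its p raw n-bit patterns, lowest field first."""
    mask = (1 << n) - 1
    return [(composite >> (n * i)) & mask for i in range(p)]

N = 16
MASK32 = 0xFFFFFFFF

# The upper value is 2 in both cases; only the sign of the lower value differs.
positive_low = ((2 << N) + 3) & MASK32
negative_low = ((2 << N) + (-3)) & MASK32

print(raw_fields(positive_low, N, 2))  # [3, 2]      upper field untouched
print(raw_fields(negative_low, N, 2))  # [65533, 1]  upper field reduced by 1
```

In the second case the lower field holds the 2's complement pattern of -3 (0xFFFD), and the sign extension of that value has subtracted 1 from the upper field.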
Accordingly, due to this potential interaction, it is entirely counter-intuitive to apply this encoding, since it would be expected that the application of the arithmetic operation to the composite value resulting from such encoding would not yield a result from which could be derived the individual results that would have arisen from the application of the arithmetic operation to each signed n-bit data value in turn. However, contrary to expectations, it has been found that by using the encoding technique of the present invention, the application of the arithmetic operation does yield an encoded result from which the individual final results can be readily derived. All that is required is to extract the data from each n-bit field of the encoded result and to correct for any effect caused by the addition of an adjacent sign extended data value during the encoding step. Hence, in preferred embodiments, all that is required is to add back to the encoded result the value 1 aligned with a particular n-bit field of the encoded result, if the data in the adjacent n-bit field representing the adjacent n lower significant bits of the encoded result is negative.
It will be appreciated that there are many different ways in which the encoding operation can be implemented, provided that the implementation chosen yields a composite value which is equivalent to aligning each signed data value with a respective n-bit field, sign extending each signed data value to the most significant bit of the composite value, and adding the aligned and sign extended data values to form the composite value. For example, it is not necessary to align each signed data value with a respective n-bit field prior to sign extending each signed data value, and alternatively each signed data value can first be sign extended prior to any alignment process taking place.
Further, in one embodiment of the present invention, the encoding operation comprises the steps of: allocating said plurality of n-bit data values to respective n-bit fields of an intermediate value, where the first n-bit field comprises the n lowest significant bits of the intermediate value and the p-th n-bit field comprises the n highest significant bits of the intermediate value; for the (p-1)-th to the first n-bit field, beginning with the (p-1)-th n-bit field, determining whether the data value in that n-bit field is negative, and if so, generating a new intermediate value by subtracting from the intermediate value a logic 1 value aligned with the adjacent n-bit field representing the next n higher significant bits; such that when the first n-bit field has been processed, said intermediate value is said composite value.
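The steps of this embodiment may be sketched as follows. This is a minimal model under stated assumptions rather than a definitive implementation: the names encode_fields and to_signed are hypothetical, fields are indexed from zero in the code (so the "first" field is index 0 and the "(p-1)-th" field is index p-2), and the composite is assumed to be exactly n times p bits wide.

```python
def to_signed(field, n):
    """Interpret an n-bit pattern as a 2's complement signed integer."""
    return field - (1 << n) if field & (1 << (n - 1)) else field

def encode_fields(values, n):
    """Encode signed n-bit values; values[0] goes in the lowest field."""
    p = len(values)
    mask = (1 << n) - 1
    width_mask = (1 << (n * p)) - 1

    # Step 1: allocate each value's n-bit pattern to its own field.
    intermediate = 0
    for i, v in enumerate(values):
        intermediate |= (v & mask) << (n * i)

    # Step 2: walk from the (p-1)-th field down to the first field.
    # If a field is negative, subtract 1 aligned with the next higher field.
    for i in range(p - 2, -1, -1):
        field = (intermediate >> (n * i)) & mask
        if to_signed(field, n) < 0:
            intermediate = (intermediate - (1 << (n * (i + 1)))) & width_mask
    return intermediate

print(hex(encode_fields([-1, 2], 16)))          # 0x1ffff
print(hex(encode_fields([-1, -32768, 7], 16)))  # 0x67fffffff
```

Processing the fields from high to low means each test inspects a field that no earlier subtraction has altered, since a subtraction only affects bits above the field being examined.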
Similarly, it will be appreciated that the decoding operation can be implemented in a number of different ways, provided that it produces final results that are equivalent to extracting the data from each n-bit field of the encoded result and correcting for any effect caused by the addition of an adjacent sign extended data value during the encoding step. For example, in a preferred embodiment of the present invention, a first n-bit field of the encoded result comprises the n lowest significant bits of the encoded result and a p-th n-bit field of the encoded result comprises the n highest significant bits of the encoded result, and the decoding operation comprises the steps of: for the first n-bit field to the (p-1)-th n-bit field, starting with the first n-bit field, determining if the data in that n-bit field is negative, and if so, generating a new encoded result by adding to the encoded result a logic 1 value aligned with the adjacent n-bit field representing the next n higher significant bits; such that when the (p-1)-th n-bit field has been processed, each n-bit field contains one of said p final results.
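The decoding steps of this preferred embodiment may be sketched in the same manner. Again this is only an illustrative model: the name decode_fields is hypothetical, fields are indexed from zero in the code, and the final extraction of each field as a signed integer is an added convenience for displaying the results.

```python
def decode_fields(encoded, n, p):
    """Decode an encoded result into p signed values, lowest field first."""
    mask = (1 << n) - 1
    width_mask = (1 << (n * p)) - 1
    sign_bit = 1 << (n - 1)

    # Walk from the first field up to the (p-1)-th field. If the field is
    # negative, add 1 aligned with the next higher field. The addition is
    # full width, so a carry may ripple into further fields.
    for i in range(p - 1):
        if (encoded >> (n * i)) & sign_bit:
            encoded = (encoded + (1 << (n * (i + 1)))) & width_mask

    # Each field now holds a final result as an n-bit 2's complement pattern.
    results = []
    for i in range(p):
        field = (encoded >> (n * i)) & mask
        results.append(field - (1 << n) if field & sign_bit else field)
    return results

print(decode_fields(0x0001FFFF, 16, 2))   # [-1, 2]
print(decode_fields(0x67FFFFFFF, 16, 3))  # [-1, -32768, 7]
```

The fields are corrected in order from low to high because the sign test on a given field is only meaningful once the correction from the field below it has been applied.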
Whilst considering the encoding operation and the decoding operation, it will also be appreciated by those skilled in the art that, dependent on the implementation chosen, the encoding operation and/or the decoding operation can consist of one or more instructions.
In preferred embodiments, the arithmetic operation is a function of one or more composite values, each composite value encoding a plurality of signed n-bit data values, and each composite value being generated by applying said encoding step to the corresponding plurality of signed n-bit data values. Hence, if it is desired to perform the addition A=A1+A2 and the addition B=B1+B2, then, in accordance with preferred embodiments of the invention, two composite values C1 and C2 would be produced, where C1 is an encoding of A1 and B1 and C2 is an encoding of A2 and B2, and the arithmetic operation would then perform an addition of the two composite values C1 and C2 to produce the result C=C1+C2. The results A and B would then be derived by decoding the result C of the arithmetic operation. The above operation is illustrated for simplicity, but it will be appreciated by those skilled in the art that more than two data values may be involved in the generation of a composite value, and the arithmetic operation may be applied to more than two composite values. Further, it will be appreciated that the arithmetic operation may in fact comprise a plurality of operations.
In preferred embodiments, the signed data values are in 2's complement format.
It will be appreciated that many different arithmetic operations may be applied to the composite values resulting from the encoding technique of the present invention. However, in one embodiment of the present invention, the arithmetic operation comprises one or more discrete cosine transformation (DCT) operations, each DCT operation being a function of one or more composite values, each composite value encoding a plurality of signed n-bit data values, and each composite value being generated by applying said encoding step to the corresponding plurality of signed n-bit data values. One particular area where DCT operations are applied is that of JPEG or MPEG compression or decompression, and it has been found that the techniques in accordance with preferred embodiments of the present invention are particularly advantageous when performing JPEG or MPEG compression or decompression.
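The two-addition example above may be worked through as follows. This sketch assumes 16-bit values in a 32-bit composite, with the A values placed in the upper field and the B values in the lower field; that placement, the helper names encode2, decode2 and to_signed16, and the numeric values are all assumptions chosen for illustration.

```python
N = 16
MASK32 = (1 << 32) - 1

def encode2(high, low):
    """Encode two signed 16-bit values into one 32-bit composite."""
    return ((high << N) + low) & MASK32

def to_signed16(field):
    """Interpret a 16-bit pattern as a 2's complement signed integer."""
    return field - (1 << N) if field & 0x8000 else field

def decode2(c):
    """Recover (high, low) from a 32-bit encoded result."""
    if c & 0x8000:                    # lower field negative:
        c = (c + (1 << N)) & MASK32   # add back 1 aligned with the upper field
    return to_signed16(c >> N), to_signed16(c & 0xFFFF)

A1, A2 = 1200, -1500
B1, B2 = -40, 15

C1 = encode2(A1, B1)
C2 = encode2(A2, B2)
C = (C1 + C2) & MASK32   # one ordinary 32-bit addition performs both sums

A, B = decode2(C)
print(A, B)  # -300 -25
```

A single standard addition on the composites thus yields both A=A1+A2 and B=B1+B2 after decoding, provided each sum stays within the signed 16-bit range.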
Viewed from a second aspect, the present invention provides a data processing system for performing an arithmetic operation on a plurality ‘p’ of signed ‘n-bit’ data values, comprising: a processor for applying the arithmetic operation; a storage for storing the plurality of signed n-bit data values; the processor being arranged, prior to execution of the arithmetic operation, to retrieve the plurality of signed n-bit data values from the storage, and to encode the plurality of signed n-bit data values as a composite value comprising p n-bit fields by executing an encoding operation equivalent to aligning each signed data value with a respective n-bit field, sign extending each signed data value to the most significant bit of the composite value, and adding the aligned and sign extended data values to form the composite value, the composite value being stored in the storage, the processor being arranged to apply the arithmetic operation to the composite value to produce an encoded result comprising p n-bit fields, and to store the encoded result in the storage, and the processor further being arranged, subsequent to application of the arithmetic operation, to decode the encoded result to produce p final results by executing a decoding operation equivalent to extracting the data from each n-bit field of the encoded result and correcting for any effect caused by the addition of an adjacent sign extended data value during the encoding step; whereby each final result represents the application of the arithmetic operation to a corresponding signed n-bit data value.
In preferred embodiments, the processor includes a shifter and an arithmetic logic unit (ALU) arranged to execute the encoding and decoding operations. These are standard hardware elements within a typical processor, and hence no special hardware is required within the processor to enable it to handle the encoding and decoding operations. However, if desired, specific hardware could be provided to handle the encoding and/or the decoding operations.
The storage used to store the signed n-bit data values, the composite value, and the encoded result, may take a variety of forms, for example a memory, or a register bank, and indeed different physical storage elements may be used to store the signed n-bit data values, the composite value, and the encoded result. However, in preferred embodiments, the storage is a register bank for storing data values used by the processor. Hence, prior to execution of the encoding operation, the relevant signed n-bit data values will be read into the register bank from memory, and then the encoding operation, arithmetic operation and decoding operation will take place by appropriate manipulation of the data values in the register bank.
Viewed from a third aspect, the present invention provides an encoding/decoding manager for a data processing system arranged to perform an arithmetic operation on a plurality ‘p’ of signed ‘n-bit’ data values, the encoding/decoding manager being arranged to encode the plurality of signed ‘n-bit’ data values as a composite value prior to application of the arithmetic operation by the data processing system to generate an encoded result, and to subsequently decode the encoded result to produce p final results, the encoding/decoding manager comprising: an encoder configured in operation to encode the plurality of signed n-bit data values as said composite value comprising p n-bit fields by performing an encoding operation equivalent to aligning each signed data value with a respective n-bit field, sign extending each signed data value to the most significant bit of the composite value, and adding the aligned and sign extended data values to form the composite value; and a decoder configured in operation to produce said p final results by applying a decoding operation equivalent to extracting the data from each n-bit field of the encoded result and correcting for any effect caused by the addition of an adjacent sign extended data value during the encoding step; whereby each final result represents the application of the arithmetic operation to a corresponding signed n-bit data value.
Viewed from a fourth aspect, the present invention provides a computer program product on a computer readable memory for operating a data processing system to encode a plurality ‘p’ of signed ‘n-bit’ data values as a composite value prior to application of an arithmetic operation to generate an encoded result, and to subsequently decode the encoded result to produce p final results, the computer program product comprising: an encoder configured in operation to encode the plurality of signed n-bit data values as said composite value comprising p n-bit fields by performing an encoding operation equivalent to aligning each signed data value with a respective n-bit field, sign extending each signed data value to the most significant bit of the composite value, and adding the aligned and sign extended data values to form the composite value; and a decoder configured in operation to produce said p final results by applying a decoding operation equivalent to extracting the data from each n-bit field of the encoded result and correcting for any effect caused by the addition of an adjacent sign extended data value during the encoding step; whereby each final result represents the application of the arithmetic operation to a corresponding signed n-bit data value.