1. Field of the Invention
This invention relates to data processing. More particularly, this invention relates to data processing systems in which it is desired to perform parallel data processing upon a plurality of data values within a data word.
2. Description of the Prior Art
As data processing systems have developed, data path widths have generally become greater. This has led to the increased possibility that data values which it is desired to process may be much narrower in bit width than the data paths available through the processing hardware. As an example, if the processing hardware provides for 32-bit data processing operations to be performed, but the data values being processed are only 8-bit data values, then it is disadvantageously inefficient to separately process the 8-bit data values upon the much more capable 32-bit data paths.
A known technique for making better use of the data processing resources available in the above circumstances is “single instruction multiple data” instructions. These special purpose instructions effectively allow multiple data values to be embedded within a data word passing along the data paths of the system with processing operations being performed in parallel upon the plurality of data values embedded within each data word. The instructions control the hardware in a manner that ensures that the results of the processing of one data value are not allowed to interfere with the results of the processing of another data value, e.g. the carry chain of an adder is interrupted at positions between the data values such that a carry from the processing of one data value does not propagate into a neighbouring data value.
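The carry interruption described above can be imitated in software, which makes the interference problem concrete. The following is a minimal Python sketch, not the hardware mechanism itself; the function names, the masks, and the choice of four 8-bit data values within a 32-bit data word are all assumptions made for illustration.

```python
WORD_MASK = 0xFFFFFFFF   # assumed 32-bit data path
LOW7_MASK = 0x7F7F7F7F   # low seven bits of each assumed 8-bit data value
TOP_MASK = 0x80808080    # top bit of each assumed 8-bit data value


def plain_add(a, b):
    """Ordinary 32-bit add: carries cross freely between the packed values."""
    return (a + b) & WORD_MASK


def isolated_add(a, b):
    """Add four packed 8-bit values with no carry crossing a value boundary.

    The low seven bits of each value are added normally, so any carry stops
    at that value's own top bit. The top bits are then combined with XOR,
    which adds them without producing a carry out into the neighbour.
    """
    low = (a & LOW7_MASK) + (b & LOW7_MASK)
    return (low ^ ((a ^ b) & TOP_MASK)) & WORD_MASK
```

With these definitions, `plain_add(0x000000FF, 0x00000001)` carries into the neighbouring data value, whereas `isolated_add` wraps the lowest data value to zero and leaves its neighbours untouched.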
Whilst the provision of single instruction multiple data instructions does allow advantageous parallel processing of data values within a single data word, it suffers from the disadvantage that it occupies bit space within the instruction bit space of the data processing apparatus concerned and requires the provision of extra circuitry. Instruction bit space is a valuable resource within a data processing system architecture and increased circuit requirements increase cost, size, power consumption etc. A further disadvantage of the single instruction multiple data instruction approach is that the divisions between data values within a data word are determined by the hardware of the system which gives reduced flexibility in the way the system may be used, e.g. the hardware may assume that the data values are 16-bit data values with two data values being stored within a 32-bit data word, whereas a particular processing requirement might be to handle 8-bit data values, which make relatively inefficient use of a 16-bit data channel provided for them within the single instruction multiple data arrangement.
A further feature of many data processing systems is that data values to be processed in parallel are packed together within the memory of the data processing system in an abutting manner. Accordingly, if the data values to be processed are 8-bit byte values, then these will typically be stored as adjacent data values within a memory system with a plurality of these 8-bit byte values being read simultaneously as, for example, a 32-bit word from the memory system. In these circumstances, if it is desired to separately process the data values, then they must be unpacked from the data word in which they were all read, separately processed, and then repacked within a result data word prior to being stored back to the memory. The processing overhead of the unpacking and re-packing is disadvantageous.
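The unpack, process and repack sequence described above can be sketched as follows. This is an illustrative Python sketch only; the helper names are hypothetical, and averaging corresponding bytes of two words is an assumed stand-in for whatever per-value processing is actually required.

```python
def unpack_bytes(word):
    """Split a 32-bit word into four 8-bit values, lowest byte first."""
    return [(word >> shift) & 0xFF for shift in (0, 8, 16, 24)]


def pack_bytes(values):
    """Reassemble four 8-bit values (lowest byte first) into a 32-bit word."""
    word = 0
    for index, value in enumerate(values):
        word |= (value & 0xFF) << (8 * index)
    return word


def average_separately(word_a, word_b):
    """Average corresponding bytes one at a time: unpack, process, repack."""
    pairs = zip(unpack_bytes(word_a), unpack_bytes(word_b))
    return pack_bytes([(x + y) >> 1 for x, y in pairs])
```

Each call performs eight extractions, four separate additions and four insertions to produce one result word, which is the overhead the passage refers to.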
Furthermore, the need to conduct such unpacking and re-packing and the inefficiency of separately processing data values frequently arise in circumstances, such as video data processing, that already demand considerable processing resources and so can ill afford the extra processing requirements.
It is known from the field of binary coded decimal (BCD) arithmetic to represent a decimal number by a collection of adjacent 4-bit codes within a word, each 4-bit code representing a decimal digit. In order to make adjacent decimal digits interact during, for example, an add, it is known to add six to each digit prior to the add and then subtract six from each digit after the add.
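The add-six technique can be illustrated with a short Python sketch of packed BCD addition. The function name and the eight-digit, 32-bit layout are assumptions. Note also one detail that goes beyond the summary above: in this commonly used formulation the six is subtracted afterwards only from those digits that did not generate a decimal carry, since a digit that did carry has already been brought back into the range 0 to 9 by wrapping past sixteen.

```python
def bcd_add(a, b):
    """Add two packed BCD words of eight decimal digits (result modulo 10**8)."""
    # Bias every digit by six so that a decimal carry (digit sum >= 10)
    # becomes a binary carry out of the 4-bit digit (digit sum >= 16).
    biased = a + 0x66666666
    raw_sum = biased + b
    # A sum bit equals x ^ y ^ carry_in, so this recovers the carry into
    # every bit position of the addition above.
    carry_in = raw_sum ^ biased ^ b
    # Bits 4, 8, ..., 32 are the carries out of digits 0 to 7. Keep a 1
    # wherever a digit did NOT carry.
    no_carry = ~carry_in & 0x111111110
    # Turn each such flag into the value six located in that digit.
    sixes = (no_carry >> 2) | (no_carry >> 3)
    return (raw_sum - sixes) & 0xFFFFFFFF
```

For example, `bcd_add(0x19, 0x03)` yields `0x22`, the packed BCD encoding of 19 + 3 = 22.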
Viewed from one aspect the present invention provides a method of processing an input data word containing a plurality of abutting input data values, said method comprising the steps of:
(i) performing one or more data processing operations upon said input data word and a further data word to generate an intermediate result data word containing a plurality of abutting intermediate result data values dependent upon said input data values and corresponding portions of said further data word, said one or more data processing operations being such that a corrupting result bit from a first result data value may extend into and change a value of a second result data value;
(ii) calculating an error correcting data word in dependence upon said input data word and said further data word, said error correcting data word having a value that represents any corrupting result bits that may be generated by said step of performing;
(iii) combining said intermediate result data word and said error correcting data word to remove any change of value produced by a corrupting result bit and to generate an output data word, said output data word containing a plurality of abutting output data values being those that would be generated if said one or more data processing operations were performed upon said plurality of input data values and said corresponding portions of said further data word in isolation from one another.
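Steps (i) to (iii) above can be illustrated with a minimal Python sketch. The concrete operation chosen here, a truncating average of four 8-bit data values packed into 32-bit data words, is an assumption made for illustration, as are the function name and mask constant; it should not be read as the only operation the method covers. The sketch also assumes that the carry out of the 32-bit addition is retained, which Python's unbounded integers provide automatically.

```python
WORD_MASK = 0xFFFFFFFF
# Top bit of each 8-bit value that has a neighbouring value above it
# (assumed layout: four 8-bit values in a 32-bit word).
BOUNDARY_MASK = 0x00808080


def average_packed(word_a, word_b):
    """Truncating average of four packed 8-bit values, all lanes at once."""
    # Step (i): add the whole words and halve. The low bit of each lane's
    # sum is shifted down into the top bit of the lane below, corrupting it.
    intermediate = (word_a + word_b) >> 1
    # Step (ii): the low bit of a lane's own sum is a ^ b at that position,
    # so this word holds exactly the stray bits, already moved to where
    # they land after the shift.
    error = ((word_a ^ word_b) >> 1) & BOUNDARY_MASK
    # Step (iii): subtract the stray bits to undo the corruption.
    return (intermediate - error) & WORD_MASK
```

As a worked case, the words `0x01FF0003` and `0x00010001` give an intermediate result of `0x01000002`, an error correcting word of `0x00800000` and an output of `0x00800002`, which matches averaging the four byte pairs in isolation.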
The invention recognises that the interactions between result data values that can corrupt one another may be identified. When these interactions have been so identified, their effect may be reversed by an appropriate additional processing step. Thus, an output data word may be produced containing output data values identical to those that would be produced if those output data values had been calculated from the input data values and respective corresponding portions of the further data word in isolation from each other, i.e. without any undesired interaction or corruption. Surprisingly, the extra work of identifying and then compensating for the undesired interactions is more than outweighed by the increase in processing efficiency yielded by being able to process multiple data values within a single data word simultaneously.
Whilst the invention could be applied to a variety of different data processing operations to be performed upon the input data word, it is particularly well suited to situations in which the one or more data processing operations include an addition operation. In these circumstances, the potentially corrupting interactions between data values can be efficiently identified and reversed.
Preferred embodiments of the invention are ones in which an addition operation takes place and the potential corruption being compensated for is where the lowest order bit of a first result data value undesirably changes the value of the highest order bit of a second result data value. In many real-life data processing situations, the high order bits of results are of more practical significance than the low order bits, and so a low order bit may already, in effect, be discarded.
The above considerations also apply in the case of a subtraction operation.
As a preferred example of the way in which the error correcting data word may be calculated, an exclusive OR operation may be performed between two data words to identify the potential corrupting result bit at each position.
The identification of potential corrupting result bits can be focused upon the boundaries between data values by a logical AND operation using a mask value that picks out bits at the data value boundaries.
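The exclusive OR and mask operations described above can be sketched as follows. The helper names are hypothetical, and taking the lowest order bit of each data value as the boundary bit is an assumption suited to an addition followed by halving. Because the mask is an ordinary constant chosen by software, the same code serves 8-bit or 16-bit data values.

```python
def boundary_mask(lane_bits, word_bits=32):
    """Mask with a 1 at the lowest bit of every lane of the given width."""
    mask = 0
    for position in range(0, word_bits, lane_bits):
        mask |= 1 << position
    return mask


def error_word(word_a, word_b, lane_bits):
    """Potential corrupting bits at the lane boundaries.

    The XOR gives, at every bit position, the sum bit that the two inputs
    produce on their own (ignoring incoming carries); the AND keeps only
    the positions that sit on a boundary between data values.
    """
    return (word_a ^ word_b) & boundary_mask(lane_bits)
```

For 8-bit data values the mask is `0x01010101`, and for 16-bit data values it is `0x00010001`.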
It has been found that a rounding step may be advantageously combined with the error correcting process by either adding or subtracting an error correcting data word in accordance with the desired rounding mode.
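One way in which rounding and error correction might be combined is sketched below in Python for a packed average of four 8-bit data values. This is an assumed arrangement for illustration: the correction is applied to the full sum before it is halved, the names are hypothetical, and the carry out of the 32-bit addition is assumed to be retained.

```python
LOW_BIT_MASK = 0x01010101  # lowest bit of each assumed 8-bit data value


def average_rounded(word_a, word_b, round_up=False):
    """Packed average of four 8-bit values with a selectable rounding mode."""
    # A 1 marks each lane whose sum is odd, i.e. whose halved result
    # would carry a stray half.
    error = (word_a ^ word_b) & LOW_BIT_MASK
    total = word_a + word_b
    # Subtracting makes every lane sum even by rounding it down; adding
    # makes it even by rounding it up. Either way no stray low bit remains
    # to be shifted into the neighbouring lane.
    total = total + error if round_up else total - error
    return (total >> 1) & 0xFFFFFFFF
```

Averaging `0xFFFFFFFF` with `0x00000000` therefore gives `0x7F7F7F7F` when rounding down and `0x80808080` when rounding up.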
The technique of the present invention could be applied in many different circumstances, but it is particularly suited to implementations in which the data being processed corresponds to adjacent signal values within a stream of signal values, such as adjacent pixel values. These situations require large volumes of data to be processed and so processing efficiency gains are highly significant.
Whilst the input data values could have a restricted range within their bit width, the chance of undesirable interactions between adjacent data values, and accordingly the worth of the invention, is greater in embodiments in which the data values extend over the full range of values allowed by their bit widths.
Viewed from another aspect the present invention provides an apparatus for processing an input data word containing a plurality of abutting input data values, said apparatus comprising:
(i) processing logic operable to perform one or more data processing operations upon said input data word and a further data word to generate an intermediate result data word containing a plurality of abutting intermediate result data values dependent upon said input data values and corresponding portions of said further data word, said one or more data processing operations being such that a corrupting result bit from a first result data value may extend into and change a value of a second result data value;
(ii) calculating logic operable to calculate an error correcting data word in dependence upon said input data word and said further data word, said error correcting data word having a value that represents any corrupting result bits that may be generated by said step of performing;
(iii) combining logic operable to combine said intermediate result data word and said error correcting data word to remove any change of value produced by a corrupting result bit and to generate an output data word, said output data word containing a plurality of abutting output data values being those that would be generated if said one or more data processing operations were performed upon said plurality of input data values and said corresponding portions of said further data word in isolation from one another.
A further aspect of the invention provides a computer program for controlling a data processing apparatus in accordance with the above described techniques. The computer program may be stored in various different ways, such as non-volatile memory or a magnetic or optical medium, or alternatively may be dynamically downloaded via a communications link to a data processing apparatus upon which it is desired to execute that computer program.
The above, and other objects, features and advantages of this invention will be apparent from the following detailed description of illustrative embodiments which is to be read in connection with the accompanying drawings.