1. Field of the Invention
This invention relates to data processing. More particularly, this invention relates to data processing systems in which it is desired to perform parallel data processing upon a plurality of data values within a data word.
2. Description of the Prior Art
As data processing systems have developed, data path widths have generally become greater. This has led to the increased possibility that data values which it is desired to process may be much narrower in bit width than the data paths available through the processing hardware. As an example, if the processing hardware provides for 32-bit data processing operations to be performed, but the data values being processed are only 8-bit data values, then it is disadvantageously inefficient to separately process the 8-bit data values upon the much more capable 32-bit data paths.
A known technique for making better use of the data processing resources available in the above circumstances is “single instruction multiple data” instructions. These special purpose instructions effectively allow multiple data values to be embedded within a data word passing along the data paths of the system with processing operations being performed in parallel upon the plurality of data values embedded within each data word. The instructions control the hardware in a manner that ensures that the results of the processing of one data value are not allowed to interfere with the results of the processing of another data value, e.g. the carry chain of an adder is interrupted at positions between the data values such that a carry from the processing of one data value does not propagate into a neighbouring data value.
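The interference that such instructions guard against can be illustrated with a minimal sketch. It assumes four 8-bit data values packed into a 32-bit data word; the function names `plain_add32` and `lanewise_add8` are hypothetical and are used only for this illustration. The second function models in software the effect of an adder whose carry chain is interrupted at each byte boundary.

```python
MASK32 = 0xFFFFFFFF  # assumed 32-bit data path


def plain_add32(a: int, b: int) -> int:
    """Ordinary 32-bit addition: the carry chain runs through all 32 bits."""
    return (a + b) & MASK32


def lanewise_add8(a: int, b: int) -> int:
    """Software model of an adder whose carry chain is cut between bytes."""
    result = 0
    for shift in (0, 8, 16, 24):
        lane_sum = ((a >> shift) & 0xFF) + ((b >> shift) & 0xFF)
        result |= (lane_sum & 0xFF) << shift  # discard the carry out of the lane
    return result


a = 0x010200FF
b = 0x01010001
# The lowest byte overflows (0xFF + 0x01). With a plain add the carry
# leaks into the neighbouring byte; with the cut carry chain it does not.
print(hex(plain_add32(a, b)))    # 0x2030100
print(hex(lanewise_add8(a, b)))  # 0x2030000
```

The two results differ only in the second-lowest byte, which is the data value corrupted by the carry from its neighbour.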
Whilst the provision of single instruction multiple data instructions does allow advantageous parallel processing of data values within a single data word, it suffers from the disadvantage that it occupies bit space within the instruction bit space of the data processing apparatus concerned and requires the provision of extra circuitry. Instruction bit space is a valuable resource within a data processing system architecture and increased circuit requirements increase cost, size, power consumption etc. A further disadvantage of the single instruction multiple data instruction approach is that the divisions between data values within a data word are determined by the hardware of the system which gives reduced flexibility in the way the system may be used, e.g. the hardware may assume that the data values are 16-bit data values with two data values being stored within a 32-bit data word, whereas a particular processing requirement might be to handle 8-bit data values, which make relatively inefficient use of a 16-bit data channel provided for them within the single instruction multiple data arrangement.
A further feature of many data processing systems is that data values to be processed in parallel are packed together within the memory of the data processing system in an abutting manner. Accordingly, if the data values to be processed are 8-bit byte values, then these will typically be stored as adjacent data values within a memory system with a plurality of these 8-bit byte values being read simultaneously as, for example, a 32-bit word from the memory system. In these circumstances, if it is desired to separately process the data values, then they must be unpacked from the data word in which they were all read, separately processed, and then repacked within a result data word prior to being stored back to the memory. The processing overhead of the unpacking and re-packing is disadvantageous.
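The unpack, process and re-pack sequence described above might look like the following sketch. It assumes 8-bit values packed into a 32-bit word with the lowest-order byte first; the helper names are hypothetical. Each data value is processed one at a time, which is the overhead the paragraph refers to.

```python
from typing import Callable, List


def unpack_bytes(word: int) -> List[int]:
    """Extract the four 8-bit values from a 32-bit word, lowest byte first."""
    return [(word >> shift) & 0xFF for shift in (0, 8, 16, 24)]


def pack_bytes(values: List[int]) -> int:
    """Re-pack four 8-bit values into a 32-bit word, lowest byte first."""
    word = 0
    for index, value in enumerate(values):
        word |= (value & 0xFF) << (8 * index)
    return word


def process_separately(word: int, op: Callable[[int], int]) -> int:
    """Unpack, apply `op` to each value in turn, then re-pack the results."""
    return pack_bytes([op(value) for value in unpack_bytes(word)])


# Example operation: halve every 8-bit value independently.
print(hex(process_separately(0x80402010, lambda v: v >> 1)))  # 0x40201008
```

Four separate applications of the operation are needed for one data word, in addition to the shifting and masking required to unpack and re-pack.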
Furthermore, the need to conduct such unpacking and re-packing, and the inefficiency of separately processing data values, frequently arise in circumstances, such as video data processing, that already demand considerable processing resources and so can ill afford the extra processing requirements.
Viewed from one aspect the present invention provides a method of processing an input data word containing a plurality of abutting input data values, each input data value containing a plurality of multi-bit portions, said method comprising the steps of:
(i) splitting said multi-bit portions of said input data values between a plurality of intermediate data words, each intermediate data word containing a plurality of said multi-bit portions taken from respective input data values, said multi-bit portions being separated by vacant portions within said intermediate data words;
(ii) performing one or more data processing operations upon said intermediate data words, said data processing operations being such that, within at least one of said intermediate data words, a multi-bit data portion may at least temporarily extend in length into a vacant portion of said intermediate data word and one or more data processing operations performed upon at least one intermediate data word differs from one or more data processing operations performed upon a different intermediate data word; and
(iii) combining said intermediate data words to form an output data word containing a plurality of abutting output data values each formed from a plurality of separately processed multi-bit data portions taken from respective intermediate data words.
The invention allows for multiple data values within a data word to be processed in parallel at the cost of having to first split those data values into at least two multi-bit portions with vacant portions between them within an intermediate data word and then having to perform the data processing operations on the two or more intermediate data words. The invention recognises that the disadvantage of having to perform the data processing operations upon more than one intermediate data word is more than outweighed by the ability to process portions of a plurality of data values within each intermediate data word. Moreover, this parallel processing of data values is achieved without the need to provide special purpose single instruction multiple data instructions. This saves instruction bit space that may then be used for other purposes and furthermore has the advantage that there is no hardware constraint upon the way in which a data word is divided to provide for multiple data values.
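Steps (i) to (iii) can be sketched in software as follows. This is an illustrative sketch only, not the claimed implementation: it assumes a 32-bit data word holding four 8-bit data values, each split into a high-order and a low-order 4-bit portion, and it uses per-value addition (modulo 256) as the desired operation. All names are hypothetical.

```python
MASK32 = 0xFFFFFFFF
NIBBLE_LANES = 0x0F0F0F0F  # one 4-bit portion per byte, 4 vacant bits above it
LANE_LSBS = 0x01010101     # lowest bit of each byte lane


def split_nibbles(word: int) -> tuple:
    """Step (i): split into (high, low) intermediate words with vacant bits."""
    low = word & NIBBLE_LANES
    high = (word >> 4) & NIBBLE_LANES
    return high, low


def combine_nibbles(high: int, low: int) -> int:
    """Step (iii): discard overflow in the vacant bits and re-abut the portions."""
    return (((high & NIBBLE_LANES) << 4) | (low & NIBBLE_LANES)) & MASK32


def add_bytes_split(a: int, b: int) -> int:
    """Add four 8-bit values in parallel using ordinary 32-bit additions."""
    a_high, a_low = split_nibbles(a)
    b_high, b_low = split_nibbles(b)
    # Step (ii): each low lane may grow to 5 bits, into its vacant portion.
    low_sum = a_low + b_low
    # Only the high word receives the carries out of the low portions,
    # so the two intermediate words undergo different operations.
    carries = (low_sum >> 4) & LANE_LSBS
    high_sum = a_high + b_high + carries
    return combine_nibbles(high_sum, low_sum)


print(hex(add_bytes_split(0x010200FF, 0x01010001)))  # 0x2030000
```

In this sketch the lowest byte overflows (0xFF + 0x01), yet the neighbouring byte is unaffected because the overflow lands in a vacant portion and is masked off when the intermediate words are combined.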
It will be appreciated that the data processing operations and combination step could take a wide variety of forms. In some circumstances a controlled interaction between the results of processing adjacent data values may be tolerable or desired. However, in preferred embodiments of the invention the steps of performing one or more data processing operations and combining together yield output data values equal to those that would be obtained by performing a desired data processing operation upon said input data values within said input data word in isolation from any abutting input data values.
The data processing operations being performed upon the intermediate data words could take many different forms. However, the invention is particularly well suited to situations in which the desired data processing operations to be performed upon the data values include one or more of addition and shifting. These data processing operations are ones in which abutting data values within a data word would normally undesirably interfere with one another and so the provision of vacant portions within the intermediate data words enables these interactions to be overcome.
The invention is particularly well suited to the processing of data values representing a stream of signal values, such as pixel values. Such signal data values are commonly narrower in bit width than the data path of the system processing them, and they often require processing in considerable volumes, so improvements in the efficiency with which they may be handled are highly advantageous.
A preferred feature of the invention is that certain data processing operations need only be performed upon one of the intermediate data words. A common example of this is the need to perform a rounding operation. Preferred embodiments of the invention allow the rounding of multiple data values to be performed in parallel by an operation conducted upon only one of the intermediate data words.
It will be understood that the input data values could be split between two or more intermediate data words. However, preferred embodiments of the invention split the input data values between two intermediate data words. Two intermediate data words allow for the provision of vacant portions between multi-bit portions of the input data values whilst keeping the increase in the number of data processing operations that need to be performed upon the separate intermediate data words to a low level.
The techniques of the invention have been found particularly well suited to circumstances in which it is desired to average a plurality of adjacent data values, such as by utilising a shifted add operation upon the intermediate data words.
Whilst the bit widths of the data path and the data values may take a variety of values, the gains in efficiency have been found to be particularly high in the case where the input data word is a 32-bit input data word, the input data values are 8-bit input data values and the split is performed into high-order 4-bit multi-bit portions and low-order 4-bit multi-bit portions.
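As a hedged illustration of this configuration, the sketch below averages corresponding 8-bit values of two 32-bit words, with rounding, using the high-order and low-order 4-bit split. It is one possible arrangement under stated assumptions rather than the embodiment itself: the choice of averaging two words, the round-half-up rule and all names are assumptions made for the example. The rounding constant is added to only one intermediate word, and the final combination is a shift followed by an add.

```python
NIBBLE_LANES = 0x0F0F0F0F     # one 4-bit portion per byte lane
ROUND_EACH_LANE = 0x01010101  # +1 per lane, applied to the low word only


def average_bytes_rounded(a: int, b: int) -> int:
    """Per-byte (x + y + 1) >> 1 for four packed 8-bit values at once."""
    # Low-order portions: sums plus rounding reach at most 0x1F per lane.
    low = (a & NIBBLE_LANES) + (b & NIBBLE_LANES) + ROUND_EACH_LANE
    # High-order portions: sums reach at most 0x1E per lane, no rounding.
    high = ((a >> 4) & NIBBLE_LANES) + ((b >> 4) & NIBBLE_LANES)
    # Each lane holds 16*high + low, so its half is 8*high + (low >> 1).
    # The mask removes the bit shifted in from the neighbouring lane.
    return (high << 3) + ((low >> 1) & NIBBLE_LANES)


def average_bytes_reference(a: int, b: int) -> int:
    """Slow per-value reference used only to check the parallel version."""
    result = 0
    for shift in (0, 8, 16, 24):
        x = (a >> shift) & 0xFF
        y = (b >> shift) & 0xFF
        result |= ((x + y + 1) >> 1) << shift
    return result


print(hex(average_bytes_rounded(0x00FF8001, 0xFF000002)))  # 0x80804002
```

Because each averaged lane is at most 255, the final add cannot carry from one output data value into the next, so the result is already a word of abutting 8-bit values.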
Viewed from another aspect the invention provides apparatus for processing an input data word containing a plurality of abutting input data values, each input data value containing a plurality of multi-bit portions, said apparatus comprising:
(i) splitting logic operable to split said multi-bit portions of said input data values between a plurality of intermediate data words, each intermediate data word containing a plurality of said multi-bit portions taken from respective input data values, said multi-bit portions being separated by vacant portions within said intermediate data words;
(ii) processing logic operable to perform one or more data processing operations upon said intermediate data words, said data processing operations being such that, within at least one of said intermediate data words, a multi-bit data portion may at least temporarily extend in length into a vacant portion of said intermediate data word and one or more data processing operations performed upon at least one intermediate data word differs from one or more data processing operations performed upon a different intermediate data word; and
(iii) combining logic operable to combine said intermediate data words to form an output data word containing a plurality of abutting output data values each formed from a plurality of separately processed multi-bit data portions taken from respective intermediate data words.
The invention also provides a computer program for controlling computer hardware in accordance with the above techniques.
The above, and other objects, features and advantages of this invention will be apparent from the following detailed description of illustrative embodiments which is to be read in connection with the accompanying drawings.