1. Field of the Invention
The present invention relates to the field of data processing and, in particular, to the field of SIMD data processing in which data processing instructions perform a data processing operation in a number of parallel lanes of processing on respective data elements from within a source register so as to generate respective data elements within a destination register.
2. Description of the Prior Art
It is known to provide SIMD (single instruction multiple data) processors in which a data processing operation upon a specified register results in parallel operations being performed upon multiple data elements stored within that register, each of those elements being treated as part of a lane of processing. The processing lanes are isolated from one another to the degree necessary to ensure that the processing within one lane does not inappropriately influence the processing in any of the other lanes. This may have significant advantages, particularly in fields where a large amount of data needs to be processed in the same way, such as video data where the same operations need to be performed on a large number of pixels.
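By way of illustration only, the lane isolation described above can be modelled in software. The following is a minimal sketch, assuming a 64-bit register holding four unsigned 16-bit data elements; the register width, the element size and the names `pack`, `unpack` and `simd_add` are hypothetical and are not taken from any particular instruction set.

```python
# Illustrative model only: a 64-bit "register" holding four 16-bit lanes.
# The widths and all function names are assumptions made for this sketch.
LANE_BITS = 16
LANES = 4
LANE_MASK = (1 << LANE_BITS) - 1
REG_MASK = (1 << (LANE_BITS * LANES)) - 1


def unpack(reg):
    """Split a packed register value into its data elements (lane 0 first)."""
    return [(reg >> (i * LANE_BITS)) & LANE_MASK for i in range(LANES)]


def pack(elems):
    """Pack a list of data elements into a single register value."""
    reg = 0
    for i, elem in enumerate(elems):
        reg |= (elem & LANE_MASK) << (i * LANE_BITS)
    return reg


def simd_add(a, b):
    """Add lane by lane; a carry out of one lane is discarded, not propagated."""
    return pack([(x + y) & LANE_MASK for x, y in zip(unpack(a), unpack(b))])


a = pack([0xFFFF, 1, 2, 3])
b = pack([1, 1, 1, 1])

lanewise = unpack(simd_add(a, b))      # lane 0 wraps; lane 1 is unaffected
scalar = unpack((a + b) & REG_MASK)    # ordinary add: carry leaks into lane 1
```

In this sketch the lane-wise addition leaves lane 1 unaffected by the overflow in lane 0, whereas an ordinary 64-bit addition of the same packed values allows the carry to pass into the neighbouring lane.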
Some functions convert very easily to SIMD processing whilst others are not easily adapted to these parallel processing operations. For example, a considerable amount of time and processing may be required with some operations to set up the data elements at the appropriate positions within the SIMD register and to rearrange those positions during the processing operation to ensure that a single instruction can operate correctly on the multiple data within the lanes. As well as consuming time and power, encoding such activity to rearrange data elements also reduces code density and can consume register resources which could otherwise be more usefully employed.
When performing arithmetic operations it may be that the resultant data required is of a smaller data size than the source data. For example, the multiplication of data values may give an answer of a data type that is twice the size of the original data type. Further arithmetic operations may need to be performed on this product data, but the final result that is required is of the same size data type as the original data. Conventionally this has been dealt with in SIMD processing by using instructions such as multiply take high half. With such processing the data is narrowed following the multiplication and before any further arithmetic operations are performed upon it. This has the advantage that the size of the data elements does not vary throughout the calculation, but it has the disadvantage that any further arithmetic operations are performed on data elements of a reduced size, the low order bits of each product having already been discarded. This may impinge on the accuracy of the result.
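The effect on accuracy can be seen in a small worked example. The following is a sketch only, assuming unsigned 16-bit data elements whose products occupy 32 bits; the names `mul_high_half`, `narrow_early` and `narrow_late` are hypothetical, and the code models the arithmetic rather than any specific instruction.

```python
# Illustrative model only: unsigned 16-bit elements, 32-bit products.
# All names are assumptions made for this sketch.
ELEM_BITS = 16
ELEM_MASK = (1 << ELEM_BITS) - 1
WIDE_MASK = (1 << (2 * ELEM_BITS)) - 1


def mul_high_half(a, b):
    """Multiply two elements and keep only the high half of the product."""
    return ((a * b) >> ELEM_BITS) & ELEM_MASK


def narrow_early(a, b, c, d):
    """Per lane: narrow each product first, then add the narrowed values."""
    return [(mul_high_half(w, x) + mul_high_half(y, z)) & ELEM_MASK
            for w, x, y, z in zip(a, b, c, d)]


def narrow_late(a, b, c, d):
    """Per lane: add the full-width products, then take the high half once."""
    return [(((w * x + y * z) & WIDE_MASK) >> ELEM_BITS) & ELEM_MASK
            for w, x, y, z in zip(a, b, c, d)]


# Each product in lane 0 is 0x0100 * 0x00C0 = 0xC000, whose high half is zero.
a = [0x0100, 0x4000]
b = [0x00C0, 0x0004]
c = [0x0100, 0x4000]
d = [0x00C0, 0x0004]

early = narrow_early(a, b, c, d)
late = narrow_late(a, b, c, d)
```

In lane 0 the two products each have a high half of zero, so narrowing before the addition yields 0, whereas their full-width sum is 0x18000, whose high half is 1. In lane 1 each product is exactly 0x10000, no low order bits are lost, and both approaches yield 2.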