Vector processing is becoming increasingly commonplace in data processing systems. Vector processing units have been developed to perform such operations, and allow an operation to be performed on multiple data elements simultaneously. A vector processing unit provides a plurality of lanes of parallel processing, such that when data elements are input to those lanes, a data processing operation can be performed in parallel within them. This enables significant performance benefits to be realised when compared with scalar processing techniques, which would require the data processing operation to be performed multiple times sequentially, typically using different input data elements for each iteration.
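By way of illustration only, the contrast between scalar and vector processing can be sketched in software. The following Python model is an assumption made for explanatory purposes and does not represent any particular hardware: the function names `scalar_add` and `vector_add`, the `num_lanes` parameter, and the step counter are all hypothetical. The per-lane additions inside one step are written as a loop but stand for work that a vector processing unit would perform simultaneously.

```python
from typing import List, Tuple


def scalar_add(a: List[int], b: List[int]) -> Tuple[List[int], int]:
    """Scalar model: one data element is processed per iteration."""
    out = []
    steps = 0
    for x, y in zip(a, b):
        out.append(x + y)
        steps += 1  # one sequential operation per element
    return out, steps


def vector_add(a: List[int], b: List[int], num_lanes: int = 4) -> Tuple[List[int], int]:
    """Vector model: one vector of up to num_lanes elements is processed per step.

    num_lanes is a hypothetical parameter; each element of a vector is the
    input for a different lane of parallel processing.
    """
    out = []
    steps = 0
    for base in range(0, len(a), num_lanes):
        va = a[base:base + num_lanes]  # input vector for operand a
        vb = b[base:base + num_lanes]  # input vector for operand b
        # Conceptually simultaneous: every lane adds its own pair of elements.
        out.extend(x + y for x, y in zip(va, vb))
        steps += 1  # one vector operation covers all lanes
    return out, steps


if __name__ == "__main__":
    a = list(range(8))
    b = [10] * 8
    print(scalar_add(a, b))
    print(vector_add(a, b, num_lanes=4))
```

In this model, eight data elements require eight sequential steps in the scalar case but only two steps with four lanes, which is the source of the performance benefit referred to above.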
US 2007/0250681 describes such a vector processing unit. In the system described therein, the vector processing unit takes the form of a coprocessor that performs vector processing operations as required by a host processor. As an alternative to providing the vector processing unit as a coprocessor, it is possible to extend the base architecture of a standard processor by replicating part of its core processing elements and adding special instructions which allow multiple data elements to be processed in those units simultaneously.
There are many types of data processing operations which can benefit from the use of vector processing. For example, computer graphics hardware and video game consoles rely heavily on vector processors in their architecture in order to provide high performance graphics processing functions.
The data elements to be processed by vector processing units are typically arranged into vectors of data elements, where each vector comprises a plurality of data elements, and where each data element in the vector forms an input data element for a different lane of parallel processing.
Whilst the above-discussed vector processing approach can yield very significant performance benefits in situations where all of the data elements in the input vectors need to be subjected to a particular data processing operation, situations arise where it would be desirable to make performance of a particular operation conditional within the various lanes of parallel processing. In accordance with such an approach, vectors of input data elements would still be provided to the vector processing unit, but the operation specified by a particular vector instruction would not necessarily be performed within all of the lanes of parallel processing.
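The desired behaviour can be illustrated, purely as a sketch, by a software model in which each lane has its own enable flag. The function name `conditional_vector_op`, the `lane_enable` argument, and in particular the choice that a disabled lane simply passes its first input element through unchanged are assumptions made for this example; the text above does not specify how the condition is expressed or what a non-performing lane produces.

```python
from typing import Callable, List


def conditional_vector_op(
    op: Callable[[int, int], int],
    va: List[int],
    vb: List[int],
    lane_enable: List[bool],
) -> List[int]:
    """Apply op only in the lanes whose enable flag is set.

    Assumption for illustration: a lane that does not perform the
    operation returns its first input element unchanged.
    """
    if not (len(va) == len(vb) == len(lane_enable)):
        raise ValueError("each lane needs two input elements and one enable flag")
    return [
        op(x, y) if enabled else x
        for x, y, enabled in zip(va, vb, lane_enable)
    ]


if __name__ == "__main__":
    result = conditional_vector_op(
        lambda x, y: x + y,
        [1, 2, 3, 4],
        [10, 10, 10, 10],
        [True, False, True, False],
    )
    print(result)
```

Here full vectors of input data elements are still supplied, but the addition takes effect only in the first and third lanes.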
However, encoding such conditionality within the vector instruction that defines the vector operation to be performed on the input data elements is complex, and is likely to result in a significant increase in the size of such an instruction. Accordingly, it is often the case that where such conditionality is required, the vector processing unit is not used, and instead a sequence of scalar operations is performed within a scalar processing unit to perform the required operation on that subset of data elements for which performance of the operation is required.
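The scalar alternative described above can be sketched as follows. This is an illustrative model only: the function name `scalar_fallback`, the `needs_op` flags identifying the subset of data elements, and the iteration counter are hypothetical and are introduced solely to make the sequential cost visible.

```python
from typing import Callable, List, Tuple


def scalar_fallback(
    op: Callable[[int, int], int],
    a: List[int],
    b: List[int],
    needs_op: List[bool],
) -> Tuple[List[int], int]:
    """Perform op one element at a time, only where it is required.

    Returns the results and the number of sequential scalar operations.
    Elements outside the required subset are left unchanged.
    """
    out = list(a)
    scalar_ops = 0
    for i, needed in enumerate(needs_op):
        if needed:
            out[i] = op(a[i], b[i])  # one scalar operation per required element
            scalar_ops += 1
    return out, scalar_ops


if __name__ == "__main__":
    print(scalar_fallback(
        lambda x, y: x + y,
        [1, 2, 3, 4],
        [10, 10, 10, 10],
        [True, False, True, True],
    ))
```

In this model the number of sequential operations grows with the size of the subset, whereas a single vector operation would cover every lane in one step.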
However, falling back to scalar processing in this way significantly impacts performance, and it would be desirable to allow the performance benefits of using the vector processing unit to be realised even in situations where conditional execution within each of the lanes of parallel processing is required.