The present invention relates to a data processing apparatus and method for performing vector processing.
Vector processing is a known technique for seeking to improve the performance of a data processing apparatus. In accordance with vector processing, a vector processing unit can be provided that has a plurality of lanes of parallel processing. Within each lane, the vector processing unit can be arranged to perform a processing operation on a data element input to that lane for each of one or more input operands for the processing operation. Accordingly, when an instruction is executed by the vector processing unit, the processing operation defined by that instruction is performed in parallel within each of the lanes of parallel processing, each lane receiving different data elements from each input operand specified by the instruction. Such an approach is often referred to as a Single Instruction Multiple Data (SIMD) approach.
By such an approach, a SIMD instruction can be executed once within the vector processing unit, rather than having to execute an equivalent scalar instruction multiple times within a scalar processing unit, hence enabling significant performance benefits to be realised.
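By way of illustration only, the contrast between scalar and SIMD execution can be modelled in a short Python sketch. The four-lane width and the names `LANES`, `scalar_add` and `simd_add` are assumptions introduced for this example and are not taken from the described apparatus. Each loop iteration in `simd_add` stands in for a single vector instruction whose lanes would operate in parallel in hardware.

```python
LANES = 4  # assumed vector width, chosen for illustration only


def scalar_add(a, b):
    """Model a scalar unit: one add instruction issued per data element."""
    result, issued = [], 0
    for x, y in zip(a, b):
        result.append(x + y)
        issued += 1
    return result, issued


def simd_add(a, b):
    """Model a vector unit: one add instruction issued per group of LANES elements."""
    result, issued = [], 0
    for i in range(0, len(a), LANES):
        # Conceptually, each lane adds its own pair of data elements in parallel.
        result.extend(x + y for x, y in zip(a[i:i + LANES], b[i:i + LANES]))
        issued += 1
    return result, issued


a = [1, 2, 3, 4, 5, 6, 7, 8]
b = [10, 20, 30, 40, 50, 60, 70, 80]
scalar_result, scalar_issued = scalar_add(a, b)
simd_result, simd_issued = simd_add(a, b)
```

In this model both functions produce the same result, but the vector version issues one instruction per group of four elements rather than one per element.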
However, the provision of such vector processing units is typically relatively expensive when compared with equivalent scalar processing circuitry. The inventor of the present invention realised that in many practical implementations, there are periods of time where the vector processing unit is quite underutilised, and in particular the vector processing unit may often be used to execute vector instructions that do not require use of all of the lanes of parallel processing provided by the vector processing unit.
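The notion of underutilisation can be made concrete with a simple measure. The following Python sketch is a hypothetical illustration: the function name `lane_utilisation` and the sample instruction mix are assumptions made for this example, not measurements from any real device. It computes the fraction of available lane slots that perform useful work over a sequence of vector instructions.

```python
def lane_utilisation(active_lanes_per_instruction, lanes):
    """Return the fraction of lane slots doing useful work.

    active_lanes_per_instruction: number of lanes each vector instruction needs.
    lanes: total lanes provided by the (hypothetical) vector processing unit.
    """
    if not active_lanes_per_instruction:
        return 0.0
    used = sum(active_lanes_per_instruction)
    available = lanes * len(active_lanes_per_instruction)
    return used / available


# Assumed instruction mix on a four-lane unit: most instructions use only some lanes.
utilisation = lane_utilisation([2, 1, 4, 2], lanes=4)
```

With this assumed mix, 9 of the 16 available lane slots are used, so nearly half of the vector processing capacity sits idle.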
One reason why lanes can be underutilised is limitations in compilers. For instance, considering the example of a graphics processing unit, shader compilers often need to perform runtime compilation before the shader is executed, which constrains the amount of resources and time available for compilation. Further, software developers often do not make use of vector features in modern APIs, whether because of legacy software, time constraints on development, or other factors.
It would hence be desirable to provide a mechanism for making more efficient use of the vector processing resources within a data processing apparatus.
This application claims priority to GB 1320854.1 filed 26 Nov. 2013, the entire contents of which are hereby incorporated by reference.