This application claims priority to GB Patent Application No. 1317876.9 filed 9 Oct. 2013, the entire content of which is hereby incorporated by reference.
The present invention relates to a data processing apparatus and method for controlling performance of speculative vector operations.
One known technique for improving performance of a data processing apparatus is to provide circuitry to support execution of vector operations. Vector operations are performed on at least one vector operand, where each vector operand comprises a plurality of vector elements. Performance of the vector operation then involves applying an operation repetitively across the various vector elements within the vector operand(s).
In typical data processing systems that support performance of vector operations, a vector register bank will be provided for storing the vector operands. Hence, by way of example, each vector register within a vector register bank may store a vector operand comprising a plurality of vector elements.
In high performance implementations, it is also known to provide vector processing circuitry (often referred to as SIMD (Single Instruction Multiple Data) processing circuitry) which can perform the required operation in parallel on the various vector elements within the vector operands. In an alternative embodiment, scalar processing circuitry can still be used to implement the vector operation, but in this instance the vector operation is implemented by iterative execution of an operation through the scalar processing circuitry, with each iteration operating on different vector elements of the vector operands.
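By way of illustration only, the two implementation styles described above can be modelled in a few lines of Python. This is a minimal sketch, not a description of any particular circuitry: the function names `vector_op_simd` and `vector_op_iterative` and the returned step count are assumptions introduced here, with one "step" standing for one pass through the processing circuitry.

```python
from operator import add


def vector_op_simd(op, a, b):
    """Model of SIMD circuitry: every element position is handled in one step."""
    if len(a) != len(b):
        raise ValueError("vector operands must have the same number of elements")
    result = [op(x, y) for x, y in zip(a, b)]
    return result, 1  # one pass through the (parallel) circuitry


def vector_op_iterative(op, a, b):
    """Model of scalar circuitry: one iteration per vector element position."""
    if len(a) != len(b):
        raise ValueError("vector operands must have the same number of elements")
    result = []
    steps = 0
    for i in range(len(a)):
        result.append(op(a[i], b[i]))  # each iteration uses different elements
        steps += 1
    return result, steps


va = [1, 2, 3, 4]
vb = [10, 20, 30, 40]
simd_result, simd_steps = vector_op_simd(add, va, vb)
iter_result, iter_steps = vector_op_iterative(add, va, vb)
```

Both functions produce the same result vector; in this model they differ only in how many passes through the processing circuitry are needed.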
Through the use of vector operations, significant performance benefits can be realised when compared with the performance of an equivalent series of scalar operations.
To gain the performance benefits of vector processing, it is known to vectorise a series of scalar operations, replacing them with an equivalent series of vector operations. For example, for a loop containing a series of scalar instructions, it may be possible to vectorise that loop by replacing the series of scalar instructions with an equivalent series of vector instructions, with the vector operands containing, as vector elements, elements relating to different iterations of the original scalar loop.
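The following Python sketch illustrates such a transformation for a loop whose iteration count is known in advance. The loop body (`a[i] * 2 + b[i]`), the assumed vector length `VL` of four elements, and the helper names `vmul_scalar` and `vadd` are hypothetical and chosen purely for illustration.

```python
VL = 4  # assumed number of vector elements per vector operand


def scalar_loop(a, b):
    """Original scalar loop: one element handled per iteration."""
    out = []
    for i in range(len(a)):
        out.append(a[i] * 2 + b[i])
    return out


def vmul_scalar(v, s):
    """Hypothetical vector instruction: multiply every element by a scalar."""
    return [x * s for x in v]


def vadd(v, w):
    """Hypothetical vector instruction: element-wise addition."""
    return [x + y for x, y in zip(v, w)]


def vectorised_loop(a, b):
    """Vectorised loop: each vector element belongs to a different
    iteration of the original scalar loop."""
    out = []
    for base in range(0, len(a), VL):
        va = a[base:base + VL]  # the final operand may hold fewer than VL elements
        vb = b[base:base + VL]
        out.extend(vadd(vmul_scalar(va, 2), vb))
    return out


a = list(range(10))
b = [100] * 10
```

Because the iteration count (here ten) is known before the loop starts, the number of elements needed in each vector operand, including the shorter final one, is also known.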
However, whilst such an approach can work well when the number of iterations required through the original scalar loop is predetermined, it is more difficult to vectorise such loops when the number of iterations is not predetermined. In particular, since the number of iterations is not predetermined, it cannot be predetermined how many vector elements will be required in each vector operand.
In some situations of the above type, it is possible to perform speculative vector processing, where a speculation is made as to the required number of vector elements, and remedial action is taken later when the exact number of vector elements required is determined.
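As a simple illustration of this idea, the Python sketch below vectorises a loop that stops at the first zero in its input, so the iteration count is not known in advance. It assumes, purely for the example, that the speculation covers the full assumed vector length `VL` and that the remedial action is simply to discard results for element positions later found not to be required; the function names are hypothetical.

```python
VL = 4  # assumed number of vector elements per vector operand


def scalar_double_until_zero(src):
    """Scalar loop whose iteration count depends on the data."""
    out = []
    for x in src:
        if x == 0:
            break
        out.append(x * 2)
    return out


def speculative_double_until_zero(src):
    """Vectorised form: operate on all VL positions speculatively,
    then keep only the results that turn out to be required."""
    out = []
    base = 0
    while base < len(src):
        chunk = src[base:base + VL]
        doubled = [x * 2 for x in chunk]  # speculative work on every position
        required = len(chunk)
        for lane, x in enumerate(chunk):  # now determine how many were needed
            if x == 0:
                required = lane
                break
        out.extend(doubled[:required])  # remedial action: discard the excess
        if required < len(chunk):
            break  # the exit condition was met within this vector
        base += VL
    return out


data = [3, 1, 4, 1, 5, 0, 9, 2]
```

In the second vector of `data`, three of the four doubling operations are performed but then discarded, which is the kind of wasted work that speculation can introduce.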
The Ph.D. thesis entitled “Vector Microprocessors” by K Asanovic, Berkeley, 1998, pp. 116-121, discusses performing speculation across the entire width of the vector operands, and additionally keeping track of architectural events (for example page faults) that occur during speculation. Such architectural events will trigger an exception, causing an exception routine to be executed by the operating system in order to resolve the exception. The proposed approach keeps a record of each vector element position within the vector width where such an architectural event was detected. Later, when a commit point is reached where the required set of vector element positions is known, each required vector element position is compared against this record of architectural events. Since any architectural events associated with required vector element positions will prevent the vector processing circuitry from correctly performing vector operations, any such deferred exceptions are triggered at the commit point. If none of the required set of vector element positions is associated with an architectural event, then the vector length and mask are updated and the record of architectural events is cleared.
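The record-and-commit behaviour described above can be sketched in Python as follows. This is a simplified model written for this discussion, not code taken from the cited thesis: the class `SpeculativeLoader`, the exception `DeferredFault`, the use of a dictionary as memory (a missing address standing in for a page fault), and the assumption that the required element positions form a prefix of the vector are all hypothetical.

```python
class DeferredFault(Exception):
    """Raised at the commit point for a fault in a required element position."""


class SpeculativeLoader:
    def __init__(self, memory, vl):
        self.memory = memory              # address -> value; absent address = fault
        self.vl = vl                      # assumed vector width in elements
        self.fault_record = [False] * vl  # one flag per vector element position

    def speculative_load(self, base):
        """Load across the full vector width, recording faults instead of
        raising them."""
        values = []
        for lane in range(self.vl):
            addr = base + lane
            if addr in self.memory:
                values.append(self.memory[addr])
            else:
                self.fault_record[lane] = True  # defer the architectural event
                values.append(None)
        return values

    def commit(self, required_count):
        """Commit point: compare required positions against the fault record."""
        faulting = [lane for lane in range(required_count)
                    if self.fault_record[lane]]
        if faulting:
            raise DeferredFault(f"deferred fault in required lane(s) {faulting}")
        self.fault_record = [False] * self.vl  # clear the record
        # Updated mask: True for each required vector element position.
        return [lane < required_count for lane in range(self.vl)]


loader = SpeculativeLoader({100: 7, 101: 8, 102: 9}, vl=4)
values = loader.speculative_load(100)  # lane 3 faults, but nothing is raised yet
mask = loader.commit(3)                # only lanes 0 to 2 are required
```

In this example the fault recorded for lane 3 never becomes an exception, because that lane is not among the required positions; calling `commit(4)` after the same load would instead raise `DeferredFault`.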
The above process allows speculative vector processing to be performed, whilst ensuring correct operation by deferring architectural events until the commit point, where any exceptions associated with required vector element positions are triggered.
However, although the above approach can ensure correct operation when performing speculative vector processing operations, there are other factors that can affect the benefits of performing speculative vector processing. As mentioned earlier, at the time the speculation is performed, the number of iterations required is not known, and hence there is the possibility of performing certain operations that may adversely impact a performance characteristic of the apparatus (for example throughput or energy consumption), only to find out later that those operations were not required. Accordingly, it would be desirable to provide a mechanism for performing speculative vector operations whilst managing the impact of such speculative vector processing on a performance characteristic of the apparatus.