1. Field of the Invention
The present invention relates to the field of data processing and in particular to vector instructions for accessing a plurality of data storage positions.
2. Description of the Prior Art
One known technique for improving performance of a data processing apparatus is to provide circuitry to support execution of vector operations. Vector operations are performed on at least one vector operand, where each vector operand comprises a plurality of operand elements. Performing the vector operation involves applying the operation repetitively across the various operand elements within the vector operand(s).
In typical data processing systems that support performance of vector operations, a vector register bank will be provided for storing the vector operands. Hence, by way of example, each vector register within a vector register bank may store a vector operand comprising a plurality of operand elements.
In high performance implementations, it is also known to provide vector processing circuitry (often referred to as SIMD (Single Instruction Multiple Data) processing circuitry) which can perform the required vector operation in parallel on the various operand elements within the vector operands. In an alternative embodiment, scalar processing circuitry can still be used to implement the vector operation, but in this instance the vector operation is implemented by iterative execution of an operation through the scalar processing circuitry, with each iteration operating on different operand elements of the vector operands. It should be noted that there are intermediate implementations where a few vector elements may be processed together.
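By way of illustration only, the range of implementations described above can be sketched in Python. The function name `vector_op` and the `lanes` parameter are hypothetical and are not taken from any particular apparatus. Setting `lanes` to 1 models iterative execution through scalar circuitry, setting it to the vector length models fully parallel SIMD circuitry, and values in between model the intermediate implementations.

```python
import operator


def vector_op(op, a, b, lanes):
    """Apply op element-wise to vectors a and b, `lanes` elements per step.

    Returns the result vector and the number of steps taken.
    `lanes` is a hypothetical parameter used only for this sketch.
    """
    if len(a) != len(b):
        raise ValueError("vector operands must have the same number of elements")
    if lanes < 1:
        raise ValueError("lanes must be at least 1")
    result = []
    steps = 0
    for start in range(0, len(a), lanes):
        # Each step handles one group of up to `lanes` operand elements.
        group_a = a[start:start + lanes]
        group_b = b[start:start + lanes]
        result.extend(op(x, y) for x, y in zip(group_a, group_b))
        steps += 1
    return result, steps


va = [1, 2, 3, 4, 5, 6, 7, 8]
vb = [10, 10, 10, 10, 10, 10, 10, 10]

scalar_result, scalar_steps = vector_op(operator.add, va, vb, lanes=1)
simd_result, simd_steps = vector_op(operator.add, va, vb, lanes=8)
mixed_result, mixed_steps = vector_op(operator.add, va, vb, lanes=2)
```

In this sketch all three calls produce the same result vector; only the number of steps differs.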
Vector data access instructions each instruct a plurality of data accesses. Generally, a processing apparatus will not be able to perform all the data accesses specified by a vector data access instruction in parallel in a single cycle; the accesses will generally take several cycles. If a plurality of vector data access instructions are being executed, access speed may be increased if the data accesses from different vector data access instructions can be interleaved with each other. This is because such interleaving introduces opportunities to merge accesses to related addresses and may expose additional parallelism.
In some cases the accesses performed are completely independent of each other, and interleaving between them can be allowed, thereby increasing the speed of the accesses. In other cases the accesses may not be independent of each other, and they may therefore be constrained to execute in instruction stream order.
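The potential benefit of interleaving independent accesses can be illustrated with a simplified model. The sketch below rests on assumptions that are not part of the description above: a hypothetical 64-byte line size (`LINE_BYTES`), and a merge rule under which consecutive accesses to the same line are combined into a single memory request. The example addresses are likewise invented for illustration.

```python
LINE_BYTES = 64  # hypothetical line size, assumed for this sketch only


def count_requests(addresses):
    """Count memory requests when consecutive same-line accesses are merged."""
    requests = 0
    previous_line = None
    for address in addresses:
        line = address // LINE_BYTES
        if line != previous_line:
            # A different line from the previous access needs a new request.
            requests += 1
            previous_line = line
    return requests


def interleave(first, second):
    """Alternate the accesses of two equal-length vector access instructions."""
    ordered = []
    for x, y in zip(first, second):
        ordered.extend((x, y))
    return ordered


# Invented addresses: each b access falls in the same line as the matching a access.
a = [i * 64 for i in range(8)]
b = [i * 64 + 32 for i in range(8)]

in_order_requests = count_requests(a + b)
interleaved_requests = count_requests(interleave(a, b))
```

Under these assumptions the in-order sequence needs 16 requests, while the interleaved sequence needs 8. Such reordering is only valid where the two instructions are independent of each other.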
FIG. 1 shows an example, according to the prior art, of a vector access instruction for accessing addresses a0 to a7, followed by a vector access instruction for accessing addresses b0 to b7. If these instructions are processed in a system where no interleaving is allowed and where two data access requests can be issued per clock cycle, data access request b7 will be issued seven clock cycles after the instruction is received.
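The timing in this example can be reproduced with a short calculation. The sketch below assumes that the first pair of requests is issued in the cycle in which the first instruction is received (numbered cycle 0) and that requests are issued strictly in instruction stream order. The function name `issue_cycles` is hypothetical.

```python
def issue_cycles(instructions, per_cycle=2):
    """Return the issue cycle of each access request, with no interleaving.

    `instructions` is a list of vector access instructions, each given as a
    list of request labels in element order. Cycle 0 is assumed to be the
    cycle in which the first instruction is received.
    """
    cycles = {}
    slot = 0
    for accesses in instructions:
        for label in accesses:
            # Requests fill issue slots in order, `per_cycle` slots per cycle.
            cycles[label] = slot // per_cycle
            slot += 1
    return cycles


a = [f"a{i}" for i in range(8)]
b = [f"b{i}" for i in range(8)]
schedule = issue_cycles([a, b])
```

With these assumptions, `schedule["a7"]` is 3, `schedule["b0"]` is 4 and `schedule["b7"]` is 7, which matches the seven-cycle figure given above.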
It would be desirable to provide an improved technique for supporting the execution of vector operations within a data processing apparatus that also supports scalar operations.