One known technique for improving performance of a data processing apparatus is to provide circuitry to support execution of vector operations. Vector operations are performed on at least one vector operand, where each vector operand comprises a plurality of operand elements. Performance of the vector operation then involves applying an operation repetitively across the various operand elements within the vector operand(s).
In typical data processing systems that support performance of vector operations, a vector register bank will be provided for storing the vector operands. Hence, by way of example, each vector register within a vector register bank may store a vector operand comprising a plurality of operand elements.
In high performance implementations, it is also known to provide vector processing circuitry (often referred to as SIMD (Single Instruction Multiple Data) processing circuitry) which can perform the required operation in parallel on the various operand elements within the vector operands. In an alternative embodiment, scalar processing circuitry can still be used to implement the vector operation, but in this instance the vector operation is implemented by iterative execution of an operation through the scalar processing circuitry, with each iteration operating on different operand elements of the vector operands.
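As an illustration of the second alternative, the following minimal sketch models a vector operation carried out by scalar circuitry. A single scalar operation is applied once per element position, with each iteration consuming a different pair of operand elements. The names `scalar_add` and `vector_op_iterative` are hypothetical. SIMD circuitry would produce the same result with all element positions processed in parallel, which plain Python cannot model directly.

```python
from typing import Callable, List

Vector = List[int]


def scalar_add(a: int, b: int) -> int:
    # A single scalar operation, as executed by scalar processing circuitry.
    return a + b


def vector_op_iterative(op: Callable[[int, int], int], va: Vector, vb: Vector) -> Vector:
    """Apply a scalar operation to each pair of operand elements in turn.

    Models the approach in which scalar circuitry implements a vector
    operation by iterating, handling one element position per iteration.
    """
    if len(va) != len(vb):
        raise ValueError("vector operands must have the same number of elements")
    result: Vector = []
    for i in range(len(va)):  # one iteration per element position
        result.append(op(va[i], vb[i]))
    return result


print(vector_op_iterative(scalar_add, [1, 2, 3, 4], [10, 20, 30, 40]))
```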
The various operations performed by processing circuitry of a data processing apparatus are typically controlled by a sequence of instructions. Each instruction will be decoded, and result in control signals being issued to the relevant processing circuit blocks to cause the operation specified by that instruction to be performed.
For traditional data processing systems configured to implement scalar operations on scalar operands, scalar instructions will be specified defining the various scalar operations required. Accordingly, a particular data processing apparatus will typically execute scalar instructions from a scalar instruction set in order to allow a variety of scalar operations to be performed within the scalar processing circuitry of the apparatus. To support execution of vector operations, it is typically the case that separate vector instructions will be defined to identify the operations required in respect of specified vector operands. Accordingly, this has led to the development of a separate vector instruction set, and typically modern data processing systems that support vector operations are able to execute vector instructions from a specified vector instruction set, whilst also supporting execution of scalar instructions from a corresponding scalar instruction set.
When developing vector instruction sets, it is typically the case that, for most scalar instructions in a scalar instruction set, it is desirable to provide several corresponding vector instructions, for example to support different vector data flow patterns. For example, considering a particular scalar add instruction, it may be necessary to provide several vector add instructions in order to support variants which differ in how data flows between adjacent elements of the vector operands. Examples of vector instruction sets that seek to add at least one vector version of each instruction in the scalar instruction set are Intel's MMX/SSE/AVX instruction sets, IBM's AltiVec instruction set and ARM's NEON instruction set.
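To make the idea of differing data flow patterns concrete, the following sketch shows three hypothetical variants of an "add" that a vector instruction set might need to offer alongside a single scalar add. They are illustrative only and are not claimed to match any specific instruction in the instruction sets named above. In the element-wise variant, lane i of the result depends only on lane i of each operand. In the pairwise variant, adjacent elements are combined, and in the reduction variant every element contributes to one result.

```python
from typing import List

Vector = List[int]


def add_elementwise(va: Vector, vb: Vector) -> Vector:
    # Lane i of the result depends only on lane i of each operand.
    if len(va) != len(vb):
        raise ValueError("operands must have the same number of elements")
    return [a + b for a, b in zip(va, vb)]


def add_pairwise(va: Vector) -> Vector:
    # Adjacent elements (0, 1), (2, 3), ... of one operand are summed,
    # so data flows between neighbouring lanes.
    if len(va) % 2:
        raise ValueError("pairwise add needs an even number of elements")
    return [va[i] + va[i + 1] for i in range(0, len(va), 2)]


def add_reduce(va: Vector) -> int:
    # Every element contributes to a single scalar result.
    total = 0
    for a in va:
        total += a
    return total


print(add_elementwise([1, 2, 3, 4], [10, 20, 30, 40]))
print(add_pairwise([1, 2, 3, 4]))
print(add_reduce([1, 2, 3, 4]))
```

Each variant would need its own encoding, which is why one scalar instruction can give rise to several vector instructions.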
It is also common for vector instructions to specify at least one control register, for example a register identifying which elements of the vector operand are active and should be processed by the vector operation. Each such control register specifier consumes bits within the vector instruction encoding.
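One common form of such a control register is a mask in which bit i marks element i as active. The following minimal sketch applies an add only to the active elements. It assumes "merging" behaviour, in which inactive elements keep the previous contents of the destination. Some architectures instead zero inactive elements, so this choice is an assumption, and the function name `predicated_add` is hypothetical.

```python
from typing import List

Vector = List[int]


def predicated_add(va: Vector, vb: Vector, dest: Vector, mask: int) -> Vector:
    """Add va and vb only in lanes whose bit is set in mask.

    Assumes merging behaviour: inactive lanes keep dest's previous value.
    """
    n = len(dest)
    if not (len(va) == len(vb) == n):
        raise ValueError("operand lengths differ")
    result = list(dest)
    for i in range(n):
        if (mask >> i) & 1:  # bit i of the control register marks lane i active
            result[i] = va[i] + vb[i]
    return result


# Lanes 0 and 2 are active (mask 0b0101); lanes 1 and 3 keep dest's values.
print(predicated_add([1, 2, 3, 4], [10, 20, 30, 40], [0, 0, 0, 0], 0b0101))
```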
Many systems operate with fixed-length instruction sets, and accordingly there is a significant constraint on the bits available for encoding all of the different instructions. This constraint is particularly acute when seeking to define all of the variants of vector instruction that would be desirable within a vector instruction set, and the problem is further compounded by the need to identify one or more control registers within those vector instructions.
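The size of this constraint can be estimated with simple arithmetic. The sketch below assumes a hypothetical 32-bit instruction with 32 vector registers (5 bits per specifier), three vector register operands, and an optional specifier for one of 8 control registers (3 bits). These field widths are illustrative assumptions, not taken from any particular architecture. Under them, adding a single control register specifier reduces the bits left for opcodes and variants from 17 to 14, an eightfold reduction in the number of distinct encodings available.

```python
def register_field_bits(num_registers: int) -> int:
    # Bits needed to name any one of num_registers registers.
    return (num_registers - 1).bit_length()


def remaining_opcode_bits(
    instr_bits: int,
    num_vector_regs: int,
    vector_operands: int,
    num_control_regs: int,
    control_operands: int,
) -> int:
    """Bits left for opcode and variant selection after register fields."""
    used = vector_operands * register_field_bits(num_vector_regs)
    used += control_operands * register_field_bits(num_control_regs)
    remaining = instr_bits - used
    if remaining < 0:
        raise ValueError("register fields do not fit in the instruction")
    return remaining


without_ctrl = remaining_opcode_bits(32, 32, 3, 8, 0)  # hypothetical layout
with_ctrl = remaining_opcode_bits(32, 32, 3, 8, 1)
print(without_ctrl, with_ctrl, 2 ** without_ctrl // 2 ** with_ctrl)
```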
One known technique for alleviating the above-mentioned encoding space problem is to provide data processing systems which support variable-length instructions. An example of a variable-length instruction set is the instruction set provided by the Intel x86 architecture. In accordance with such techniques, the size of the instructions is not fixed, and accordingly it is possible for complex instructions to include more bits in order to allow all of the required information to be encoded within the instruction. Whilst this does alleviate the encoding space problem, it leads to significant additional complexity within the data processing system, for example to enable the start and end of each instruction to be identified, and to enable instructions to be correctly decoded. In many implementations, the complexity associated with the support of variable-length instruction sets makes the use of such variable-length instruction sets impractical.
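The boundary-identification problem can be seen in a toy model. With fixed-length instructions, every start address is a multiple of the instruction width and is known in advance. With variable-length instructions, each start address depends on decoding the previous instruction. The variable-length encoding below is entirely hypothetical (the top two bits of the first byte give the length, 1 to 4 bytes) and is far simpler than real x86 length decoding, which also involves prefixes and operand-format bytes.

```python
from typing import List


def fixed_length_boundaries(code: bytes, width: int = 4) -> List[int]:
    # Instruction k starts at k * width; no bytes need to be inspected.
    if len(code) % width:
        raise ValueError("code size is not a multiple of the instruction width")
    return list(range(0, len(code), width))


def variable_length_boundaries(code: bytes) -> List[int]:
    # Hypothetical encoding: top two bits of the first byte = length - 1.
    # Each start depends on the previous instruction, so the scan is sequential.
    starts: List[int] = []
    pc = 0
    while pc < len(code):
        starts.append(pc)
        pc += (code[pc] >> 6) + 1
    if pc != len(code):
        raise ValueError("final instruction is truncated")
    return starts


print(fixed_length_boundaries(bytes(8)))
print(variable_length_boundaries(bytes([0x00, 0x40, 0xAA, 0xC0, 1, 2, 3])))
```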
Another approach adopted in some highly parallel processor designs seeks to avoid the cost of supporting both a scalar instruction set and a vector instruction set by only providing a vector instruction set. All scalar operations are then implemented by performing a vector operation, with all operand elements except the first operand element being ignored. However, this does increase complexity in the handling of scalar operations.
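The vector-only approach can be sketched as follows. Each scalar operand is placed in element 0 of a vector, the full vector operation is performed, and only element 0 of the result is retained. `VECTOR_LENGTH` and the helper names are hypothetical. The sketch also makes the overhead visible: every scalar operation still occupies a full vector operation across all elements.

```python
from typing import List

Vector = List[int]
VECTOR_LENGTH = 4  # hypothetical number of elements per vector register


def vector_add(va: Vector, vb: Vector) -> Vector:
    # Element-wise vector add across all element positions.
    return [a + b for a, b in zip(va, vb)]


def scalar_add_via_vector(a: int, b: int) -> int:
    # Place each scalar in element 0; the remaining elements are don't-care.
    va = [a] + [0] * (VECTOR_LENGTH - 1)
    vb = [b] + [0] * (VECTOR_LENGTH - 1)
    # The full vector operation is performed...
    vr = vector_add(va, vb)
    # ...but every element except the first is ignored.
    return vr[0]


print(scalar_add_via_vector(3, 4))
```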
Accordingly, it would be desirable to provide an improved technique for supporting the execution of vector operations within a data processing apparatus that also supports scalar operations, when using fixed-length instructions.