The present invention relates to a data processing apparatus and method for controlling use of an issue queue used to temporarily store decoded instructions prior to their execution within execution circuitry of the data processing apparatus.
The execution circuitry of a data processing apparatus is typically arranged to perform a sequence of operations on data in response to a series of instructions fetched from memory. Typically, the fetched instructions will be passed to decoder circuitry which will generate, for each instruction, a corresponding control data block (also referred to herein as a micro-operation or micro-op) identifying an operation to be performed by the execution circuitry in order to execute that instruction.
Often, the control data blocks generated as a result of decoding the fetched instructions are temporarily buffered within an issue queue, prior to those control data blocks being issued to the execution circuitry in order to cause the corresponding operations to be performed. Temporarily buffering such decoded instructions within an issue queue enables a steady stream of instructions to be available for issue to the execution circuitry. Further, it provides a mechanism to support out-of-order execution, where issue control circuitry can decide in which order to issue to the execution circuitry the various control data blocks in the issue queue, with the aim of increasing throughput.
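By way of illustration only, the out-of-order selection described above can be sketched in Python. The oldest-ready-first policy, the `MicroOp` record and the `select_for_issue` helper are assumptions made for this sketch; the text above does not prescribe any particular selection policy.

```python
from collections import namedtuple

# Hypothetical record for a buffered control data block (micro-op).
# "ready" stands for "all operand data for this micro-op is available".
MicroOp = namedtuple("MicroOp", ["name", "ready"])


def select_for_issue(queue):
    """Return the index of the oldest ready entry, or None if none is ready.

    The queue is ordered oldest first, so the entry chosen may sit behind
    older entries that are still waiting: that is out-of-order issue.
    """
    for index, op in enumerate(queue):
        if op.ready:
            return index
    return None


# The oldest entry (a load) is still waiting, so a younger entry issues first.
queue = [MicroOp("load", False), MicroOp("add", True), MicroOp("mul", True)]
picked = select_for_issue(queue)
issued = queue.pop(picked)
```

In this sketch the add is issued ahead of the older load, which remains buffered until its operand data becomes available.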
An instruction not only encodes the operation to be performed in response to that instruction, but typically identifies operand data to be processed during execution of the operation. Often, this operand data is identified with reference to one or more registers within a register file of the data processing apparatus, those registers storing the required operand data.
Traditionally, an issue queue provided a plurality of slots, each slot being used to store a control data block output by the decoder circuitry. However, the operand data was not stored within the issue queue, and instead was separately provided from the register file to the execution circuitry.
However, more recently, issue queues have been developed where each slot has been extended in size to allow operand data to be held in that slot with the associated control data block. Such an approach can alleviate a bandwidth constraint that may otherwise arise in respect of access to the register file. In particular, a number of register read ports will be provided for reading operand data from the registers of the register file. For example, two register read ports may be provided to allow the contents of two 64-bit registers to be read simultaneously from the register file. The number of read ports will clearly limit the amount of data that can be read simultaneously from the register file, but it is very expensive to increase the number of read ports. Allowing the issue queue to hold the operand data for control data blocks held within the slots of the issue queue increases flexibility as to when and how that operand data can be made available. For example, that operand data can still be read from the register file, but alternatively may be provided to the issue queue via another mechanism, such as via a forwarding path used to provide the issue queue with a copy of result data output by the execution circuitry to the register file. When the operand data is provided via the forwarding path, the requirement to read that particular operand data from the register file is removed, hence alleviating the above-mentioned bandwidth constraint.
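By way of illustration only, the following Python sketch models an issue queue whose slots hold operand data, where an operand may arrive either through a register read port or through the forwarding path. The class names, the `awaiting_forward` bookkeeping and the example register numbers and values are all assumptions made for this sketch, and are not taken from any particular apparatus.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Slot:
    """One issue queue slot: a control data block plus its operand data."""
    micro_op: str
    src_regs: List[int]
    # Source registers whose values are still being produced (assumption:
    # the decoder knows this when the slot is allocated).
    awaiting_forward: Set[int] = field(default_factory=set)
    operands: List[Optional[int]] = field(default_factory=list)

    def __post_init__(self) -> None:
        self.operands = [None] * len(self.src_regs)

    def ready(self) -> bool:
        return all(value is not None for value in self.operands)


class IssueQueue:
    def __init__(self, read_ports: int = 2) -> None:
        self.read_ports = read_ports      # e.g. two register read ports
        self.slots: List[Slot] = []
        self.regfile_reads = 0            # read-port accesses consumed so far

    def allocate(self, micro_op, src_regs, awaiting_forward=()) -> Slot:
        slot = Slot(micro_op, list(src_regs), set(awaiting_forward))
        self.slots.append(slot)
        return slot

    def read_from_regfile(self, slot: Slot, regfile: Dict[int, int]) -> int:
        """Fill operands from the register file, limited by the read ports."""
        used = 0
        for i, reg in enumerate(slot.src_regs):
            if used == self.read_ports:
                break
            if slot.operands[i] is None and reg not in slot.awaiting_forward:
                slot.operands[i] = regfile[reg]
                used += 1
        self.regfile_reads += used
        return used

    def capture_forwarded(self, dest_reg: int, value: int) -> None:
        """Copy a result from the forwarding path into every waiting slot."""
        for slot in self.slots:
            if dest_reg in slot.awaiting_forward:
                for i, reg in enumerate(slot.src_regs):
                    if reg == dest_reg:
                        slot.operands[i] = value
                slot.awaiting_forward.discard(dest_reg)


regfile = {1: 10, 3: 0}                   # r3 holds a stale value
queue = IssueQueue()
slot = queue.allocate("ADD r4, r1, r3", [1, 3], awaiting_forward={3})
ports_used = queue.read_from_regfile(slot, regfile)   # reads r1 only
ready_before = slot.ready()
queue.capture_forwarded(3, 99)            # r3 arrives via the forwarding path
```

In this sketch only one read-port access is consumed for the two-operand micro-op, because the second operand is captured from the forwarding path rather than read from the register file.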
The execution circuitry will typically include a number of execution units. Whilst most of those execution units might typically be arranged to process a standard width of operand data when performing the operation specified by each control data block issued to them (for example 128 bits of operand data in the particular example where the instructions encoding those operations specify as operand data two 64-bit registers), at least one of the execution units may be configured to process a wider width of operand data during performance of the operation specified by each control data block. For example, a Single Instruction Multiple Data (SIMD) execution unit may be configured to receive two items of operand data, but where each item of operand data is larger than the standard 64 bits (for example the SIMD execution unit may be configured to process two 128-bit operands). As another example, a multiply-accumulate (MAC) unit may be configured to process more than two items of operand data, with each of those items having the standard data width (for example, a particular MAC unit may be configured to process three 64-bit operands). For the purposes of the present application, such execution units that are configured to process a wider width of operand data will be referred to as wide operand execution units.
Whilst the use of such wide operand execution units can significantly improve performance of the data processing apparatus, their presence gives rise to a significant cost issue in respect of the issue queue, in situations where the issue queue is configured to store the required operand data for each control data block within the associated slot of the issue queue. In particular, the issue queue is a large module within the data processing apparatus, and the size has to be increased significantly if each slot is to have sufficient space to store all of the operand data specified by instructions to be executed within such wide operand execution units. Further, it is often the case that relatively few of the instructions in the instruction set will specify such wider operand data. As a result, there are many applications where the area and power consumption costs associated with increasing the size of the issue queue to accommodate the wider operand data are considered prohibitive.
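By way of illustration only, the storage cost discussed above can be estimated with a short Python calculation. The operand widths are the examples given earlier (two 64-bit operands as standard, two 128-bit operands for a SIMD unit, three 64-bit operands for a MAC unit); the number of slots and the width of a control data block are assumptions chosen purely for this sketch.

```python
def issue_queue_bits(num_slots: int, control_bits: int, operand_bits: int) -> int:
    """Total storage when every slot holds a control block plus operand data."""
    return num_slots * (control_bits + operand_bits)


NUM_SLOTS = 16        # assumption: not specified in the text
CONTROL_BITS = 32     # assumption: width of one control data block

STANDARD_OPERAND_BITS = 2 * 64    # two 64-bit registers
SIMD_OPERAND_BITS = 2 * 128       # two 128-bit operands
MAC_OPERAND_BITS = 3 * 64         # three 64-bit operands

# Sizing every slot for the widest case, so any slot can serve any unit.
widest = max(STANDARD_OPERAND_BITS, SIMD_OPERAND_BITS, MAC_OPERAND_BITS)

standard_total = issue_queue_bits(NUM_SLOTS, CONTROL_BITS, STANDARD_OPERAND_BITS)
wide_total = issue_queue_bits(NUM_SLOTS, CONTROL_BITS, widest)
extra_bits = wide_total - standard_total

print(f"standard slots: {standard_total} bits")
print(f"wide slots:     {wide_total} bits (+{extra_bits} bits)")
```

Under these assumed figures, sizing every slot for the widest operand data grows the queue storage from 2560 bits to 4608 bits, even though relatively few instructions would ever use the additional space.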
Various prior art techniques have been described whereby certain instructions are divided into multiple smaller instructions for separate execution; see, for example, U.S. Pat. No. 6,233,671, EP 0,947,917, US 2009/327665, WO-A-9806042, U.S. Pat. No. 7,096,343, US 2005/228969, US 2005/198473, U.S. Pat. No. 6,834,337 and U.S. Pat. No. 6,367,067.
It would be desirable to provide a mechanism that allows the performance benefits of using wide operand execution units to be retained, whilst alleviating the associated size requirements of the issue queue.