1. Field of the Invention
The present invention relates to a superscalar data processing apparatus and method.
2. Description of the Prior Art
A superscalar architecture implements a form of parallelism within a central processing unit (CPU), thereby allowing the system as a whole to run much faster than it would otherwise be able to at a given clock speed. To achieve this, a superscalar data processing apparatus has a plurality of execution pipelines which can operate in parallel, thereby allowing more than one instruction to be executed in parallel. In a typical superscalar CPU, there are often several functional units of the same type provided within the various pipelines, along with circuitry for dispatching operations specified by the instructions to the various functional units. For instance, it is common for superscalar designs to include more than one arithmetic logic unit (ALU), which allows the dispatcher circuitry to dispatch certain data processing operations for execution in parallel, each operation being dispatched to one of the ALUs.
Often, a superscalar processor is arranged as an in-order machine, where a sequence of instructions is executed in original program order. Such in-order superscalar machines typically use asymmetric execution pipelines, such that the multiple execution pipelines have different capabilities. For example, one execution pipeline may be able to handle a larger number of different types of operations, whereas another execution pipeline may only be able to handle a smaller subset of operations. Typically, only one of the execution pipelines has the ability to handle load or store operations specified by load or store instructions, also referred to herein as memory access instructions. An example of an in-order superscalar processor where only one of the execution pipelines can handle memory access operations is the ARM Cortex-A8 processor.
One reason for only allowing one pipeline to handle memory access operations is that it avoids the need to provide hazard detection hardware to detect hazards which could otherwise occur when issuing multiple memory access operations in parallel. A hazard can arise when two operations access the same address in quick succession. The hazard types enumerated below refer to the cases where consistency problems arise if the two accesses are processed in the wrong order. A Write after Write (WaW) hazard occurs when there are two writes close together: a subsequent read must return the value written by the write that occurs second in program order. A Read after Write (RaW) hazard means that the read must return the data that was written by the preceding write, whilst a Write after Read (WaR) hazard means that the read must return the data as it was before the write happened. If two operations can be issued at the same time, special care is needed to make sure the correct values end up in the read register (RaW, WaR) or memory (WaW) if a hazard occurs, and this typically requires specific hardware to be added to deal with such situations.
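The three hazard types described above can be illustrated with a minimal sketch. The `Access` type and the `classify_hazard` function are hypothetical names introduced here for illustration only; they model the definitions in software and do not represent any particular hazard detection hardware.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Access:
    """Hypothetical model of one memory access operation."""
    kind: str      # "R" for a read (load), "W" for a write (store)
    address: int   # resolved memory address


def classify_hazard(first: Access, second: Access) -> Optional[str]:
    """Classify the hazard between two accesses given in program order.

    Returns "WaW", "RaW" or "WaR", or None when no hazard exists.
    """
    if first.address != second.address:
        return None  # different addresses cannot conflict
    if first.kind == "W" and second.kind == "W":
        return "WaW"  # memory must end up holding the second write's value
    if first.kind == "W" and second.kind == "R":
        return "RaW"  # the read must see the preceding write's data
    if first.kind == "R" and second.kind == "W":
        return "WaR"  # the read must see the data from before the write
    return None  # two reads never conflict
```

Note that the classification requires both addresses to be known, which is the practical difficulty when operations are issued before their addresses have been resolved.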
Unfortunately, the addresses of memory accesses are typically not resolved until after operations are issued. This may for example be due to the fact that the addresses rely on register values that are computed in the immediately preceding operations (and sent via forwarding paths), and/or a calculation may take place to compute the address later in the pipeline. As a result, it is often impossible to tell at issue time whether a hazard exists or not. A pessimistic approach of not issuing in parallel any pair of memory operations which could cause a hazard would lead to dramatic under-utilization, since any pair of two writes, or of one read and one write, can cause a hazard. Hence, if more than one of the execution pipelines were to be allowed to handle memory access operations, complex hardware would need to be provided within the superscalar processor to detect such hazards later in the pipeline and deal correctly with the hazards when they occur.
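The cost of the pessimistic approach can be made concrete with a short sketch. It assumes that only the operation kinds (read or write) are known at issue time and that addresses are not. The function name `may_hazard` is a hypothetical name used for illustration.

```python
from itertools import product


def may_hazard(first_kind: str, second_kind: str) -> bool:
    """Pessimistic issue-time check using only operation kinds.

    Without resolved addresses, any pair that contains at least one
    write ("W") has to be treated as a potential hazard.
    """
    return "W" in (first_kind, second_kind)


# Enumerate every ordered pair of read ("R") and write ("W") operations.
pairs = list(product("RW", repeat=2))
blocked = [pair for pair in pairs if may_hazard(*pair)]
```

Under this assumption, three of the four possible pairings (read/write, write/read and write/write) would be refused parallel issue, leaving only pairs of reads free to issue together.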
Another type of data processing architecture is the multi-threading architecture, in which a processor has access to a plurality of different sets of registers, each register set being associated with a different execution thread of the processor. Such a multi-threading architecture at the hardware level (i.e. through the provision of multiple register sets) provides significant performance improvements for a data processing apparatus, since increasing the amount of processing work available at any point in time tends to increase the amount of useful work that can be performed by the data processing apparatus at that time. For example, when a particular execution thread reaches a point where a significant delay is incurred, for example due to the requirement to perform a fetch of a data value from memory, the processor can switch to another execution thread at that time with minimal delay, because the execution context of that other execution thread is already available in the associated register set provided for that other execution thread.
It is envisaged that it would be beneficial to seek to incorporate the above multi-threading principles within a superscalar data processing apparatus such as described earlier. However, if such multi-threading capabilities are added to such a superscalar machine, the asymmetric pipelines are likely to adversely impact the overall performance improvements that can be gained, since there will typically be much contention between the various execution threads for use of the memory access capable pipeline, whilst the other pipeline or pipelines are relatively idle. If, as an alternative, symmetric pipelines were used, where each pipeline could handle memory access operations, and operations were freely dispatched to these various execution pipelines in order to seek to maximise throughput, then the hardware required to detect the earlier-discussed hazards that could occur would be very complex, increasing the overall cost and complexity of the superscalar data processing apparatus.
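The contention described above can be illustrated with a deliberately simplified toy model. Every name here is hypothetical, and the model makes strong assumptions that are not taken from any real processor: each pipeline issues at most one operation per cycle, each thread issues at most one operation per cycle in program order, every operation completes in one cycle, and hazards are ignored entirely.

```python
def cycles_to_drain(threads, pipelines):
    """Count cycles needed to issue every operation of every thread.

    threads:   list of strings, "M" = memory access, "A" = ALU operation.
    pipelines: list of sets naming the operation kinds each pipeline handles.
    """
    pending = [list(ops) for ops in threads]
    cycles = 0
    while any(pending):
        issued = set()  # threads that have already issued in this cycle
        for supported in pipelines:
            for tid, ops in enumerate(pending):
                if tid not in issued and ops and ops[0] in supported:
                    ops.pop(0)       # issue the thread's next operation
                    issued.add(tid)
                    break            # this pipeline is now busy
        if not issued:
            raise ValueError("an operation is not supported by any pipeline")
        cycles += 1
    return cycles


# Two threads that both consist solely of memory accesses.
threads = ["MMMM", "MMMM"]
asymmetric = [{"M", "A"}, {"A"}]       # only one memory-capable pipeline
symmetric = [{"M", "A"}, {"M", "A"}]   # both pipelines memory-capable

asymmetric_cycles = cycles_to_drain(threads, asymmetric)  # 8
symmetric_cycles = cycles_to_drain(threads, symmetric)    # 4
```

In this toy case the asymmetric arrangement serialises all eight memory accesses through one pipeline while the second pipeline sits idle. The symmetric arrangement halves the cycle count, but only because the model ignores the hazard detection that such an arrangement would need.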