1. Field of the Invention
The present invention relates to the locking of source registers in a data processing apparatus.
2. Description of the Prior Art
It is known to provide processors that incorporate one or more pipelines for executing instructions. Due to the pipeline nature of such processors, multiple instructions may be in the process of being executed at any point in time, and this has given rise to the need to provide appropriate hazard and resource checking functions for the pipelined processor. Hence, each instruction is typically evaluated prior to issuing it to the execution pipeline to determine whether a hazard condition or a resource conflict would arise if it were to be issued to the execution pipeline. A hazard condition will be detected if that instruction requires a data item that is not yet available due to it still being computed (for example by an instruction already being executed in the pipeline), or if that instruction requires access to a register which is still required by another instruction already in the pipeline, and which must not be overwritten until it is read by that instruction already in the pipeline. A resource conflict will be detected if there is a requirement for a processing unit, which is not available due to it already being used for another executing instruction.
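By way of illustration only, the issue-time evaluation described above may be sketched as follows. The `Instr` record, the `check_issue` function and the register and unit names are hypothetical and are not taken from any particular processor. The sketch also assumes, for simplicity, that every instruction in `in_flight` has not yet read its sources, has not yet written its result, and still occupies its processing unit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction record: one destination, some sources, one unit."""
    dest: str
    srcs: tuple
    unit: str


def check_issue(candidate, in_flight):
    """Return the reasons the candidate cannot issue yet (empty list = may issue)."""
    reasons = []
    for older in in_flight:
        # A required data item is still being computed by an older instruction.
        if older.dest in candidate.srcs:
            reasons.append("hazard: source " + older.dest + " not yet available")
        # The candidate would overwrite a register an older instruction must still read.
        if candidate.dest in older.srcs:
            reasons.append("hazard: destination " + candidate.dest + " still to be read")
        # The required processing unit is in use by an older instruction.
        if candidate.unit == older.unit:
            reasons.append("resource conflict: unit " + older.unit + " busy")
    return reasons


in_flight = [Instr(dest="r1", srcs=("r2", "r3"), unit="mul")]
print(check_issue(Instr(dest="r4", srcs=("r1", "r5"), unit="add"), in_flight))
```

In this sketch an empty list means that the candidate instruction may be issued; any other result corresponds to a stall.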
It is possible to perform no hazard or resource checking, and instead to leave the job of avoiding hazard conditions and resource conflicts to the compiler/code writer. However, this typically results in very complex code, and accordingly it is commonplace for such hazard and resource checking procedures to be implemented.
The hazard and resource checking functions require the ability to stall the relevant processor and all dependent processors. For example, a coprocessor which detects a hazard condition must signal the detection of that hazard condition to the main processor, and the main processor will in most cases stall in order to maintain instruction flow coordination with the coprocessor. Similarly, a main processor detecting a hazard condition or a resource conflict may need to advise all coprocessors to stall their pipelines accordingly. It will be appreciated that stalling introduces uncertainty into the determination of the time to run a section of code.
Known processors utilise a variety of complex methods to detect hazard conditions and resource conflicts, and to reduce the impact on performance of such hazard conditions and resource conflicts. Register renaming is one such technique which may be used, this technique involving the utilisation of additional registers to remove hazard conditions relating to the writing of a register involved as a source register for an instruction already being executed. In high performance processors, instructions may be issued out of program order, enabling instructions which have no hazard or resource conflicts to execute ahead of instructions with hazard or resource conflicts. A typical technique used with such an approach is to maintain tables of instructions currently in some state of execution, and then, for a particular instruction, to make a determination as to the availability of the functional unit to process the instruction, and the availability of the operands required by the instruction. This may be accomplished in a distributed method by using, for instance, reservation stations, or in a centralised manner using, for instance, a reorder buffer technique.
Both of the above techniques are well known in the industry. As will be appreciated by those skilled in the art, the cost of such techniques, in terms of area and power, and in complexity, is relatively high.
Another known technique, which avoids much of the cost and complexity of the above described techniques, involves the use of a scoreboard. A scoreboard tracks the availability of registers, either as source operands or as destinations for store operations. The scoreboard may be separated into separate parts, one for source operands, and one for destination operands, or a single scoreboard may be maintained for both source and destination operands. An entry in the scoreboard is then cleared when the register associated with that entry is available for use by a subsequent instruction. Hence, instructions to be issued to the execution pipeline which will require registers which are shown as locked in the scoreboard are forced to wait, or stall, until the registers become available. Scoreboards are typically simpler and cheaper, in terms of area, power and development costs, than the earlier described techniques, but typically offer lower performance.
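As a minimal sketch of the scoreboard approach described above, the following models a scoreboard separated into a source part and a destination part. The class and method names are hypothetical. The sketch assumes that a register locked only as a source may still be read by a later instruction but may not be overwritten; other policies are possible. A count is kept per source register so that two in-flight instructions reading the same register do not unlock it prematurely.

```python
from collections import Counter


class Scoreboard:
    """Hypothetical scoreboard with separate source and destination parts."""

    def __init__(self):
        self.src_locks = Counter()  # register -> number of in-flight readers
        self.dest_locked = set()    # registers still to be written

    def must_stall(self, srcs, dest):
        # Reading a register whose value is still being produced must wait.
        if any(r in self.dest_locked for r in srcs):
            return True
        # Writing a register that is still to be read or written must wait.
        return self.src_locks[dest] > 0 or dest in self.dest_locked

    def issue(self, srcs, dest):
        # Lock the registers of an instruction entering the pipeline.
        for r in srcs:
            self.src_locks[r] += 1
        self.dest_locked.add(dest)

    def release_sources(self, srcs):
        # Called when the instruction no longer needs its source registers.
        for r in srcs:
            self.src_locks[r] -= 1

    def release_dest(self, dest):
        # Called when the instruction has written its result.
        self.dest_locked.discard(dest)


sb = Scoreboard()
sb.issue(srcs=("r2", "r3"), dest="r1")
print(sb.must_stall(srcs=("r5",), dest="r2"))  # r2 is locked as a source
```

An instruction for which `must_stall` returns true would be held at the issue stage until the relevant entries are released.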
Accordingly, the application area for the data processing apparatus typically dictates which of the above approaches is used. For high performance applications, such as desktop computing or scientific or business computing, the more complex techniques are often required to deliver the necessary performance. Machines such as the CDC6600, IBM360/91, and recent IBM, Intel x86 and Sun SPARC processors utilise these more complex techniques. However, for embedded applications the performance is typically not as critical, but instead the chip area of the processor and the power consumed are of greater importance, with the performance merely needing to be sufficient to meet the goal of the application. In such cases, the use of in-program-order instruction issue and the above described scoreboarding technique is typically the most appropriate technique for checking hazard conditions, whilst employing a limited number of pipelines, typically one or two, assists in reducing the complexity of resource management.
In many applications, the arithmetic performed may be characterised in such a manner that the range of operands and results is well known. However, when this is not possible, the arithmetic must be able to process, in a consistent and reasonable manner, conditions in which the result of an operation is outside the bounds of the range of the data type supported, or the operation involves operands for which a result is not defined (for instance, an addition of a positive infinity to a negative infinity).
Considering the example of floating-point arithmetic, the “IEEE Standard for Binary Floating-Point Arithmetic”, ANSI/IEEE Std 754-1985, The Institute of Electrical and Electronics Engineers, Inc., New York, 10017 (hereafter referred to as the IEEE 754 standard) specifies the behaviour of instructions when the results are outside the range of the data type supported, or when the result of the operation is not defined. Fully implementing the IEEE 754 specification, covering all the possible cases which the arithmetic may require, results in additional hardware/area and power consumption, and/or reduced clock speed. The IEEE 754 specification defines a number of exceptions, such exceptions being cases in which the result is not what would be returned if the arithmetic were performed with unlimited precision, or in which the result is not defined. Five types of exception are defined in the IEEE 754 specification, namely invalid, overflow, underflow, divide-by-zero, and inexact.
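For illustration, the five exception types may be provoked with ordinary double precision arithmetic as sketched below. Python is used purely as a convenient notation. It does not expose the IEEE 754 status flags, so the sketch shows only the default results, and it should be noted that Python raises `ZeroDivisionError` for floating-point division by zero rather than returning an infinity.

```python
import math

inf = float("inf")

# Invalid: the sum of a positive and a negative infinity is not defined.
invalid_result = inf + (-inf)

# Overflow: the exact result is outside the range of the data type.
overflow_result = 1e308 * 10.0

# Underflow: the exact result is too small to represent and rounds to zero.
underflow_result = 5e-324 / 2.0

# Inexact: the rounded result differs from the unlimited-precision result.
inexact_result = 0.1 + 0.2

# Divide-by-zero: Python signals this case by raising an exception.
try:
    1.0 / 0.0
    div_by_zero_raised = False
except ZeroDivisionError:
    div_by_zero_raised = True

print(math.isnan(invalid_result), math.isinf(overflow_result),
      underflow_result == 0.0, inexact_result != 0.3, div_by_zero_raised)
```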
In most embedded applications, the arithmetic may be characterised as well known, and the full features of the IEEE 754 specification are not typically required. For example, features such as subnormal support, and support for NaN (Not-a-Number) processing may not be required. However, the ability to process overflow conditions, and operations involving infinities, is generally advantageous.
When such exceptions are detected during execution of an instruction within the pipelined processor, then exception handling mechanisms are often invoked to handle those exceptions. Such exception handling mechanisms, when invoked, may need access to the source operands for the exceptional instruction in order to deal with the exception. Accordingly, when locking registers to avoid hazard conditions, such as is done when using the earlier described scoreboard technique, it is in such cases necessary to lock those source registers until the instruction has passed the point in the execution pipeline at which the exception will be detected.
In one prior art processor, described in more detail in GB-A-2,339,312, the processor is arranged to detect and process some of the exceptional cases pessimistically, meaning that a determination of the presence of an exception condition is based on the information available before the instruction is processed completely. In order to ensure that all possible exception cases are processed, some cases which may not result in an exceptional condition are treated as such until the operation is processed completely and a final determination made. In the processor described in GB-A-2,339,312, the detection is done in the first execute stage of the processor pipeline in order to minimise the amount of information required to save the current state of the processor. Since detection of the exception is done without completion of the instruction, the source operands are required to be preserved for the exception handling mechanism, and cannot be unlocked in the scoreboard until the associated instruction passes the exception detection point (here the execute 1 stage of the pipeline). A software routine is then utilised to determine the exact disposition of the instruction, generate the correct answer, including special handling for IEEE 754 exception cases, and either return to the program or execute a user-defined exception handler.
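The notion of pessimistic detection may be illustrated by the following sketch, which flags a possible overflow of a double precision multiplication using only the exponents of the operands. The criterion shown is a hypothetical one chosen for illustration and is not the criterion used in GB-A-2,339,312. The function names are likewise assumptions, and the operands are assumed to be finite.

```python
import math

MAX_EXP = 1024  # a double precision value is finite only if below 2**1024


def may_overflow_mul(a, b):
    """Pessimistic check that inspects operand exponents only (hypothetical)."""
    _, ea = math.frexp(a)  # |a| = m * 2**ea with 0.5 <= m < 1
    _, eb = math.frexp(b)
    # |a * b| < 2**(ea + eb), so overflow is impossible when ea + eb < MAX_EXP.
    return ea + eb >= MAX_EXP


def does_overflow_mul(a, b):
    """Final determination, available only once the operation has completed."""
    return math.isinf(a * b)


a = 2.0 ** 511
print(may_overflow_mul(a, a), does_overflow_mul(a, a))  # flagged, but no overflow
```

In this sketch the early check never misses an overflow, but it flags some operations, such as the one printed above, which complete without overflow. It is this class of case which is treated as exceptional until the final determination is made.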
Hence, in summary, the software routine used for exception handling requires access to the source registers of the instruction. Further, it will not execute until some number of cycles after the exceptional instruction. Accordingly, an instruction which may need to access the exceptional instruction's source registers may be issued between the execution of the exceptional instruction and the execution of the software routine to deal with the identified exception condition.
Thus, any instruction which needs to access the source registers of an executing instruction, in order to read them as operands, store them to memory, or load them with a new value, typically has to wait until the executed instruction passes the exception determination point, and the source registers are then unlocked in the scoreboard (either because no exception is detected, or after the relevant software routine has performed the necessary exception processing).
Hence, it can be seen that this requirement to lock registers whilst a determination as to the presence of an exception condition in the corresponding instruction is made can significantly impact on the efficiency of the pipeline processing circuit by causing subsequent instructions to stall if they need access to such locked registers.
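The effect of holding source registers locked until the exception detection point may be sketched with the following simplified timing model. The function `issue_cycles` and its parameter `detect_latency` are hypothetical. The model assumes in-order, single-issue operation and a fixed number of cycles between issue and the exception detection point, and it considers only locks on source registers, ignoring destination locks and result latency.

```python
def issue_cycles(program, detect_latency):
    """Return the issue cycle of each instruction in a simplified in-order model.

    program: list of (srcs, dest) tuples.
    detect_latency: assumed number of cycles after issue at which an instruction
    passes the exception detection point and its source registers are unlocked.
    """
    unlock_at = {}  # register -> cycle at which its source lock is released
    cycles = []
    cycle = 0
    for srcs, dest in program:
        # Stall until the destination is no longer locked as a source.
        cycle = max(cycle, unlock_at.get(dest, 0))
        cycles.append(cycle)
        for r in srcs:
            unlock_at[r] = max(unlock_at.get(r, 0), cycle + detect_latency)
        cycle += 1  # the next instruction issues one cycle later at the earliest
    return cycles


# The second instruction loads a new value into r2, a source of the first.
program = [(("r2", "r3"), "r1"), (("r4",), "r2")]
print(issue_cycles(program, detect_latency=1))
print(issue_cycles(program, detect_latency=3))
```

Under these assumptions, the later the exception detection point, the longer the second instruction waits at the issue stage.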