This invention relates to computers and computer systems, and particularly to the handling of multi-operand instructions during a cache miss within a processor.
In a typical microprocessor, there exists at least a load-store unit (LSU) that maintains a level 1 translation lookaside buffer (TLB) and a level 1 cache, and an execution unit (FXU) that executes general fixed-point instructions. During program execution, it is often necessary to access the cache and send the desired operand data directly (sometimes called a bypass) from the LSU to the FXU without affecting performance. In some instances, this may be done as part of a register storage (RX) instruction execution, as two related hardware micro-op instructions (a load instruction and an execution instruction cracked from an “RX” instruction), or as a fixed-point instruction that is dependent on a previous load instruction.
Often, for cycle-time or pipeline design reasons, the FXU will already be executing on given cache data before it can react to a cache or TLB miss indication from the LSU. In this case, a usual data pipeline will nullify current and future execution and recycle back to the point in the execution at which the cache miss occurred, awaiting an indication from the LSU that data is again ready for processing. Such a delay and restart shall be referred to herein as a “pipeline recycle.” Of course, other conditions may create the need for a pipeline recycle; LSU cache misses are just among the more common.
In a complex instruction set computer (CISC) architecture, there may be instructions defined that require multiple operands and possibly output multiple results. These instructions either operate on one long operand and generate a potentially long result, or operate on two long operands and generate a potentially long result. “Long,” as the term is used herein with respect to operands and results, means that the width of the operand is larger than the width of the execution space (i.e., the processing width) of the processing unit performing the execution. Instances having long operands or long results require multiple cycles of operand accesses or result output. These multiple cycles increase the number of execution steps.
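The relationship between operand width and the number of access cycles can be illustrated with a minimal sketch. The function names and the 8-byte default processing width are assumptions chosen for illustration, and the sketch assumes one operand access per cycle, which a real design need not follow.

```python
def is_long(operand_bytes: int, processing_bytes: int = 8) -> bool:
    """An operand is "long" when it is wider than the processing width."""
    return operand_bytes > processing_bytes


def access_cycles(operand_bytes: int, processing_bytes: int = 8) -> int:
    """Operand accesses needed, assuming one processing-width access per cycle."""
    if operand_bytes <= 0 or processing_bytes <= 0:
        raise ValueError("widths must be positive")
    # Ceiling division: a partial final chunk still costs a full access.
    return -(-operand_bytes // processing_bytes)
```

Under these assumptions, an 8-byte operand needs a single access, while a 9-byte operand is already long and needs two.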
In a typical processor having a 64-bit-wide dataflow, a storage-to-storage (SS) type instruction may have an operand length that is greater than 8 bytes. In a processor design where these long instructions are not emulated by code (such as millicode) and are not cracked into micro-operations, a “pipeline recycle” in the middle of the execution may require special handling, since the “recycled” operation will have to be redone, and operations after that point might have already started and must also be nullified due to dependencies. One way to handle this is to crack long instructions into micro-operations. Another is to emulate the instruction through internal code. However, these solutions may decrease performance, or may not be possible for all long instructions.
Other processor designs avoid pipeline recycle either by accessing all operands ahead of time and storing them in a buffer before execution, or by allowing the FXU control and dataflow to “freeze” when data is not available due to, for example, a cache miss. However, these solutions may require additional memory, slow down the frequency of operation, or add pipeline cycles to allow the LSU data-not-available signal to freeze all FXU controls.
As discussed, during the execution of any multicycle instruction that must access operands multiple times, it is possible to experience a cache miss, or another pipeline error condition, at any access. In a processor pipeline that executes long instructions fully in hardware and uses the “pipeline recycle” mechanism, this forces the processor into a recycle window during which execution of the instruction is paused, and possibly backed up to the operation at which the miss occurred. After this window completes, the instruction re-executes from the cycle that experienced the miss and then continues its processing.
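The recycle behavior described above can be sketched as a simple event trace. This is only an illustrative model: the function name, the event labels, and the `lookahead` parameter (how many later steps the FXU has already started when the miss is reported) are hypothetical, and each step is assumed to miss at most once.

```python
def run_with_recycle(num_steps, miss_steps, lookahead=1):
    """Return a list of (event, step) tuples for a multicycle instruction.

    miss_steps: steps whose operand access misses once (hypothetical input).
    lookahead:  later steps already started when the miss is detected.
    """
    trace = []
    pending = set(miss_steps)
    step = 0
    while step < num_steps:
        if step in pending:
            pending.discard(step)  # the retry after the window will hit
            # Nullify the missing step and any later work already in flight.
            last = min(step + lookahead, num_steps - 1)
            for s in range(step, last + 1):
                trace.append(("nullify", s))
            trace.append(("recycle", step))  # recycle window, then retry
            continue
        trace.append(("execute", step))
        step += 1
    return trace
```

For a three-step instruction with a miss at step 1, the trace shows step 0 executing, steps 1 and 2 being nullified, a recycle at step 1, and then steps 1 and 2 executing again.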
In the case of instructions where a given cycle of execution depends on both the results of a previous cycle of execution and that given cycle's operand(s), two problems occur. The first problem is that the current operation needs the result of the previous successful operation, which was only a temporary result. Even if the LSU can resend previous operand data, the execution unit may not be able to re-execute a previous operation, since that operation may itself depend on a result from even earlier execution. The second problem is that the operand data to be delivered from the LSU may depend on two different accesses. In some instances, one access may have succeeded while the other was a miss.
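The first problem can be made concrete with a toy model: a long addition processed one 64-bit chunk per cycle, where each cycle depends on the carry out of the previous cycle. The function, its parameters, and the choice of addition as the dependent operation are assumptions made for illustration only; they are not taken from any particular instruction or hardware design.

```python
MASK64 = (1 << 64) - 1


def long_add(a_chunks, b_chunks, miss_at=None, keep_carry=True):
    """Add two little-endian lists of 64-bit chunks, one chunk per cycle.

    miss_at:    index of the chunk whose operand access misses once.
    keep_carry: whether the temporary carry survives the recycle window.
    """
    assert len(a_chunks) == len(b_chunks)
    result, carry, i, missed = [], 0, 0, False
    while i < len(a_chunks):
        if i == miss_at and not missed:
            missed = True
            if not keep_carry:
                carry = 0  # temporary result from the previous cycle is lost
            continue       # recycle: retry the same chunk
        total = a_chunks[i] + b_chunks[i] + carry
        result.append(total & MASK64)
        carry = total >> 64  # temporary result consumed by the next cycle
        i += 1
    return result, carry
```

With operands `[MASK64, 0]` and `[1, 0]`, the first chunk produces a carry into the second. If a miss occurs at chunk 1 and the carry is lost, the retry yields `[0, 0]` instead of the correct `[0, 1]`, even though the operand data for chunk 1 was delivered correctly on the retry.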
It would be desirable to allow both the LSU and the FXU to capture acquired data and results before the recycle window begins, and to reuse that same data after the recycle has completed.