1. Field of the Invention
The present invention relates generally to the field of processors, and more specifically to a replay architecture having fast and slow replay paths for facilitating data-speculating operations.
2. Background Information
FIG. 1 shows a block diagram of one embodiment of a processor 100 disclosed in U.S. Pat. No. 5,966,544. The processor 100 shown in FIG. 1 includes an I/O ring 111 which operates at a first clock frequency (I/O clock), a latency-tolerant execution core 121 which operates at a second clock frequency (e.g., slow clock), a latency-intolerant execution sub-core 131 which operates at a third clock frequency (e.g., medium clock), and a latency-critical execution sub-core 141 which operates at a fourth clock frequency (e.g., fast clock). The processor 100 shown in FIG. 1 also includes clock multiplication and/or division units 110, 120, and 130 which are configured to provide appropriate clocking to the various portions or sub-cores of the processor 100, as taught in the prior application. The specific portion of the prior application's teachings which is most pertinent here is that the execution core may include two or more portions (sub-cores) which operate at different clock rates.
In operation, the I/O ring 111 communicates with the rest of the computer system (not shown) by performing various I/O operations, such as memory reads and writes, at the I/O clock frequency. For example, the processor 100 may perform an I/O operation at the I/O ring 111 at the I/O clock frequency to read in data from an external memory device. The various execution sub-cores 121, 131, and 141 can perform various functions or operations with respect to the input instructions and/or input data at their respective clock frequencies. For example, the latency-tolerant execution sub-core 121 may perform an execution operation on the input data to produce a first result. The latency-intolerant sub-core 131 may perform an execution operation on the first result to produce a second result. Similarly, the latency-critical execution sub-core 141 may perform another execution operation on the second result to produce a third result. The operations performed by the various execution sub-cores may include arithmetic operations, logic operations, and other operations. It should be appreciated by one skilled in the art that the order in which the various operations are performed need not follow the hierarchical order of the various execution sub-cores. For example, the input data could go immediately and directly to the innermost sub-core, and the result obtained therefrom could go from the innermost sub-core to any other sub-core or back to the I/O ring 111 for write-back. In addition, as disclosed and taught in the prior application, on-chip cache structures may be split across two or more portions of the processor 100. As such, certain operations and/or functions can be performed at one clock frequency with respect to one aspect of the data stored in the on-chip cache while other operations and/or functions can be performed at a different frequency with respect to another aspect of the data stored in the on-chip cache.
For example, detection of a way predictor miss with respect to the on-chip cache may be performed in one sub-core at one clock frequency while the TLB hit/miss detection and/or page fault detection may be performed in another sub-core at a different frequency. As such, certain errors and conditions can be detected earlier in the execution process than other errors and conditions.
FIG. 2 illustrates a block diagram of one embodiment of a processor 200 disclosed in the prior application which includes a generalized replay architecture to facilitate data speculation operations. In this embodiment, the processor 200 includes a scheduler 231 coupled to a multiplexor 241 to provide instructions received from an instruction cache (I-cache) 211 to an execution core 251 for execution. The execution core 251 may perform data speculation in executing the various instructions received from the multiplexor 241. The processor 200 as shown in FIG. 2 includes a checker unit 281 to send a copy of the executed instruction back to the execution core 251 for re-execution (replay) if it is determined that the data speculation is erroneous. However, in this generalized replay architecture, the checker unit 281 is positioned after the execution core 251, after the TLB and tag logic 261, and after the cache hit/miss logic 271. Some instructions may be known to have been executed incorrectly (i.e., because the data speculation is erroneous) earlier than this checker positioning permits detection. Specifically, in some cases certain errors and conditions can be detected earlier, indicating that the data speculation is erroneous even before the TLB and tag logic 261 and the cache hit/miss logic 271 are executed. Unfortunately, because of the current positioning of the checker unit 281, the instructions that were executed incorrectly due to erroneous data speculation are not sent back to the execution core 251 for re-execution or replay until they reach the checker unit 281. Thus, there is an unnecessary delay between the time when an instruction is known to have been executed incorrectly due to erroneous data speculation and the time when the instruction is actually sent back for re-execution.
As a result, system performance falls short of what it could be if those instructions which were executed incorrectly were re-executed or replayed earlier in the process.
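The delay described above can be illustrated with a small latency model. The following Python sketch is illustrative only: the stage names mirror the blocks of FIG. 2, but the per-stage cycle counts and the helper names `cycle_at_end_of` and `replay_delay` are assumptions introduced here and do not come from the prior application. It computes how many cycles elapse between the point at which an error could be detected and the point at which a single checker placed after all stages can initiate a replay.

```python
# Hypothetical stages following the multiplexor, in pipeline order, with
# assumed latencies in cycles (illustrative numbers, not from the source).
STAGES = [
    ("execution_core", 2),
    ("tlb_tag_logic", 3),
    ("hit_miss_logic", 3),
    ("checker", 1),
]


def cycle_at_end_of(stage_name):
    """Return the cumulative cycle count once the named stage completes."""
    total = 0
    for name, latency in STAGES:
        total += latency
        if name == stage_name:
            return total
    raise ValueError(f"unknown stage: {stage_name}")


def replay_delay(detected_after):
    """Cycles between the moment an error is detectable (at the end of the
    given stage) and the moment the single, late checker can replay."""
    return cycle_at_end_of("checker") - cycle_at_end_of(detected_after)


if __name__ == "__main__":
    for stage, _ in STAGES:
        print(f"error detectable after {stage}: "
              f"replay delayed by {replay_delay(stage)} cycles")
```

Under these assumed latencies, an error that is detectable as soon as the execution core finishes still waits for the TLB and tag logic, the hit/miss logic, and the checker before replay begins, whereas an error detectable only after the hit/miss logic waits for the checker alone.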
According to one aspect of the invention, a microprocessor is provided that includes an execution core, a first replay mechanism and a second replay mechanism. The execution core performs data speculation in executing a first instruction. The first replay mechanism is used to replay the first instruction via a first replay path if an error of a first type is detected which indicates that the data speculation is erroneous. The second replay mechanism is used to replay the first instruction via a second replay path if an error of a second type is detected which indicates that the data speculation is erroneous.
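The selection between the two replay mechanisms can be sketched as a simple dispatch on the type of error detected. The following Python sketch is a minimal illustration under stated assumptions: the names `ErrorType`, `select_replay_path`, and `run_with_replay`, and the representation of detected errors as a list, are hypothetical and are not taken from the specification. It does not model which conditions constitute an error of the first or second type.

```python
from enum import Enum, auto
from typing import List, Optional


class ErrorType(Enum):
    """Hypothetical labels for the two error types named in the text."""
    FIRST = auto()
    SECOND = auto()


def select_replay_path(error: Optional[ErrorType]) -> Optional[str]:
    """Pick the replay path for an instruction, or None if no error was
    detected and the data speculation is treated as correct."""
    if error is ErrorType.FIRST:
        return "first_replay_path"
    if error is ErrorType.SECOND:
        return "second_replay_path"
    return None


def run_with_replay(errors_per_attempt: List[Optional[ErrorType]]) -> List[str]:
    """Execute one instruction repeatedly, replaying it via the selected
    path after each attempt that reports an error. The input lists the
    error (or None) observed on each successive attempt (an assumption
    made for illustration). Returns the replay paths taken, in order."""
    paths_taken = []
    for error in errors_per_attempt:
        path = select_replay_path(error)
        if path is None:
            break  # speculation was correct; no further replay needed
        paths_taken.append(path)
    return paths_taken


if __name__ == "__main__":
    # First attempt hits a first-type error, second attempt a second-type
    # error, third attempt completes without error.
    print(run_with_replay([ErrorType.FIRST, ErrorType.SECOND, None]))
```

In this sketch each error type maps to exactly one replay path, which matches the pairing described above; any priority between simultaneous errors is left unmodeled.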