1. Field of the Invention
The present invention relates generally to processing pipelines in a pipeline processor, and more particularly, to methods and systems for more efficiently processing swap operations.
2. Description of the Related Art
Microprocessors have used pipelines to organize processing of multiple instructions simultaneously. A pipeline processor can begin executing a second instruction before the first (preceding) instruction has been completed. Similarly, several instructions can be held in the pipeline simultaneously, each instruction being at a different processing stage (e.g., read, write, compare, calculate, etc.).
The pipeline is typically divided into segments and each segment can execute its operation concurrently with the other segments. Typically, when a segment completes an operation, it passes the result to the next segment in the pipeline and fetches the next operation from the preceding segment. The final results of each instruction emerge at the end of the pipeline in rapid succession.
FIG. 1A shows a typical prior art processing window 100 used by a pipeline processor. The processing window 100 includes some number of register windows 110-117 (e.g., eight register windows reg0-reg7). An active register window 120 is also included. The active register window 120 can be accessed by the processor. By way of example, as the pipeline progresses, the contents of register window 113 will be restored to the active register window 120 so that the contents of register window 113 can be processed by the processor. However, the active register window 120 is typically not empty, and therefore the data in the active register window 120 must be saved to the appropriate register window (e.g., register window 114) before the contents of register window 113 can be restored to the active register window 120. Therefore, the contents of the active register window 120 must be swapped (i.e., a save operation to register window 114 followed by a restore operation from register window 113).
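By way of illustration only, the save-then-restore behavior described above can be sketched as a toy software model. The class name ProcessingWindow, its methods, and the mapping of register windows 110-117 to list indices 0-7 are hypothetical and are introduced solely for this sketch; they are not part of any described hardware.

```python
class ProcessingWindow:
    """Toy model (assumption): eight register windows plus one active window."""

    def __init__(self, num_windows=8):
        # Indices 0-7 stand in for register windows 110-117 (hypothetical mapping).
        self.windows = [None] * num_windows
        self.active = None  # stands in for the active register window 120

    def save(self, index):
        # Save operation: copy the active window into a register window.
        self.windows[index] = self.active

    def restore(self, index):
        # Restore operation: copy a register window into the active window.
        self.active = self.windows[index]

    def swap(self, save_to, restore_from):
        # A swap is a save followed by a restore, one clock cycle each.
        self.save(save_to)
        self.restore(restore_from)
        return 2  # clock cycles consumed under the one-cycle-per-step assumption


pw = ProcessingWindow()
pw.windows[3] = "contents of register window 113"
pw.active = "current working data"
cycles = pw.swap(save_to=4, restore_from=3)
```

In this sketch, the data previously in the active window ends up at index 4 (register window 114) and the active window holds what was stored at index 3 (register window 113), with the swap accounting for two clock cycles.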
FIG. 1B shows a timing diagram 150 of the pipeline 100. A clock 152 signal controls the timing of the various operations. When a swap request occurs, the data from the active register window 120 is saved to the appropriate register window (e.g., register window 114) in a first clock cycle, and in the next clock cycle the contents of the desired register window (register window 113) are restored to the active register window 120.
The swap requests typically operate in an acceptable manner as long as the swap requests do not occur too often. However, multiple swap requests often occur immediately following one another. As a result, the pipeline can stall. By way of example, if a first swap request (i.e., swap A) is received, then a save A operation 120A occurs, followed by a restore A operation 120B in the next clock cycle. A second swap request (swap B 122) is received immediately after the swap A request is received. However, two clock cycles are required to complete the save A 120A and restore A 120B operations required for the swap A request. Therefore, the swap B request 122 must be delayed two clock cycles before it can be acted upon in the respective save B operation 122A and restore B operation 122B. As a result, the swap B request 122 has a latency of 2 (i.e., two clock cycles between when the swap B request was received and when the restore B operation 122B was completed). If a swap C request 124 immediately followed the swap B request 122 (i.e., during the third clock cycle), then the swap C request would have a latency of 4 (i.e., four clock cycles between when the swap C request was received and when the respective save C operation 124A and subsequent restore C operation 124B were completed). Further, for an nth subsequent swap request 126, each swap request would have an n+1 latency factor. By way of example, a tenth consecutive swap request (swap J) would have a latency of 11 (i.e., 11 clock cycles would pass between when the swap J request 126 was received and when the respective save J operation 126A and subsequent restore J operation 126B were completed). The latencies cause pipeline stalls where processing is delayed for that number of clock cycles.
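The growth in latency for back-to-back swap requests can be illustrated with a small simulation. This is a sketch under stated assumptions, not a description of FIG. 1B: one swap request is assumed to arrive in each consecutive clock cycle starting at cycle 1, each swap is assumed to occupy two cycles (save, then restore), and latency is counted inclusively from the arrival cycle through the restore cycle. The function name simulate_swaps and the field names are hypothetical. Other counting conventions may shift individual values by a cycle.

```python
def simulate_swaps(arrival_cycles, cycles_per_swap=2):
    """Serialize swap requests that each take a save cycle and a restore cycle."""
    results = []
    next_free = 1  # first clock cycle in which no swap is in progress
    for arrival in arrival_cycles:
        start = max(arrival, next_free)       # cycle of the save operation
        finish = start + cycles_per_swap - 1  # cycle of the restore operation
        results.append({
            "arrival": arrival,
            "save": start,
            "restore": finish,
            "stall": start - arrival,          # cycles spent waiting to begin
            "latency": finish - arrival + 1,   # inclusive count (assumption)
        })
        next_free = finish + 1
    return results


# Ten consecutive requests (swap A through swap J), one per clock cycle.
timeline = simulate_swaps(range(1, 11))
for n, entry in enumerate(timeline, start=1):
    print(n, entry)
```

Under these assumptions the nth consecutive request shows an inclusive latency of n+1 cycles, so the tenth request (swap J) arrives in cycle 10 and its restore completes in cycle 20, a span of 11 cycles.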
In view of the foregoing, there is a need for a system and method for minimizing the number of stall cycles caused by multiple, consecutive swap requests.