Programs frequently feature subroutines that perform a specific task. After the task is performed, program flow returns from the subroutine to the main program. One common mechanism for performing a subroutine return involves conditionally or unconditionally moving the contents of a return address register into a program counter and then continuing program execution. A return value register may also be updated with a constant literal that may represent a Boolean value. Another approach to subroutine returns is to “pop” a return address from the stack into the program counter and continue program execution from there. This operation may also pop any spooled-out register file contents from the stack into the register file.
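The two return mechanisms described above can be illustrated with a minimal sketch. It is not taken from any particular instruction set: the `Cpu` class, its field names, the function names, and the stack layout (return address on top, saved registers beneath it) are all assumptions made for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Cpu:
    """Hypothetical machine state; all field names are illustrative."""
    pc: int = 0   # program counter
    ra: int = 0   # return address register
    rv: int = 0   # return value register
    regs: list = field(default_factory=lambda: [0] * 4)
    stack: list = field(default_factory=list)


def ret_from_register(cpu, condition=True, literal=None):
    """Register-based return: move ra into pc if the condition holds."""
    if not condition:
        cpu.pc += 1          # conditional return not taken: fall through
        return False
    if literal is not None:
        cpu.rv = literal     # optional constant, e.g. a Boolean 0 or 1
    cpu.pc = cpu.ra
    return True


def ret_from_stack(cpu, saved_regs=0):
    """Stack-based return: pop the return address, then saved registers.

    Assumes registers 0..saved_regs-1 were pushed in ascending order
    before the return address was pushed.
    """
    cpu.pc = cpu.stack.pop()
    for i in reversed(range(saved_regs)):
        cpu.regs[i] = cpu.stack.pop()
```

For example, calling `ret_from_register(cpu, literal=1)` on a `Cpu` whose `ra` holds 42 leaves `pc` at 42 and `rv` at 1, while `ret_from_stack(cpu, saved_regs=2)` on a stack of `[7, 8, 200]` leaves `pc` at 200 and restores the first two registers to 7 and 8.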
These methods for performing subroutine returns take several cycles to execute. In FIG. 1, when a traditional return (“RET”) instruction is executed in a typical pipelined CPU, five cycles 10 are required to execute the instruction. (In FIG. 2, a typical pipelined CPU contains a Program Counter (“PC”) 42 and an instruction memory 44. The CPU has four different pipeline registers 46, 52, 56, and 60 separating the different pipeline stages. The Instruction Decode stage (between registers 46 and 52) contains both a control/decode unit (“CU”) 48 for decoding the current instruction and generating control signals and a register file 50. The Execution Stage (between registers 52 and 56) contains an Arithmetic Logic Unit (“ALU”) 54. The Memory Stage (between registers 56 and 60) contains a data memory 58.) With continued reference to FIG. 1, during cycle 1, in the Instruction Fetch (“IF”) stage 12, the RET instruction is fetched (block 22). In cycle 2, in the Instruction Decode (“ID”) stage 14, correct control signals are generated and the return address register is read from the register file (block 24). In cycle 3, in the Execution (“EX”) stage 16, the return address register content is passed through the ALU with no change (block 26). During cycle 4, in the Memory (“MEM”) stage 18, the return address register content is routed past the data memory. Finally, in cycle 5, in the Writeback (“WB”) stage 20, the return address register content is written to the PC and the pipeline is flushed (block 30). Once the pipeline is flushed, the pipeline does not contain any instructions until the instruction at the return address is read from program memory. Therefore, several clock cycles are wasted in the pipeline flush process.
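The cost of the flush can be estimated with a simple timing sketch. The model below is an assumption rather than a description of FIG. 1 itself: it supposes that one instruction is fetched per cycle, that the instruction at the return address is fetched in the cycle immediately after the PC is written, and that every instruction fetched in between is discarded. The function and parameter names are hypothetical.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def ret_timeline(fetch_cycle=1):
    """Map each stage name to the cycle in which RET occupies it."""
    return {stage: fetch_cycle + i for i, stage in enumerate(STAGES)}


def wasted_fetch_slots(pc_write_stage="WB", fetch_cycle=1):
    """Count fetch slots lost between RET and the return target.

    Assumption: the target is fetched one cycle after the PC write,
    and everything fetched after RET and before the target is flushed.
    """
    pc_write_cycle = ret_timeline(fetch_cycle)[pc_write_stage]
    target_fetch_cycle = pc_write_cycle + 1
    return target_fetch_cycle - fetch_cycle - 1
```

Under these assumptions, `ret_timeline()` places the PC write in cycle 5 and `wasted_fetch_slots()` evaluates to 4, which corresponds to the "several clock cycles" lost to the flush in the text above.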
A similar issue exists for a return instruction (“RETMEM”) that pops the return address from a stack in memory. As shown in FIG. 3, in cycle 1, in the IF stage, the RETMEM instruction is fetched (block 32). During cycle 2, in the ID stage, the correct control signals are generated. In cycle 3, in the EX stage, the control signals to the data memory are routed past the ALU (block 36). In cycle 4, in the MEM stage, the return address is read from data memory (block 38). Finally, in cycle 5, in the WB stage, the return address read from memory is written to the PC and the pipeline is flushed (block 40). As with the RET instruction discussed with reference to FIG. 1, several cycles are wasted after the pipeline flush.
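The RETMEM sequence can be traced stage by stage with a short sketch. The stack pointer `sp`, the dictionary used as data memory, and the function name `run_retmem` are assumptions introduced for illustration; the text does not specify how the stack address is formed.

```python
def run_retmem(data_memory, sp, pc):
    """Trace a RETMEM instruction through five pipeline stages.

    data_memory: dict mapping address -> value (hypothetical model)
    sp: assumed stack pointer addressing the saved return address
    pc: program counter value before the return
    Returns the new pc and a list of (cycle, stage, action) tuples.
    """
    trace = [
        (1, "IF", "fetch RETMEM"),
        (2, "ID", "generate control signals"),
    ]

    # EX: the ALU does no work; memory control signals bypass it.
    read_address = sp
    trace.append((3, "EX", "route memory control signals past ALU"))

    # MEM: the return address is read from data memory.
    return_address = data_memory[read_address]
    trace.append((4, "MEM", "read return address from data memory"))

    # WB: the PC is only updated here, so earlier fetches are flushed.
    pc = return_address
    trace.append((5, "WB", "write return address to PC, flush pipeline"))
    return pc, trace
```

Calling `run_retmem({0x1F0: 0x40}, sp=0x1F0, pc=0x200)` yields a new PC of `0x40` and a five-entry trace. The PC update does not occur until the fifth entry, which is why the instructions fetched in the meantime have to be discarded.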
It would be advantageous to provide a more efficient subroutine return operation.