1. Field of the Invention
The invention pertains generally to a program return stack in a computer. In particular, it pertains to a stack for handling the target addresses of return instructions when some of the target addresses may be speculative.
2. Description of the Related Art
Computers predominately execute instructions in a linear fashion. But occasionally program execution must be temporarily diverted to a subroutine in another portion of memory to execute a particular function, only to return to the point of diversion when that function is complete. Such an operation is generically referred to as a call and return sequence. The call includes the process of branching away from the normal linear program execution, and the return includes the process of returning to the point from which execution was diverted, to resume execution from that point.
Call and return sequences can be nested, so that a second call can be triggered before the first return has completed, and a third call can be triggered before the second return has completed. The third return is then completed before the second return, and the second return is completed before the first return. Multiple levels of nesting are thus permitted, with each level returning to it""s associated point of diverted execution.
Push-and-pop stacks, also referred to as last-in-first-out (LIFO) buffers, are used to organize the nesting of call and return operations. In such a stack, each time a call is initiated, the address of the point of diversion is placed on top of the stack and all previously entered return addresses are xe2x80x9cpushed downxe2x80x9d. When the return is performed, using the address at the top of the stack as a return address, that address is removed from the top of the stack and discarded, while the most recent address to be pushed down is xe2x80x9cpopped upxe2x80x9d to the top of the stack. By using this push and pop operation for return addresses, the returns are executed in reverse order of the calls.
This relatively simple operation becomes more complicated when instruction pipelines and predictive branching are used to increase overall processing speed. Both techniques are commonplace in computer architecture. Instruction pipelines take advantage of the fact that the execution of each instruction is a series of predefined sequential operations. The pipeline contains several successive stages, with each stage performing one of those operations. Each instruction is fed into the pipeline, and passed from stage to stage for each successive operation. In this way, multiple instructions can be in the pipeline at the same time, with each in a different stage at any given time.
Predictive branching is required when conditional branch instructions go through the instruction pipeline. A conditional branch instruction will branch to one set of instructions if a condition is met, but will go to a different set of instructions if the condition is not met. Frequently the condition is not determined, and the correct set of subsequent instructions is therefore not known, until immediately before the conditional branch instruction is to be executed. This creates a dilemma, since the instructions following the conditional branch instruction are already in the pipeline before it is known if those following instruction are the correct ones. Simply refusing to feed instructions into the pipeline until they are firmly identified creates intolerable delays in processing. To avoid this, various methods have been developed to predict which set of instructions is likely to be the correct one, and then feed that set into the pipeline. If the prediction is incorrect, at least a portion of the pipeline will be flushed and refilled with the correct instructions, resulting in a delay. But if the predictive method is sufficiently accurate, the occurrence of pipeline flushing will be low, resulting in minimal slowdowns in processing.
Since a call operation can be in one of the two possible branch paths, placement of the associated return address on the stack is speculative, and that return address is removed or invalidated if the predicted path proves to be incorrect.
Although conventional terminology conceptually refers to the xe2x80x9ctopxe2x80x9d of a stack, this is merely the defined entry/exit point for the return address data. FIG. 1 shows a conventional stack 1, which places return data into sequential locations N through N+n of an addressable storage space 5, and uses a pointer in a separate register 3 to point to the address containing the top of the stack. The pointer is then incremented or decremented to shift the top of the stack as the push and pop operations are performed, while the data itself is not shifted. To handle branch predictions, two versions of the stack are maintained. A predictive version includes return addresses for predicted branch paths, and this version feeds an early stage of the instruction pipeline. An architectural version includes return addresses that have been verified as correct, and this version feeds into a later stage of the instruction pipeline.
The use of a pointer introduces an extra step, and an associated delay, into the process of identifying the entry/exit point of the stack. Maintaining two copies of the stack requires additional logic, occupying more space on the integrated circuit die. Both of these results are detrimental to the overall cost and efficiency of the computer.
An embodiment of the invention includes an apparatus with a bidirectional register stack having multiple registers coupled together. The registers include an entry/exit register positioned in the interior of the register stack. The embodiment also includes a history register, a history depth counter circuit, and a control circuit coupled to the register stack, the history register, and the history depth counter circuit.