Many current data processors are able to execute multiple instructions at one time by using multiple sets of execution units, particularly through the use of pipelining and superscalar issue architectures. Pipelined computers allow multiple instructions to exist in various stages of execution simultaneously. Superscalar computers use multiple instruction execution units and execute operations on available data. Additionally, data processor clock speeds continue to increase, with many data processors operating in excess of 100 MHz. In microprocessors employing these attributes, controlling instruction flow inside the data processor is very important to achieving the highest possible performance. In particular, it is necessary to control the flow of instructions to the execution units when multiple instructions are available for execution and the execution units are in various states of availability. Maximizing performance requires determining which instructions are executable, determining which execution units are available, and minimizing the delay in passing the executable instructions to available execution units.
In processors that can execute instructions speculatively, the branch direction (taken or not-taken) or the branch address (the target address or the address of the instruction following the branch instruction) can be predicted before it is resolved. If a prediction later turns out to be wrong, the central processing unit backs up to the previous state and then begins executing instructions in the correct branch stream. In conventional processors, multiple control transfers are often ready for evaluation in one cycle; however, these conventional processors evaluate only a single branch prediction per cycle. Control transfer evaluations that are ready but not selected must therefore be delayed. Delaying control transfer evaluations detracts from central processing unit performance, whereas evaluating control transfers as early as possible can significantly improve performance.
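By way of illustration only, the predict-then-repair behavior described above can be sketched in Python. This is a minimal model under stated assumptions, not a description of any particular processor: the names `ToyCpu`, `checkpoint`, `restore`, and `run_branch`, and the use of a dictionary as the register state, are hypothetical and are introduced solely for this example.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCpu:
    """Hypothetical processor state: a program counter plus named registers."""
    pc: int = 0
    regs: dict = field(default_factory=dict)

    def checkpoint(self):
        # Save a copy of the state that exists before speculation begins.
        return (self.pc, dict(self.regs))

    def restore(self, snap):
        # Back up to the previously saved state.
        self.pc, self.regs = snap[0], dict(snap[1])


def run_branch(cpu, target, predicted_taken, actual_taken):
    """Speculate past the branch at cpu.pc; return True if a rollback occurred."""
    branch_pc = cpu.pc
    snap = cpu.checkpoint()

    # Follow the predicted stream: the target address if predicted taken,
    # otherwise the address next to the branch instruction.
    cpu.pc = target if predicted_taken else branch_pc + 1
    cpu.regs["spec"] = cpu.pc  # stand-in for speculatively executed work

    if predicted_taken == actual_taken:
        return False  # prediction was correct; speculative work is kept

    # Misprediction: discard speculative work and restart in the correct stream.
    cpu.restore(snap)
    cpu.pc = target if actual_taken else branch_pc + 1
    return True
```

In this sketch, a branch at address 10 that is predicted taken to address 40 but resolves as not-taken leaves the program counter at 11 and the registers unchanged, because the speculative write is discarded on rollback.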
The instruction stream executed by a data processor usually contains many transfers of instruction flow from the current location, as indicated by the program counter, to a new location. When the new location is the start of a subroutine or of another piece of code that is called and executed, the instruction flow must afterward return to the original code sequence from which the call was made. Generally, a subroutine is a short sequence of code that the instruction flow calls and executes before returning to the main code sequence.
When the instruction flow leaves the main code sequence at a first location of the program counter, it jumps to a second location where the start of the desired subroutine is designated. Once the subroutine is complete, the instruction flow must jump back, not to the first location, but to a third location that designates the next instruction in the main code sequence. Jumping back to the first location would simply execute the same subroutine call again, as is done, for example, when a retry is needed for a default loop. The procedure of making the call to transfer the instruction flow and then waiting for its return to the main code sequence is one of the limiting factors on the execution speed of the microprocessor.
Specifically, the return is controlled by a register value that must be read. Depending on the microprocessor architecture, this value is stored in a dedicated register of some type. The register may hold a value equivalent to the program counter at the first location, or a value already advanced to the next instruction. This value functions as the return address and is stored in the register file, which can be any file specifically designated to hold the location of the return from the program flow change. Execution stalls while waiting for that register to be read, for the return address to be placed back into the program counter, and for program execution to continue.
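The call and return sequence described above, including the two possible contents of the return register, can be illustrated with a short Python sketch. It is a toy model under stated assumptions: the four-operation instruction format, the function `run`, the flag `link_holds_next`, and the addresses in `PROGRAM` are all hypothetical and do not correspond to any real instruction set.

```python
def run(program, link_holds_next=True, max_steps=100):
    """Execute a toy program and return the program-counter values visited."""
    pc, link, trace = 0, None, []
    for _ in range(max_steps):
        trace.append(pc)
        op, arg = program[pc]
        if op == "call":
            # Save the return value: either the already advanced address
            # (third location) or the address of the call itself (first location).
            link = pc + 1 if link_holds_next else pc
            pc = arg  # jump to the second location, the subroutine start
        elif op == "ret":
            # Read the register and place the return address in the program counter.
            pc = link if link_holds_next else link + 1
        elif op == "halt":
            break
        else:  # "nop": ordinary instruction, fall through to the next address
            pc += 1
    return trace


PROGRAM = {
    0: ("nop", None),
    1: ("call", 5),    # first location: the call in the main code sequence
    2: ("nop", None),  # third location: next instruction after the call
    3: ("halt", None),
    5: ("nop", None),  # second location: start of the subroutine
    6: ("ret", None),
}

print(run(PROGRAM))  # prints [0, 1, 5, 6, 2, 3]
```

Under either register convention the flow resumes at address 2 rather than at address 1. In this model the return cannot proceed until `link` has been read, which corresponds to the stall described above.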
Accordingly, it is desirable to provide an improved instruction flow that reduces the time required for the execution of returns from any program flow changes or control transfers, and that further provides for efficient and rapid execution of program instructions.