The invention relates to computer systems and in particular to a computer processor for processing loop branch instructions.
The present application is related to the following co-pending patent application filed concurrently herewith:
U.S. Patent application entitled METHOD AND SYSTEM FOR ALTERING PROCESSOR EXECUTION OF A GROUP OF INSTRUCTIONS, attorney docket POU920030070US 1.
This co-pending application and the present application are owned by one and the same assignee, International Business Machines Corporation of Armonk, New York. The description set forth in this co-pending application is hereby incorporated into the present application by this reference.
Trademarks: IBM® is a registered trademark of International Business Machines Corporation, Armonk, N.Y., U.S.A. S/390, Z900 and z990 and other product names may be registered trademarks or product names of International Business Machines Corporation or other companies.
Advanced microprocessor designs employ pipelining techniques that divide instruction fetching, decoding, cache access, execution, etc. into separate pipeline stages. The frequency of the processor depends on how much logic exists in each stage. Deeper pipeline designs can be operated at higher frequencies but with increased instruction latencies. Branches, representing control points in a program, are commonly executed instructions that can make up 20–25% of the dynamic instruction count in typical workloads.
Designers of computer systems try to compute the branch resolution (i.e., taken vs. not taken) as early as possible in the processor pipeline, both to minimize the amount of speculative execution and to minimize the branch penalty incurred when instructions are executed on the wrong path. The speculation on the branch path, usually referred to as “branch prediction,” is decided based on a history table of the previously executed branches and other global factors.
The branch prediction logic is typically at the front end of the pipeline, steering instructions to be executed down the pipe. However, the branch resolution is determined at the bottom end of the pipeline, typically in the execution stage. Since the resolution of these branches occurs late in the pipeline, accurate branch prediction becomes crucial for performance. Loop branches (i.e., branch instructions present in a code loop) are taken in most cases, but the prediction is usually wrong during the last iteration of the loop, when the count value reaches zero. A wrong prediction results in a “flush” of the pipeline and, eventually, in performance degradation.
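The last-iteration misprediction described above can be illustrated with a minimal sketch. It assumes a single 2-bit saturating counter as the history-based predictor; this scheme is chosen only for illustration, and nothing in this description states that it is the predictor used in any particular processor. The function name `simulate_loop_branch` and its parameters are hypothetical.

```python
def simulate_loop_branch(trip_count, counter=3):
    """Run one loop of `trip_count` iterations through a 2-bit predictor.

    Assumption (illustration only): the predictor is one 2-bit saturating
    counter; values 2 and 3 predict "taken", values 0 and 1 predict
    "not taken". Returns (mispredictions, final counter state).
    """
    mispredictions = 0
    for remaining in range(trip_count, 0, -1):
        count_after_decrement = remaining - 1
        # The loop branch is taken while the decremented count is non-zero.
        taken = count_after_decrement != 0
        predicted_taken = counter >= 2
        if predicted_taken != taken:
            mispredictions += 1
        # Train the counter with the actual outcome.
        if taken:
            counter = min(counter + 1, 3)
        else:
            counter = max(counter - 1, 0)
    return mispredictions, counter


# A 10-iteration loop: nine taken branches, then one not-taken exit branch.
missed, state = simulate_loop_branch(10)
print(missed, state)
```

Under these assumptions the only misprediction is the exit branch of the final iteration, and the counter still predicts "taken" afterwards, so the same miss recurs each time the loop is entered again.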
Branches usually test a set of conditions and resolve on the presence or absence of those conditions. RISC-type architecture computers usually branch on a certain value of a condition code or, in some cases, operate on (decrement) dedicated count registers to minimize dependencies on these registers. CISC-type computers have complicated branches that perform some arithmetic computation on general purpose registers (GPRs) and resolve on the result. Resolution of branches in CISC computers is usually made during the instruction execution pipeline stage, since branches are typically dependent on operands that are read from the GPR array or on results of preceding instructions.
A subset of branches known as “loop branches” is common in most architectures. They are used as counters in program loops, decrementing the counter value in each iteration until the counter reaches a value of zero. Loop branches usually require two cycles of execution: one cycle to do the arithmetic operation and a second cycle to compute the branch resolution and send the information back to the instruction fetch pipeline stage.
Existing processors require two cycles to resolve loop branch instructions. During the first cycle, the operand is decremented by “1” and the result is stored in a storage register. During the second cycle, the branch resolution is calculated based on whether the result is zero, and the information is sent to the branch prediction logic. Further, in existing processors, the branch prediction logic does not distinguish the last iteration of a loop branch from the rest of the iterations, often resulting in a wrong branch prediction during that last iteration.
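The two-cycle sequence described above can be sketched as a small behavioral model. This is an illustration under stated assumptions, not a description of any actual hardware: the names `execute_loop_branch` and `run_loop` are hypothetical, the register is modeled as a plain Python integer with no fixed width, and the cycle count covers only the loop branch itself.

```python
def execute_loop_branch(count):
    """Model one loop branch as two sequential cycles (hypothetical model).

    Returns (decremented count, branch taken?, cycles used).
    """
    # Cycle 1: decrement the operand by 1 and store the result.
    result = count - 1
    # Cycle 2: resolve the branch from the stored result (taken if non-zero).
    taken = result != 0
    return result, taken, 2


def run_loop(initial_count):
    """Execute the loop branch repeatedly until it resolves as not taken."""
    if initial_count < 1:
        raise ValueError("this sketch assumes a positive initial count")
    count = initial_count
    iterations = 0
    cycles = 0
    while True:
        count, taken, used = execute_loop_branch(count)
        iterations += 1
        cycles += used
        if not taken:
            return iterations, cycles


# A loop with an initial count of 4 executes the loop branch four times.
print(run_loop(4))
```

In this model the resolution cannot begin until the decremented value is available, which is why each loop branch occupies two cycles.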