1. Technical Field
The present invention generally relates to a design structure for microprocessors and in particular to a design structure for enhancing operations within a microprocessor.
2. Description of the Related Art
A microprocessor is a digital device that executes instructions specified by a computer program. A typical computer system includes a microprocessor coupled to a system memory that stores program instructions and the data to be processed by those instructions. One of the primary steps in executing instructions in a microprocessor is fetching the instructions from a cache. Most microprocessors include caches that store instructions and allow those instructions to be fetched rapidly without accessing main memory. As microprocessors become smaller and faster, there is a need to improve the efficiency of instruction fetch.
Several problems exist with current methods of fetching instructions from the instruction cache (I-cache) of a microprocessor. As an example, backward taken branch loops, such as “for” loops and “while” loops, are common short loop constructs that are fetched frequently from the I-cache. A for loop allows code to be executed repeatedly, often for a definite number of iterations. A while loop also executes repeatedly, but is conditional, with each repetition depending on the outcome of an instruction in the sequence. For each iteration of a backward taken branch loop, the I-cache is accessed again, even though the entire loop resides in the instruction buffer (IBUF).
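As a rough illustration of the repeated access described above, the following Python sketch counts how many I-cache accesses a short loop incurs when every iteration refetches the loop body. The fetch width, the loop length, the iteration count, the assumption that the loop body is aligned to a fetch boundary, and the function names are all hypothetical values chosen for illustration; none of them is taken from the description.

```python
# Toy model: count I-cache accesses for a short backward taken branch loop.
# All sizes and names are illustrative assumptions, not from the description.

FETCH_WIDTH = 4  # instructions delivered per I-cache access (assumed)


def icache_accesses(loop_length: int, iterations: int) -> int:
    """I-cache accesses when every iteration refetches the loop body.

    Assumes the loop body starts on a fetch boundary.
    """
    # Ceiling division: accesses needed to fetch one pass over the loop body.
    accesses_per_iteration = -(-loop_length // FETCH_WIDTH)
    return accesses_per_iteration * iterations


def is_backward_taken(branch_address: int, target_address: int, taken: bool) -> bool:
    """A loop-closing branch is taken and targets a lower address."""
    return taken and target_address < branch_address


# A 6-instruction loop body executed 100 times.
short_loop_accesses = icache_accesses(loop_length=6, iterations=100)
print(short_loop_accesses)  # prints 200
```

Under these assumed numbers, a loop of only six instructions causes 200 I-cache accesses, although the same six instructions are fetched every time.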
Repeatedly accessing the I-cache for for loops and while loops, also known as short loops, increases the power consumption of the device. As devices become smaller and more portable, lower power consumption is an increasingly important factor in microprocessor design.
Repeated access to the I-cache for short loops may also cause instruction delays. For example, during an instruction fetch, delays may occur if the I-cache is busy. In addition, the fetch logic must arbitrate for access to the I-cache, whether there is one thread or multiple threads. In all these cases, the increased latency can significantly degrade the efficiency of the microprocessor.