1. Field of the Invention
This invention relates to microprocessor design, and more particularly to a system, a circuit, and a method for adjusting prefetching rates by determining a probability factor for a type of branch instruction when the branch instruction is encountered in a set of instructions.
2. Description of the Related Art
The following descriptions and examples are not admitted to be prior art by virtue of their inclusion within this section.
Over the years, the use of microprocessors has become increasingly widespread in a variety of applications. Today, microprocessors may be found not only in computers, but also in a vast array of other products such as VCRs, microwave ovens, and automobiles. In some applications, such as microwave ovens, low cost may be the driving factor in the implementation of the microprocessor. On the other hand, other applications may demand the highest performance obtainable. For example, modern telecommunication systems may require very high speed processing of multiple signals representing voice, video, data, etc. Processing of these signals, which have been densely combined to maximize the use of available communication channels, may be rather complex and time consuming. With an increase in consumer demand for wireless communication devices, such real-time signal processing demands not only high performance but also low cost. To meet the demands of emerging technologies, designers must constantly strive to increase microprocessor performance while maximizing efficiency and minimizing cost.
With respect to performance, greater overall microprocessor speed may be achieved by improving the speed of devices within the microprocessor's circuits, as well as through architectural developments that allow for optimal microprocessor performance and operation. As stated above, microprocessor speed may be extremely important in a variety of applications. As such, designers have evolved a number of speed-enhancing techniques and architectural features. Among these techniques and features may be the instruction pipeline, the use of a branch prediction scheme, and the concept of prefetching.
A pipeline consists of a sequence of stages through which instructions pass as they are executed. In a typical microprocessor, each instruction comprises an operator and one or more operands. Thus, execution of an instruction is actually a process requiring a plurality of steps. In a pipelined microprocessor, partial processing of an instruction may be performed at each stage of the pipeline. Likewise, partial processing may be performed concurrently on multiple instructions in all stages of the pipeline. In this manner, instructions advance through the pipeline in assembly line fashion to emerge from the pipeline at a rate of one instruction every clock cycle.
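The throughput benefit described above may be illustrated with a simple timing model (a sketch for illustration only; the five-stage depth and instruction counts below are assumed figures, not part of any particular design):

```python
def pipelined_cycles(num_instructions, num_stages):
    # The first instruction takes num_stages cycles to traverse the
    # pipeline; once the pipeline is full, one instruction completes
    # every clock cycle thereafter.
    if num_instructions == 0:
        return 0
    return num_stages + (num_instructions - 1)

def serial_cycles(num_instructions, num_stages):
    # Without pipelining, each instruction occupies the entire
    # datapath for num_stages cycles before the next may begin.
    return num_instructions * num_stages
```

Under this model, a five-stage pipeline completes 100 instructions in 104 cycles, whereas a fully serialized design would require 500 cycles.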
The advantage of the pipeline generally lies in performing the required steps for several instructions simultaneously. To operate efficiently, however, a pipeline must remain full. For example, the pipeline scheme tends to work best on linear code, where instructions are consecutively fetched and processed. However, if the flow of instructions in a pipeline is disrupted, clock cycles may be wasted while the instructions within the pipeline are prevented from proceeding to the next processing step. In particular, most programs executed by a processor contain at least one non-linear code sequence. Such sequences may include instructions, such as branches, that may cause the processing of instructions (i.e., the program flow) to proceed out of order and slow down the pipeline. Therefore, it would be advantageous to devise a scheme in which to overcome the effects of executing non-linear code sequences (such as branch instructions) within a pipeline.
Branch instructions may be classified into several different types. A “branch instruction,” as described herein, is an instruction that typically diverts program flow from a current instruction to a target address within or outside of the current instruction set. A first type of classification is the direction in which the program flow is diverted by the branch instruction. If program flow is diverted to a target address, which is greater than a source address containing the branch instruction, the branch instruction may be referred to as a “forward branch instruction.” Conversely, if the target address is less than the source address, the branch instruction may be referred to as a “backward branch instruction.”
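The forward/backward classification may be expressed directly as a comparison of addresses (an illustrative sketch; the function name and the treatment of a branch targeting its own address are assumptions, not part of any particular instruction set):

```python
def classify_branch_direction(source_address, target_address):
    # A branch whose target address is greater than the address of the
    # branch instruction itself diverts program flow forward; a branch
    # whose target is a lower address diverts flow backward.
    if target_address > source_address:
        return "forward"
    elif target_address < source_address:
        return "backward"
    return "self"  # degenerate case: a branch to its own address
```

For example, a loop typically closes with a backward branch from the bottom of the loop body to its top, while an `if/else` construct typically compiles to a forward branch around the untaken arm.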
A second type of classification is whether a target address is provided along with the branch instruction. In the case of an unconditional branch instruction, the target address is provided and there is no ambiguity as to where the microprocessor should go to fetch the next set of instructions (pointed to by the target address). As such, unconditional branch instructions do not impede performance, since the microprocessor may continue to read instructions from the target address and process the instructions within the pipeline. In some cases, the unconditional branch may be further classified by determining the direction in which program flow is diverted by the branch instruction.
In the case of a conditional branch instruction, however, the target address is not provided and needs to be generated. Unfortunately, the generation of a target address may result in a time penalty, as the generation may stall the pipeline and not allow for other instructions to be processed. Such a time penalty may hinder the performance of the pipeline, and consequentially, the overall performance of the microprocessor.
In an effort to avoid the time penalties associated with conditional branch instructions, a branch prediction scheme may be used to determine the most likely target address. If the prediction is accurate, the microprocessor proceeds at full speed and no performance penalties are assessed. However, if the prediction is inaccurate, the microprocessor must cancel the instructions being executed and fetch instructions from the correct target address. The process of predicting a target address may result in unnecessary execution of instructions, which the microprocessor may not need, thereby increasing power consumption and decreasing microprocessor performance. Therefore, a need remains for a more complex prediction scheme.
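The cost of misprediction may be approximated with a simple expected-value model (illustrative only; the one-cycle correct-path cost and ten-cycle flush penalty used in the example are assumed figures, not measurements of any particular design):

```python
def average_cycles_per_branch(accuracy, correct_cost, mispredict_penalty):
    # Expected cost of one conditional branch: a correct prediction
    # proceeds at full speed, while a misprediction additionally pays
    # the cost of canceling the in-flight instructions and refetching
    # from the correct target address.
    miss_rate = 1.0 - accuracy
    return accuracy * correct_cost + miss_rate * (correct_cost + mispredict_penalty)
```

Under these assumed figures, a predictor that is correct 90% of the time with a ten-cycle flush penalty still averages two cycles per branch, double the cost of a perfectly predicted branch.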
One example of a more complex prediction scheme is to determine the target address dynamically. For example, a microprocessor can maintain a history table, implemented in hardware, that records the results of previous conditional branch instructions. The results of the previous conditional branch instructions (i.e., the target address) may help the microprocessor to predict the outcome of a conditional branch instruction encountered within a set of instructions. In the dynamic branch prediction scheme, the working assumption is that similar branch instructions may have the same target address. However, one disadvantage of the dynamic branch prediction scheme is that it requires specialized and expensive hardware.
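One common form of hardware history table is a table of two-bit saturating counters indexed by low-order bits of the branch address. The sketch below models the direction-prediction portion only (a real design would pair it with a branch target buffer to supply the predicted target address; the table size and indexing scheme here are assumptions for illustration):

```python
class BranchHistoryTable:
    """Minimal dynamic branch predictor: a table of 2-bit saturating
    counters. Counter values 0-1 predict not-taken; 2-3 predict taken.
    Two updates in the same direction are needed to flip a strong
    prediction, which tolerates a single anomalous outcome."""

    def __init__(self, size=256):
        self.size = size
        self.counters = [1] * size  # initialize to weakly not-taken

    def _index(self, branch_address):
        # Hash the branch address into the table; distinct branches
        # that alias to the same entry will share (and pollute) it.
        return branch_address % self.size

    def predict(self, branch_address):
        return self.counters[self._index(branch_address)] >= 2

    def update(self, branch_address, taken):
        # Saturating increment/decrement based on the actual outcome.
        i = self._index(branch_address)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
```

The saturating behavior is what makes this scheme effective on loops: after a loop's backward branch has been taken a few times, a single not-taken outcome at loop exit does not immediately flip the prediction.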
Another advancement in microprocessor technology relates to the concept of prefetching information, where such information may either be data or instructions. FIG. 1, for example, illustrates one embodiment of the prefetch concept. As illustrated in FIG. 1, prefetch unit 106 may request a block of information by transmitting one or more instruction addresses 110, via memory bus 108, to memory controller 116 of memory device 114. In some cases, memory device 114 may be an external memory device having a relatively high order in the memory hierarchy. Memory controller 116 may retrieve the block of information from memory space 118 and may transmit retrieved instructions 112, via memory bus 108, to processing unit 102. A “processing unit,” as described herein, is typically a microprocessor, but may alternatively encompass any circuitry adapted to execute instructions. Subsequently, instructions 112 may be written to a storage device lower in the memory hierarchy, such as a lower order level of cache memory device 104. Prefetching may allow the time spent retrieving the block of information to occur concurrently with other actions of processing unit 102. Thus, when processing unit 102 requests the prefetched information, there may be little or no delay, since the information may be fetched from a nearby cache.
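The prefetch flow described above may be sketched as follows, with the cache and external memory modeled as dictionaries and the latency figures chosen arbitrarily for illustration (neither is a measurement of any particular memory system):

```python
MEMORY_LATENCY = 100   # cycles to reach the external memory (assumed)
CACHE_LATENCY = 1      # cycles to reach the on-chip cache (assumed)

def prefetch_block(memory, cache, start, length):
    # Speculatively copy a block from the higher-level memory system
    # into the lower-level cache before the processing unit asks for
    # it, overlapping the retrieval with other processing.
    for addr in range(start, start + length):
        cache[addr] = memory[addr]

def demand_fetch(memory, cache, addr):
    # Returns (value, latency). A prefetched address hits in the
    # cache; an address that was not prefetched must go out over the
    # memory bus and pay the full memory latency.
    if addr in cache:
        return cache[addr], CACHE_LATENCY
    cache[addr] = memory[addr]
    return cache[addr], MEMORY_LATENCY
```

In this model, any address covered by an earlier `prefetch_block` call is serviced at cache latency, while uncovered addresses pay the full memory latency on first access.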
As such, prefetching involves a speculative retrieval of information, where the information may be retrieved from a higher-level memory system, such as external memory device 114, and placed into a lower level memory system, such as cache memory device 104. Such a retrieval may be executed under the expectation that the retrieved information may be needed by the processing unit for an anticipated event at some point after the next successive clock cycle.
In some cases, processing unit 102 may include an internal, or on-chip, cache memory device 104, as illustrated in FIG. 1. An internal cache, often called a primary cache, may be built into the circuitry of the processing unit. Processing unit 102 may further include internal prefetch unit 106, which may be coupled to the internal cache memory device via an internal bus. In other cases, however, cache memory device 104 and prefetch unit 106 may be external devices coupled to processing unit 102 via an external bus (not shown). The advantages and disadvantages of including internal versus external devices are well known in the art; thus, only internal devices are illustrated in FIG. 1 for the purpose of simplicity.
Several types of prefetching are known in the art. The most common example of a prefetch may be performed in response to a load operation. A load may occur when the processing unit requests specific information to be retrieved, so that the processing unit may use the retrieved information. In another example, a store operation may prefetch a block of data, so that a portion of the block may be overwritten with current information.
Another form of prefetching may occur for certain instructions, such as a branch instruction. For example, a branch instruction may require a prefetch unit to stop fetching instructions sequentially, and to divert fetching to the instructions at a target address associated with the branch instruction. As noted above, the branch instructions may be unconditional (i.e., where a target address is provided) or conditional (i.e., where a target address is not provided). In either case, the processing unit may need the instructions associated with the target address before the processing unit may continue to fetch the remaining set of instructions sequentially. Therefore, a prefetch operation may be performed so that the instructions of the target address are more readily accessible for processing after the branch instruction is fetched. By prefetching instructions during a time in which the processing unit is occupied with other processing, the speed of the processing unit may be increased by ensuring the availability of subsequent instructions before the processing unit requests the instructions.
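Branch-directed prefetching may be sketched as follows (illustrative only; the block size and the dictionary-based memory model are assumptions): when a branch with a known target address is decoded, the block of instructions at that target is copied into the cache, so that the redirect after the branch resolves is serviced locally rather than from external memory.

```python
def prefetch_branch_target(memory, cache, target_address, block_size=8):
    # On decoding a branch with a known target, speculatively copy
    # the block of instructions beginning at the target address into
    # the cache, so that fetch may be redirected there without
    # stalling on the external memory.
    for addr in range(target_address, target_address + block_size):
        if addr in memory:
            cache[addr] = memory[addr]
```

If the branch is not taken after all, the prefetched block simply goes unused, which is the power/bandwidth trade-off noted in the discussion of speculative retrieval above.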
Though prefetching, according to the manners described above, provides the benefit of improved microprocessor performance, the present inventor has recognized various drawbacks resulting from techniques that attempt to use branch prediction and prefetching schemes in a pipelined system. The discussion of such drawbacks is presented below along with various embodiments that reduce the effects of such drawbacks and improve upon the prior art.