Buffer structures and their associated control logic are commonly used to absorb discontinuities between the data delivery bandwidth of a data generator and the data consumption bandwidth of a data consumer. Examples of commonly implemented buffer structures include, but are not limited to, first-in-first-out (FIFO) buffers, last-in-first-out (LIFO) buffers, and stacks.
These buffer structures have implicit “costs” associated with their use, in addition to the obvious die area and power consumption of the structure itself. Usage of these buffer structures implies usage of system-wide resources to make the buffers effective. Common examples of system resources which a buffer implementation requires are system memory bandwidth to fill the buffer and local bandwidth to empty the buffer. This system and local memory bandwidth is commonly not available to other circuits while it is consumed by the buffer structure. Therefore, consumption of this bandwidth is an associated cost of the buffer structure. There is also a power cost associated with the use of system and local memory bandwidth. Examples include the power consumed by internal and external bus drivers to access main memory to fill the buffer structure, the power consumed by the main memory circuits themselves due to read and write accesses initiated by the buffer structure control logic, and the power consumed by the clock and control circuits required to enable main memory access.
Buffer structure size is typically calculated by accounting for the absolute worst-case scenario to determine the maximum size of the buffer. For example, the instantaneous absolute maximum difference between the data consumption rate and the data production rate will set the buffer size for any given implementation. Other factors include arbitration latency for system memory access, data path widths, interrupt servicing, etc. The buffer structure sizing must account for all possible scenarios, no matter how infrequent or unlikely; otherwise the system will fail to operate correctly in actual usage.
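As an illustration only, the worst-case sizing described above can be sketched with a simple model in which the buffer must keep the consumer supplied for the longest interval during which the producer delivers nothing. The function name, the parameter names, and the numeric values below are hypothetical and are not taken from any particular design.

```python
def required_buffer_entries(drain_bytes_per_cycle, worst_case_stall_cycles,
                            entry_width_bytes):
    """Entries needed so the consumer never starves during the longest stall.

    Simplified, assumed model: the consumer drains at a constant rate while
    the producer is stalled (for example, by arbitration latency or
    interrupt servicing) for up to worst_case_stall_cycles cycles.
    """
    bytes_needed = drain_bytes_per_cycle * worst_case_stall_cycles
    # Round up to a whole number of entries using integer arithmetic.
    return (bytes_needed + entry_width_bytes - 1) // entry_width_bytes


# Hypothetical numbers: a rare 12-cycle stall versus a typical 3-cycle stall.
worst_case = required_buffer_entries(8, 12, 16)   # 96 bytes -> 6 entries
typical = required_buffer_entries(8, 3, 16)       # 24 bytes -> 2 entries
print(worst_case, typical)
```

Under these assumed numbers, the buffer is three times larger than the typical case requires, because the rare 12-cycle stall dictates the size.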
In some cases it is possible to predict the occurrence of these worst case scenarios in advance of their impact on the buffer itself and therefore reduce the associated implicit costs. An example of this is a system which implements a buffer structure to support bandwidth smoothing between an instruction fetch circuit and an instruction decode circuit. If this same system supports multiple, but exclusive, instruction size formats (i.e. instruction size can be set only during reset), then the buffer structure will be sized for the worst case bandwidth discontinuity in one of the instruction size formats. This buffer size will not necessarily be optimal for the alternate instruction size format. Yet the buffer structure will consume the same amount of system resources in both instruction size modes because the design has been constrained by the worst case scenario.
In FIG. 1 there is shown a prior art 2-way scalar pipeline in a multi-scalar design. Branch prediction and sequential address generation occur in “A0”. Branch resolution occurs nine stages later in “E3”. The resolution determines whether the prediction was correct. If a branch is predicted incorrectly, pipe stages F1 through E2 must be flushed and all of the instructions in those stages must be discarded. Power is consumed by each of the stages, and flushed instructions represent wasted power since these instructions have been fetched and staged but never executed. In this example, power consumed by stages F2 through E2 increases linearly with increased pipe depth and parallelism. In this context, an N-scalar design consumes N times the pipeline power of a 1-scalar design. Deeper pipelines consume power proportional to the increased pipeline depth.
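The linear relationship described above can be sketched as follows. This is a simplified model under the assumption that every flushed pipe stage holds one instruction per issue slot and that each staged instruction costs the same energy; the function names, the stage count, and the energy unit are hypothetical and are not read from FIG. 1.

```python
def flushed_instruction_slots(flushed_stages, issue_width):
    """Instruction slots discarded on one mispredicted branch.

    Assumed model: each flushed stage holds issue_width instructions.
    """
    return flushed_stages * issue_width


def wasted_energy(flushed_stages, issue_width, energy_per_slot):
    """Energy spent fetching and staging instructions that never execute."""
    return flushed_instruction_slots(flushed_stages, issue_width) * energy_per_slot


# Hypothetical comparison with 8 flushed stages and 1 energy unit per slot.
one_scalar = wasted_energy(8, 1, 1)    # 8 units
two_scalar = wasted_energy(8, 2, 1)    # 16 units: twice the 1-scalar cost
deeper = wasted_energy(12, 2, 1)       # 24 units: scales with depth
print(one_scalar, two_scalar, deeper)
```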
In FIG. 2, a prior art diagram highlighting queue positions is shown. Buffers such as queues are used to smooth bandwidth discontinuities between various stages of a circuit. As mentioned previously, queues are typically sized in view of worst-case scenarios. For processors that support multiple instruction sizes (e.g., ARM or Thumb processors designed by ARM, Ltd., x86 family processors designed by Intel Corporation, etc.), queues will be sized for worst-case conditions for the largest instruction size. Historically, buffer sizes have increased as design frequency has increased, further exacerbating the problem.
Oversized queues result in wasted power on branch misdirection due to the fetching and staging of un-executed instructions. Processors with multiple instruction set sizes have oversized queues for the smaller instruction set sizes. This is because a larger instruction set size requires more storage in order to maintain the same performance target.
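A minimal sketch of this oversizing, assuming a queue whose storage is fixed in bytes and sized to hold a target number of instructions at the largest instruction size, is given below. The function names and the 4-byte and 2-byte instruction sizes are illustrative assumptions only.

```python
def queue_capacity(queue_bytes, instruction_bytes):
    """Number of whole instructions a queue of queue_bytes can hold."""
    return queue_bytes // instruction_bytes


def excess_entries(target_instructions, large_instr_bytes, small_instr_bytes):
    """Extra instructions the queue holds in the small-instruction mode.

    Assumed model: the queue is sized so that target_instructions of the
    largest size fit, and the same storage is used unchanged in both modes.
    """
    queue_bytes = target_instructions * large_instr_bytes
    return queue_capacity(queue_bytes, small_instr_bytes) - target_instructions


# Hypothetical case: 8 instructions needed to meet the performance target,
# with 4-byte instructions in one mode and 2-byte instructions in the other.
extra = excess_entries(8, 4, 2)   # 32-byte queue holds 16 small instructions
print(extra)                      # 8 entries beyond the target
```

In this assumed case, half of the instructions staged in the smaller-instruction mode exceed what the performance target requires, and each of them is a candidate for wasted fetch and staging power when the queue is flushed.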