The present invention relates to storage devices in computer systems. In particular, it relates to an improved method and system for operating system storage devices, especially buffer devices which are used in the processor in a circulating manner.
Improving or optimizing storage strategies is a very general aim in computer technology, and in system architecture in particular, so the present invention has a broad field of application. It will nevertheless be described and discussed, together with prior art technology, in a special field of application: operating a so-called instruction window buffer (also called reservation station or reorder buffer in the literature), abbreviated herein as IWB. An IWB is present in most modern computer systems in order to enable parallel processing of program instructions by a plurality of processing units. Such processors are referred to herein as out-of-order processors.
In many modern out-of-order processors, the IWB holds all the instructions and/or register contents before the calculated results are committed. During commitment the IWB entry is cleared and may be overwritten by a new instruction. When results were calculated speculatively beyond the outcome of a branch instruction, they can be rejected once the branch prediction proves wrong, simply by clearing these entries from the buffer and overwriting them with new, correct instructions. This is one prerequisite for out-of-order processing.
Often such buffers have 32, 64 or 128 entries, with a defined sequence among them. They are commonly used in a circulating way as a so-called wrap-around buffer, where a sequence of valid entries overlaps the end of the buffer and continues at its beginning, i.e., wraps around the end to the beginning of the buffer.
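The wrap-around behaviour described above can be illustrated with a minimal software sketch. It is not taken from the cited prior art: the class name `WrapAroundBuffer`, its methods, and the `oldest`/`count` bookkeeping are hypothetical and chosen only for illustration, and a hardware buffer would realize the same idea with pointers and control logic.

```python
class WrapAroundBuffer:
    """Hypothetical wrap-around buffer; all names are illustrative assumptions."""

    def __init__(self, size):
        self.size = size
        self.entries = [None] * size
        self.oldest = 0  # array index of the oldest valid entry
        self.count = 0   # number of valid entries in sequence

    def allocate(self, item):
        """Store a new entry after the youngest one, wrapping past the end."""
        if self.count == self.size:
            raise OverflowError("buffer full")
        index = (self.oldest + self.count) % self.size
        self.entries[index] = item
        self.count += 1
        return index

    def release_oldest(self):
        """Free the oldest entry so that its slot may be overwritten."""
        if self.count == 0:
            raise IndexError("buffer empty")
        item = self.entries[self.oldest]
        self.entries[self.oldest] = None
        self.oldest = (self.oldest + 1) % self.size
        self.count -= 1
        return item

    def valid_indices(self):
        """Array indices of the valid entries, from oldest to youngest."""
        return [(self.oldest + k) % self.size for k in range(self.count)]
```

In an 8-entry buffer, allocating six entries, releasing four, and allocating four more leaves the valid sequence at indices 4, 5, 6, 7, 0, 1: the sequence has wrapped around the end of the array.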
One main parameter influencing the performance of the processors is the buffer size. A big buffer can contain many more instructions and results and therefore allows more out-of-order processing. This, however, conflicts with other design requirements such as cycle time, buffer area, etc. When, for example, the buffer is dimensioned too large, the effort required to manage such a large number of storage locations decreases the performance of said buffer. Often, such buffers are operated according to the FIFO principle: the first entry stored is the first to be put out for further processing.
In particular, and with special reference to the present invention, a post-connected priority filter is used in the prior art to find the oldest entry in the buffer which is ready for execution, so that it can be executed next. This is necessary because the region in the buffer representing the sequence of valid entries to be operated on is not fixed to a constant array position during operation. The region can be symbolically compared to a worm: it consists of one piece without gaps between the entries, and it moves, contracts, or expands during operation.
A prior art instruction window buffer as disclosed in U.S. Pat. No. 5,923,900, "Circular Buffer With N Sequential Real And Virtual Entry Positions For Selectively Inhibiting N Adjacent Entry Positions Including The Virtual Entry Position", incorporated herein by reference, is operated according to the following scheme. In order to manage the queue of instructions, the instructions are written in sequence into the buffer during the dispatch phase, then executed out of order (issue phase), and finally written back into architectural registers in sequence again (commit phase). An instruction becomes executable when all of its source data is available, which is indicated by a valid bit vb. Several instructions can be valid at a time; one of them, i.e., the oldest, is selected by a priority filter referred to as instruction filter and abbreviated as if, and the data is sent to the instruction execution unit ieu. After the execution is completed the valid bit is overwritten by an execution bit ebn, with the active state set to low, so that the next valid instruction can be selected. As an alternative, the valid bit vb could be turned off after the data has been sent to the execution unit, instead of having an execution bit.
The prior art circular buffer has an "active" window. Entries outside the window are ignored. This is done, e.g., when the active window bit awb overwrites the valid bit vb to zero at the filter input. In the normal mode, in contrast to wrap-around mode, the next entry to be written has a higher entry number than the oldest entry in the active window. In wrap-around mode this is reversed. To handle both cases the filter searches the so-called unfolded data vector vb(0:127), whereby vb(0:63) is duplicated to vb(64:127). This virtual vector lets the oldest entry always have the lowest entry number in normal and wrap-around mode.
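The search over the unfolded vector can be sketched in software as follows. This is only an illustration of the idea described above, not the circuit of the cited patent: the function name `oldest_ready_unfolded`, its parameters, and the assumption that the active window is given by the index of its oldest entry plus an entry count are all hypothetical.

```python
def oldest_ready_unfolded(vb, oldest, count):
    """Return the real index of the oldest ready entry, or None.

    vb     -- list of valid bits (0 or 1), one per buffer entry
    oldest -- index of the oldest entry in the active window (assumption)
    count  -- number of entries in the active window (assumption)
    """
    n = len(vb)
    # Unfold: vb(0:n-1) is duplicated to vb(n:2n-1).
    unfolded = vb + vb
    # Active window bits over the unfolded vector: in this virtual vector
    # the window is one contiguous run starting at `oldest`, even when it
    # wraps around the end of the real buffer.
    awb = [1 if oldest <= i < oldest + count else 0 for i in range(2 * n)]
    # Outside the window the valid bit is forced to zero at the filter input.
    masked = [v & a for v, a in zip(unfolded, awb)]
    # The lowest set position in the unfolded vector is the oldest ready entry.
    for i, bit in enumerate(masked):
        if bit:
            return i % n  # map the virtual position back to a real entry
    return None
```

The sketch also makes the cost visible: the filter has to cover twice as many positions as the buffer has entries, so its size grows with the unfolded vector rather than with the real buffer.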
Focusing on the problem of the prior art, an architectural decision to increase the size of the window buffer would increase the probability of having a valid instruction in the active window, but would also increase the size of the above-mentioned priority filter, which has to search the duplicated, i.e., unfolded, data space when the wrap-around case must be covered. The filter itself becomes the performance-limiting element when the IWB exceeds a certain size. Thus, a significant step toward larger IWB memories requires a different technique for the priority filter in order to increase the overall performance of instruction processing.
Briefly, a first feature of the invention comprises a method for determining the entry with the highest priority in a buffer memory. The method is characterized by the steps of operating a plurality of priority subfilter circuits each of them covering a disjunct subgroup of the total of entries and each selecting the entry with the highest subgroup priority, and selecting the entry associated with the highest priority subgroup.
Another feature of the invention requires a storage device which is able to be allocated and deallocated repeatedly during processing program instructions in a computer system. The storage device is characterized by an operator for operating a plurality of priority subfilter circuits each of them covering a disjunct subgroup of the total of entries and each selecting the entry with the highest subgroup priority, and a selector for selecting the entry associated with the highest priority subgroup.
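The two-level selection named in the features above can be sketched in software as follows. This is a minimal illustration under stated assumptions and not the claimed circuit: the function names, the fixed `group_size`, and in particular the use of an age measured from a hypothetical `oldest` pointer as the priority are assumptions made for this example. The valid bits are assumed to be already masked by the active window.

```python
def subfilter(vb, group, age):
    """One subfilter: oldest ready entry within a single subgroup, or None."""
    ready = [i for i in group if vb[i]]
    return min(ready, key=age) if ready else None


def oldest_ready_two_level(vb, oldest, group_size):
    """Select the oldest ready entry using subgroup filters and a selector.

    vb         -- valid bits, already masked by the active window (assumption)
    oldest     -- index of the oldest entry in the buffer (assumption)
    group_size -- number of entries covered by each subfilter (assumption)
    """
    n = len(vb)

    def age(i):
        # Distance from the oldest entry; smaller means higher priority.
        return (i - oldest) % n

    # Disjoint subgroups that together cover all entries.
    groups = [range(s, min(s + group_size, n)) for s in range(0, n, group_size)]
    # First level: every subfilter picks the winner of its own subgroup.
    winners = [subfilter(vb, g, age) for g in groups]
    # Second level: the selector picks the winner of the best subgroup.
    candidates = [w for w in winners if w is not None]
    return min(candidates, key=age) if candidates else None
```

Each subfilter only examines `group_size` entries and the selector only examines one candidate per subgroup, so in this sketch no single stage has to search the whole buffer.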
Various other objects, features, and attendant advantages of the present invention will become more fully appreciated as the same becomes better understood when considered in conjunction with the accompanying drawings, in which like reference characters designate the same or similar parts throughout the several views.