The present invention relates to a method and apparatus for processing instructions in a computer. More specifically, the present invention relates to a method and apparatus for dynamically determining which of two or more caches contains the data for a given load instruction and issuing that load instruction to a functional unit coupled thereto.
Pipelining is an implementation technique whereby multiple instructions are overlapped in execution. A pipeline (also known as a functional unit) completes each instruction in a series of steps called pipeline stages. Instructions "enter" at one end of the pipeline, are processed through the stages, and "exit" at the other end (i.e., their intended effects are carried out). The throughput of the pipeline is determined by how often instructions are completed in the pipeline. The time required to move an instruction one step down the pipeline is known as a machine cycle. The length of a machine cycle is determined by the time required by the slowest pipeline stage because all the stages must proceed at the same time.
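By way of illustration only, the relationship between stage delays, the machine cycle, and throughput may be sketched as follows. The stage delays and the function names `machine_cycle` and `total_time` are hypothetical and chosen for the example; the sketch assumes an ideal pipeline with no stalls.

```python
def machine_cycle(stage_delays_ns):
    """Return the machine cycle: the delay of the slowest pipeline stage."""
    return max(stage_delays_ns)


def total_time(num_instructions, stage_delays_ns):
    """Return the time for a run of instructions to pass through an ideal pipeline.

    The first instruction needs one cycle per stage; each later
    instruction completes one cycle after the one before it.
    """
    cycle = machine_cycle(stage_delays_ns)
    depth = len(stage_delays_ns)
    return (depth + num_instructions - 1) * cycle


# Hypothetical five-stage pipeline, delays in nanoseconds.
stages = [2, 3, 5, 3, 2]

cycle = machine_cycle(stages)        # set by the 5 ns stage
pipelined = total_time(100, stages)  # 100 overlapped instructions
unpipelined = 100 * sum(stages)      # same work with no overlap

print(cycle, pipelined, unpipelined)
```

In this sketch, shortening any stage other than the slowest one leaves the machine cycle, and hence the throughput, unchanged.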
In this type of architecture, as in most, the chief means of increasing throughput is reducing the duration of the clock cycle. Alternatively, systems may employ multiple pipelines to increase throughput, issuing instructions using a scheduler or similar hardware construct. Instructions may be issued to the pipelines based on numerous factors, such as pipeline availability, op-code type, operand availability, and data dependencies. Such architectures require instructions and data to be provided at extremely high rates to maintain a high level of utilization for the microprocessor's execution unit(s). To maintain these high data rates, designers commonly employ cache memories.
Cache memory exploits the "principle of locality," which holds that all programs favor a portion of their address space at any instant in time. This hypothesis has two dimensions. First, locality can be viewed in time (temporal locality), meaning that if an item is referenced, it will tend to be referenced again soon. Second, locality can be viewed in space (spatial locality), meaning that if an item is referenced, nearby items will also tend to be referenced.
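The effect of the two forms of locality on a cache may be illustrated with a minimal simulation. The sketch below assumes a small direct-mapped cache; the function name `hit_rate`, the cache dimensions, and the address traces are hypothetical and serve only as an example.

```python
def hit_rate(addresses, num_lines=8, line_size=4):
    """Simulate a direct-mapped cache and return the fraction of hits."""
    tags = [None] * num_lines  # block number held by each cache line
    hits = 0
    for addr in addresses:
        block = addr // line_size  # memory block containing the address
        index = block % num_lines  # cache line that block maps to
        if tags[index] == block:
            hits += 1
        else:
            tags[index] = block  # miss: load the block into the line
    return hits / len(addresses)


# Spatial locality: consecutive addresses share cache lines.
spatial = list(range(32))

# Temporal locality: the same four addresses are referenced repeatedly.
temporal = [0, 4, 8, 12] * 8

# No locality: every reference touches a new block that maps to line 0.
scattered = [i * 32 for i in range(32)]

print(hit_rate(spatial), hit_rate(temporal), hit_rate(scattered))
```

Under these assumptions, the traces that exhibit locality hit in the cache on most references, whereas the scattered trace never hits.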
Another architectural feature implemented in some of today's microprocessor architectures is the use of multiple caches. Commonly, a delineation is made between the caching of instructions and the caching of data. Recently, specialized data caches have been included in certain microprocessor architectures to store information grouped by certain characteristics, such as repetitive use in floating point or graphics calculations.
When multiple caches are used in combination with multiple pipelines, which may themselves be specialized, it is desirable to group instructions such that every instruction in a group can be executed simultaneously. By grouping instructions in this way, such an architecture ensures maximum utilization of its facilities (e.g., pipelines) and so maximizes throughput.
What is often required in such microprocessor architectures is the ability to dynamically allocate instructions to functional units for the instructions' subsequent execution. For example, when two instructions are to read information from two caches simultaneously (i.e., the caches are accessed within the same time slice), each instruction must be allocated to a particular functional unit (i.e., pipeline) having the requisite facilities for execution.
Preferably, this allocation is performed in such a way as to maximize the number of instructions per group. Each group should therefore contain a maximum number of simultaneously executable instructions, thereby ensuring the maximum utilization of the microprocessor's facilities. Because the allocation must be done dynamically (i.e., as the program executes), it must not require any significant processing between the time an instruction is fetched and the time it is issued to a functional unit.
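The grouping requirement described above may be illustrated by a simple greedy sketch. It is not the method of the present invention. It assumes, for the example only, that the functional unit required by each instruction is already known and that the instructions have no data dependencies on one another; the names `form_groups`, `pipe0`, and `pipe1` are hypothetical.

```python
def form_groups(instructions, units):
    """Greedily pack instructions, in program order, into issue groups.

    Each instruction is a (name, unit) pair naming the single functional
    unit it requires. A group is closed as soon as the next instruction
    requires a unit that the group already occupies.
    """
    groups = []
    current = {}  # unit -> instruction occupying it in the open group
    for name, unit in instructions:
        if unit not in units:
            raise ValueError(f"unknown functional unit: {unit}")
        if unit in current:
            groups.append(current)  # conflict: close the open group
            current = {}
        current[unit] = name
    if current:
        groups.append(current)
    return groups


# Hypothetical machine with two pipelines, each coupled to its own cache.
units = {"pipe0", "pipe1"}
program = [
    ("load r1", "pipe0"),
    ("load r2", "pipe1"),
    ("load r3", "pipe1"),
    ("load r4", "pipe0"),
    ("load r5", "pipe0"),
]

for group in form_groups(program, units):
    print(group)
```

In this sketch, the first four loads fill two complete groups, while the fifth must issue alone because it requires a pipeline already occupied in the preceding group.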