1. Field of the Invention
This invention relates to the field of microprocessors and, more particularly, to pipeline partitioning techniques employed within microprocessors and to multi-chip microprocessor modules.
2. Description of the Relevant Art
Microprocessors are a key component in computer systems. Generally, the microprocessor is the master in a computer system, controlling other components according to a sequence of instructions provided by a user. The sequence of instructions is referred to as a program. Because the microprocessor is the master, the performance of the computer system is often characterized by the performance of the microprocessor. As a result, microprocessor manufacturers have made dedicated efforts to increase the performance of their microprocessors.
One common performance improvement technique implemented in both scalar and superscalar microprocessors is "pipelining". Pipelining involves dividing a complex function that needs to be performed upon an object into a collection of sequential, independent tasks. Each of the independent tasks can then be assigned to a location where that task would be performed upon any object moved to that location. A "pipeline" is defined as the collection of locations and the tasks performed at those locations. A "pipeline stage" can then be defined as a location within the pipeline. By implementing the independent tasks in separate locations, the complex function may be performed on multiple objects simultaneously. No single performance of the complex function occurs more quickly, but the aggregate amount of time necessary to perform the complex function on multiple objects decreases.
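As a rough illustration of the last point, the following Python sketch compares the aggregate time with and without pipelining. It assumes, purely for illustration, that every task takes exactly one fixed time interval and that moving objects between stages costs nothing; the function names are hypothetical and do not come from the description above.

```python
def unpipelined_time(num_objects, num_stages, interval):
    """Total time when each object finishes every task before the next object starts."""
    return num_objects * num_stages * interval


def pipelined_time(num_objects, num_stages, interval):
    """Total time when a new object enters the first stage every interval.

    The first object needs num_stages intervals to pass through; each
    further object completes one interval after its predecessor.
    """
    if num_objects == 0:
        return 0
    return (num_stages + num_objects - 1) * interval


# Assumed example values: 5 stages, 1 time unit per interval.
print(unpipelined_time(1, 5, 1), pipelined_time(1, 5, 1))      # 5 5
print(unpipelined_time(100, 5, 1), pipelined_time(100, 5, 1))  # 500 104
```

Under these assumptions a single object takes the same 5 intervals either way, while 100 objects take 104 intervals instead of 500.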
Objects enter a pipeline and flow through the pipeline stages. When an object is moved out of a pipeline stage, another object may move into that pipeline stage. In order for a pipeline to function, objects must move from pipeline stage to pipeline stage simultaneously, so that at no time is a given pipeline stage expected both to retain a previous object and to accept a new object. Additionally, no object is permitted to leave a pipeline stage until that pipeline stage has completed performing its task on the object. These two requirements lead to the assignment of a fixed time interval to a pipeline. As each time interval expires, the objects within the pipeline move to the next pipeline stage. Therefore, the time interval for a particular pipeline is required to be at least as long as the longest time required to execute any one of the independent tasks.
In a microprocessor, the time interval is defined by a clock signal which opens and closes registers that define the pipeline stages and other storage locations within the processor. A "register" is a storage device that is directed by a clock to accept new values at regular intervals. A certain type of register known as a single phase register opens when the clock signal makes a transition, and closes a short time later. During the time that the register is "open", it accepts a new value. During the time that the register is "closed", it retains the value that it accepted when it was last opened. Accepting a new value into a register is referred to as "sampling" a value. The time interval in a microprocessor is referred to as a "clock cycle".
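The two requirements and the role of registers can be sketched in Python as follows. This is a simplified software model, not a description of any actual circuit: the `Register` class, `min_clock_interval`, and `advance` are hypothetical names, and `None` is assumed to mark an empty pipeline stage.

```python
class Register:
    """Simplified model of a clocked register: it holds its value until it samples."""

    def __init__(self, value=None):
        self.value = value

    def sample(self, new_value):
        # Models the brief "open" period after a clock transition.
        self.value = new_value


def min_clock_interval(task_times):
    """The interval must be at least as long as the slowest task."""
    return max(task_times)


def advance(registers, tasks, new_input):
    """Advance the pipeline by one clock cycle.

    registers[i] feeds tasks[i], whose result is sampled by registers[i + 1].
    All next values are computed from the currently held values before any
    register samples, so every object moves simultaneously.
    """
    next_values = [new_input]
    for task, reg in zip(tasks, registers):
        next_values.append(None if reg.value is None else task(reg.value))
    for reg, value in zip(registers, next_values):
        reg.sample(value)


# Assumed two-stage example: add one, then double.
tasks = [lambda x: x + 1, lambda x: x * 2]
registers = [Register() for _ in range(len(tasks) + 1)]
outputs = []
for item in [5, 7, None, None]:
    advance(registers, tasks, item)
    outputs.append(registers[-1].value)
print(outputs)                        # [None, None, 12, 16]
print(min_clock_interval([3, 5, 2]))  # 5
```

Computing all next values before any register samples is the design choice that keeps a stage from having to retain its old object while accepting a new one.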
The simultaneous movement of objects between pipeline stages is referred to as "advancing" the pipeline. Some pipelines are defined such that the pipeline cannot advance at the end of certain time intervals. Such a situation may exist, for example, when two pipeline stages share a resource that is only occasionally used by one stage or the other. When the resource is needed by both pipeline stages at once, one stage uses the resource in a first time interval, the pipeline is not advanced at the end of that time interval, and the other stage then uses the resource in a second time interval. At the end of the second time interval, the pipeline advances. Not advancing the pipeline at the end of a time interval is referred to as "stalling" the pipeline.
Pipeline stages are connected in the order that the associated tasks are performed. A pipeline stage that receives an object before that object passes to a second pipeline stage is said to be "upstream" of that second pipeline stage. A pipeline stage that receives an object after a second pipeline stage has received the object is said to be "downstream" from that second pipeline stage.
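The stalling behavior described above can be sketched with a small Python simulation. The model is an assumption made for illustration: exactly two stages share one resource, each object lists the stages in which it needs that resource, and a conflict costs one extra time interval. The name `simulate` and its parameters are hypothetical.

```python
from collections import deque


def simulate(num_stages, needs, shared_stages):
    """Return (intervals, stalls) needed to push all objects through the pipeline.

    needs[i] is the set of stage indexes in which object i uses the shared
    resource; shared_stages is the pair of stage indexes that share it.
    """
    waiting = deque(range(len(needs)))
    stages = [None] * num_stages
    if waiting:
        stages[0] = waiting.popleft()  # first object starts in stage 0
    done = intervals = stalls = 0
    while done < len(needs):
        intervals += 1
        users = [s for s in shared_stages
                 if stages[s] is not None and s in needs[stages[s]]]
        if len(users) == 2:
            # Both stages need the resource: the pipeline is not advanced,
            # and the second stage uses the resource in an extra interval.
            intervals += 1
            stalls += 1
        if stages[-1] is not None:
            done += 1  # object in the last stage leaves the pipeline
        stages = [waiting.popleft() if waiting else None] + stages[:-1]
    return intervals, stalls


# Four stages; stages 1 and 3 share a resource (assumed example).
print(simulate(4, [set(), set(), set()], (1, 3)))  # (6, 0)
print(simulate(4, [{3}, set(), {1}], (1, 3)))      # (7, 1)
```

In the second run, object 0 occupies stage 3 while object 2 occupies stage 1, both need the resource, and the single stall adds one interval to the total.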
A problem occurs when pipeline stages cannot be assigned independent tasks that require similar time intervals. The time interval must be set equal to or greater than the largest amount of time required to perform any of the independent tasks implemented in the pipeline. Therefore, the stages requiring less than the time interval to complete their task idle for the remainder of the time interval. In some cases, the set of pipeline tasks actually requires more time to execute than the complex function would require if implemented in a single step. The pipelined implementation can be slower for a single object for two reasons: stages that complete their tasks in less than the allotted time interval sit idle, and a finite amount of time is required to advance the pipeline. Therefore, each added stage increases the amount of time required to complete the complex function on a single object. In some cases, a task associated with a pipeline stage can be further divided into tasks that can be implemented in separate stages. Such a division is desirable in cases where the task to be divided is the task that determines the necessary time interval, and the remaining tasks in the pipeline require significantly less time to complete. In other cases, however, a task cannot be naturally divided.
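The effect of unbalanced stages can be made concrete with a short Python sketch. The numbers are invented for illustration, and the model assumes that a single-step implementation would take the sum of the task times and that advancing the pipeline adds a fixed overhead to every interval; neither assumption comes from the description above.

```python
def single_step_time(task_times):
    """Assumed time to perform the whole function in one step, with no pipeline."""
    return sum(task_times)


def pipeline_interval(task_times, advance_overhead):
    """Interval set by the slowest task plus the time needed to advance."""
    return max(task_times) + advance_overhead


def pipeline_latency(task_times, advance_overhead):
    """Time for one object to pass through every stage."""
    return len(task_times) * pipeline_interval(task_times, advance_overhead)


# Hypothetical task times in picoseconds, with 100 ps to advance the pipeline.
unbalanced = [300, 500, 200]
print(single_step_time(unbalanced))        # 1000
print(pipeline_interval(unbalanced, 100))  # 600
print(pipeline_latency(unbalanced, 100))   # 1800

# Dividing the 500 ps task into two 250 ps tasks shortens the interval.
divided = [300, 250, 250, 200]
print(pipeline_interval(divided, 100))     # 400
print(pipeline_latency(divided, 100))      # 1600
```

Under these assumed numbers, one object takes 1800 ps in the three-stage pipeline versus 1000 ps in a single step, although the pipeline accepts a new object every 600 ps. Dividing the slowest task reduces the interval to 400 ps, at which point the 300 ps task determines the interval.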
A particularly important application of pipelining in microprocessors is the processing of instructions. In order to process an instruction, a complex set of functions must be performed: the instruction must be retrieved from memory ("fetching"); the operations required by the instruction must be determined ("decoding"); the instruction must be transferred to an execution unit ("dispatching"); the operations required by the instruction must be performed ("executing"); and the results of the operations must be recorded ("writeback"). If these functions are performed separately in a non-pipelined fashion for each instruction in a program, the time required to process all the instructions in the program would be large. However, if the functions are divided into stages in which each stage requires a similar amount of time to perform its assigned task, then the processing of successive instructions may overlap.
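The overlap of instruction processing can be visualized with a short Python sketch that lists which instruction occupies each of the five stages in each clock cycle. It assumes an idealized pipeline with one stage per function, one clock cycle per stage, and no stalls; the `occupancy` function is a hypothetical helper.

```python
STAGES = ["fetch", "decode", "dispatch", "execute", "writeback"]


def occupancy(num_instructions):
    """Return one dict per clock cycle mapping stage name to instruction index."""
    if num_instructions == 0:
        return []
    total_cycles = num_instructions + len(STAGES) - 1
    table = []
    for cycle in range(total_cycles):
        row = {}
        for position, name in enumerate(STAGES):
            instruction = cycle - position  # instruction i reaches stage s in cycle i + s
            if 0 <= instruction < num_instructions:
                row[name] = instruction
        table.append(row)
    return table


table = occupancy(3)
print(len(table))  # 7
print(table[2])    # {'fetch': 2, 'decode': 1, 'dispatch': 0}
print(table[6])    # {'writeback': 2}
```

In cycle 2 of this idealized model, three instructions are in flight at once: instruction 2 is being fetched while instruction 1 is decoded and instruction 0 is dispatched.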
Another important consideration with respect to the design of a microprocessor relates to the size of the semiconductor die upon which the microprocessor is fabricated. With every generation of typical microprocessor families, the die size and number of transistors have increased tremendously. The increased number of transistors is required to implement the ever-increasing functionality supported by the typical microprocessor, as well as to implement the integration of other subsystems which are closely coupled to the microprocessor.
Unfortunately, increases in die sizes can result in difficulties in fabrication and in lower yields in manufacturing. To mitigate these problems, one approach has accordingly involved separating the functionality of a microprocessor system into smaller chips for incorporation within a multi-chip module. Specifically, in this approach a secondary cache is implemented on a chip separate from the main processor, and smaller primary caches (i.e., small data and instruction caches) are incorporated on the main processor chip upon which the decode and execution circuitry are also fabricated. By separating the functionality in this manner, a relatively large secondary cache is made possible, thereby enhancing performance.
Since in the above-described partitioning scheme the smaller primary caches are incorporated on the same die as the main processor circuitry (including the decode and execute circuitry), the overall sizes of these caches must be kept relatively small to allow for a reasonable overall die size. Similarly, the capabilities and functionality associated with the other processor circuitry on the main processor die must also be limited to keep the die within a reasonable size. Thus, while significant enhancements to the overall performance of a processor could be obtained by providing enhancements to the circuitry associated with the main processor die, in many instances such enhancements are not exploited due to die size limitations. In addition, although the size of the die containing the secondary (L2) cache could be increased, since the die space needed to implement the L2 cache itself is typically somewhat smaller than the practical maximum die size, such additional die space typically cannot be used to implement functionality of the main processor chip due to the large number of interconnections that would be required between the chips.