1. Field of the Invention
The present invention relates to computer systems and processors. More specifically, the present invention relates to an interface apparatus and operating method for interfacing a bus to a processor.
2. Description of the Related Art
Advanced high-performance microprocessors, for example UltraSPARC-I and UltraSPARC-II from Sun Microsystems, Inc., have an on-chip data cache that is typically 128 bits (16 bytes) wide. To match the width of the data cache, which is also called a first-level or L1 cache, the microprocessor usually has a 128-bit data bus that interacts with a memory subsystem, either a second-level L2 cache or a DRAM memory, to supply requested data that are absent from the on-chip data cache during an operation such as a data cache read miss. The 128-bit data bus is very large, resulting in a high pin count that is generally excessive in size and cost for many computer system applications. For example, a 128-bit data bus realistically increases the size and cost of a microprocessor beyond the realm of the high-volume desktop computer market. A 64-bit data bus is much more advantageous and positions the microprocessor in the desirable markets.
However, a 64-bit data bus must be made compatible with the 128-bit cache line. The transferring of a 128-bit cache line over the 64-bit data bus typically is performed by segmenting the data into two parts. Segmenting or dividing of the data into two parts introduces several disadvantages. For example, segmenting the data inserts an extra delay into the timing pathways of the microprocessor. Furthermore, segmenting of the data in some systems leads to substantial rework of the first-level (L1) data cache organization. Segmenting of the data may also lead to a decrease in processor performance.
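The segmentation described above can be illustrated with a minimal behavioral sketch. The sketch is not drawn from any particular microprocessor implementation: the function names `to_bus_beats` and `from_bus_beats` are hypothetical, and the choice to transfer the lower-order half first is an assumption made only for illustration.

```python
MASK64 = (1 << 64) - 1  # mask selecting the lower-order 64 bits


def to_bus_beats(line128):
    """Split a 128-bit cache line into two 64-bit bus transfers.

    Assumption: the lower-order 64 bits are sent first.
    """
    if not 0 <= line128 < (1 << 128):
        raise ValueError("line must fit in 128 bits")
    return [line128 & MASK64, (line128 >> 64) & MASK64]


def from_bus_beats(beats):
    """Reassemble a 128-bit cache line from two 64-bit bus transfers."""
    low, high = beats
    return ((high & MASK64) << 64) | (low & MASK64)


line = 0x00112233445566778899AABBCCDDEEFF
beats = to_bus_beats(line)
rebuilt = from_bus_beats(beats)
```

In this sketch the second transfer cannot begin until the first has completed, which may help illustrate why segmenting can add delay relative to a single 128-bit transfer.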
High-performance microprocessors such as the UltraSPARC-I and the UltraSPARC-II are 64-bit microprocessors, so only half of the 128-bit data returned from a memory subsystem is used to complete a computation. The other half of the 128-bit data is fetched to maintain locality of the two data halves, since the second 64-bit data segment may be referenced subsequently, so that the extra 64-bit bandwidth of the 128-bit cache line can be optimized for performance and cost.
FIG. 1 illustrates the 128-bit input data path 100 of an advanced high-performance microprocessor, specifically the UltraSPARC-I microprocessor. The input data path 100 serves as an interface between a memory 102 and a CPU core 104. The memory 102 is connected to a 128-bit input data register 106 which buffers 128-bit data from the memory 102 for transfer to the 64-bit CPU core 104. The 128 bits of the input data path 100 are connected both in a direct pathway to the CPU core 104 and in a pathway through a 128-bit level-1 data cache 108.
The data cache write pathway is first applied to a 2-to-1 data cache multiplexer 110 that selects a 128-bit input signal from either the memory 102 via the input data register 106 or the CPU core 104. The CPU core 104 supplies the 128 bits in the form of two 64-bit store data elements. The data cache multiplexer 110 is controlled by a fill request signal that selects between a data cache fill operation for filling the data cache 108 from the memory 102 and a data store operation for storing data from the CPU core 104 to the data cache 108. The data input to the data cache 108 is in the form of 128-bit lines, and 128-bit lines are stored in the data cache 108, but the data cache 108 divides the 128-bit lines into 64-bit segments and aligns the 64-bit segments for application to the 64-bit CPU core 104.
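The selection performed by the data cache multiplexer 110 can be modeled, in a hedged and purely behavioral form, as follows. The function name `data_cache_mux`, the parameter names, and the placement of the two 64-bit store data elements within the 128-bit line are assumptions for illustration and are not specified in the description above.

```python
MASK64 = (1 << 64) - 1  # mask selecting the lower-order 64 bits


def data_cache_mux(fill_request, memory_line, store_low, store_high):
    """Behavioral model of a 2-to-1, 128-bit data cache write multiplexer.

    fill_request True  -> data cache fill: pass the 128-bit line from memory.
    fill_request False -> data store: pass the two 64-bit store data
                          elements from the CPU core, combined into 128 bits.
    Assumption: store_low occupies bits 63..0 and store_high bits 127..64.
    """
    if fill_request:
        return memory_line
    return ((store_high & MASK64) << 64) | (store_low & MASK64)


memory_line = 0xAAAAAAAAAAAAAAAABBBBBBBBBBBBBBBB
fill_data = data_cache_mux(True, memory_line, 0x1, 0x2)
store_data = data_cache_mux(False, memory_line, 0x1, 0x2)
```

Here `fill_data` is the line from memory and `store_data` is the 128-bit value assembled from the two store data elements.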
The direct pathway from the memory 102 to the CPU core 104 includes a CPU core multiplexer 112, a 2-to-1 multiplexer for selecting between the lower-order 64 bits and the higher-order 64 bits of the 128-bit data from the memory 102.
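As a minimal sketch of the half-selection performed by the CPU core multiplexer 112, the behavior may be expressed as shown below. The control input `select_high` is a hypothetical name; the description above does not state which signal drives the selection.

```python
MASK64 = (1 << 64) - 1  # mask selecting the lower-order 64 bits


def cpu_core_mux(input_line, select_high):
    """Behavioral model of a 2-to-1 multiplexer over a 128-bit input.

    Returns the higher-order 64 bits when select_high is True,
    otherwise the lower-order 64 bits.
    """
    if select_high:
        return (input_line >> 64) & MASK64
    return input_line & MASK64


input_line = 0x0123456789ABCDEFFEDCBA9876543210
low_half = cpu_core_mux(input_line, False)
high_half = cpu_core_mux(input_line, True)
```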
While microprocessors such as the UltraSPARC-I and the UltraSPARC-II are greatly advantageous for achieving high performance, the pin count of an integrated circuit that supports a 128-bit data input is excessive for some computer system market segments. In particular, the 128-bit data input is highly suitable and advantageous for market segments such as the high-end workstation market. However, the markets for low-end workstations, home personal computers, desktop computers, and the like more cost-effectively use a 64-bit data input connection while maintaining a 128-bit data cache line size.
What is needed is an apparatus and method for splitting input data from a data bus into two segments and steering the segments to a processor core and a data cache while maintaining the timing of a system that does not split the input data.