1. Field of the Invention
The present invention relates to microprocessor transactions, and more particularly to an apparatus and method for ordering transaction beats in a data transfer which solves the problem of stall cycles incurred by a microprocessor due to non-optimum ordering of cache line reads.
2. Description of the Related Art
In a present day microprocessor, such as an x86-compatible microprocessor, transactions (i.e., read and write transactions) to/from memory are accomplished over a system bus. These transactions include a request phase and a data (i.e., response) phase. During the request phase, an address for a transaction, along with the transaction type, is provided over an address signal group. The address signal group typically includes an address bus, a set of corresponding address strobe signals, and a request bus. During the data phase, data corresponding to the transaction is transferred over a data signal group. The data signal group typically includes a data bus, a set of corresponding data strobe signals, a response bus (indicating the type of response), and bus control signals. In one particular conventional configuration, the data signal group includes approximately 72 signals. Many conventional configurations support “quad-pumped” transactions in which an entire cache line (e.g., eight quadwords for a 64-byte cache line) is transferred across the bus in just a few cycles (e.g., two cycles) of a bus or system clock. During this type of transfer, data strobe signals indicate the validity of the individual quadword beats on the data bus so that several beats are transferred during each bus clock cycle.
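The beat-per-clock arithmetic implied above can be sketched briefly. The constants below (a 64-byte line, 8-byte quadwords, four beats per bus clock) follow the example figures in the text; the function and constant names are illustrative only, not taken from any bus specification:

```python
# Illustrative sketch of quad-pumped beat timing; constants follow the
# example in the text (64-byte cache line, eight quadword beats,
# "quad-pumped" = four beats per bus clock). Names are hypothetical.

CACHE_LINE_BYTES = 64
QUADWORD_BYTES = 8
BEATS_PER_BUS_CLOCK = 4

def beats_and_clocks(line_bytes: int = CACHE_LINE_BYTES) -> tuple[int, int]:
    beats = line_bytes // QUADWORD_BYTES        # 8 quadword beats
    clocks = -(-beats // BEATS_PER_BUS_CLOCK)   # ceiling division: 2 bus clocks
    return beats, clocks

print(beats_and_clocks())  # (8, 2)
```

That is, eight quadword beats at four beats per bus clock complete in the two bus clock cycles the text cites.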
In an x86-compatible processor, the request phase consists of two sub-phases: Request A and Request B. During the Request A sub-phase, the address of the transaction, along with the transaction type, is put out over the address signal group. During the Request B sub-phase, other data associated with the transaction, such as its attribute (e.g., a write-combined write to memory) and its length, are put out over the address signal group.
On loads (i.e., data read requests), the critical quadword (i.e., the quadword whose address is provided during the Request A sub-phase over the address signal group) is transferred during the first beat, beat A, and the remaining quadwords are ordered for the remaining beats B-H according to an interleaved ordering protocol. Interleaved ordering of quadwords for transfer of a cache line from memory is an artifact of older memory configuration schemes that enabled every other quadword (or data entity of another size, e.g., a doubleword, depending on the bus architecture) to be fetched from an alternate DRAM bank, thereby precluding the wait states that were normally associated with fetching two consecutive addresses from the same DRAM bank. Although interleaved ordering precluded wait states in older DRAM designs, DRAM improvements have enabled system designers to provide for other types of ordering, such as linear ordering as described hereinbelow.
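The interleaved beat order described above can be sketched as follows. The XOR rule used here is the classic DRAM interleaved (toggle) burst ordering; the text does not spell out an exact formula, so this is an illustrative assumption rather than a statement of the protocol at issue:

```python
def interleaved_order(critical_qw: int, burst_len: int = 8) -> list[int]:
    """Quadword indices carried by beats A-H of an interleaved burst.

    Sketch based on the classic DRAM interleaved (toggle) burst rule:
    beat i carries quadword (critical XOR i). The critical quadword is
    always beat A, and successive beats alternate between even and odd
    quadword indices, i.e., between the two DRAM banks of an
    interleaved memory scheme.
    """
    return [critical_qw ^ i for i in range(burst_len)]

print(interleaved_order(3))  # [3, 2, 1, 0, 7, 6, 5, 4]
```

The even/odd alternation in the output is what allowed each successive fetch to target the alternate DRAM bank, avoiding the wait states mentioned above.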
Today's state of the art for burst transfers over a data bus allows for only a single type of transfer order. For example, one processor configuration allows for interleaved ordering while a different processor configuration allows for linear ordering. The present inventor has observed that in the majority of cases, linear ordering is optimal from the standpoint of data proximity. Accordingly, a linearly ordered system bus provides for transfer of data in a manner that minimizes processing stalls due to cache line reads. But while linear ordering may be optimal in many cases, it is very detrimental (i.e., numerous pipeline stalls are incurred) in other cases, such as when the critical quadword is the last quadword rather than the first. In those cases, linear ordering maximizes the number of stalls, and interleaved ordering may provide superior performance.
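The trade-off described above can be illustrated by counting how many beats precede the critical quadword under each ordering. This sketch models linear ordering as ascending from quadword 0 (the reading the text implies when it calls the last-quadword case detrimental) and interleaved ordering with the classic XOR rule; both models are assumptions made for illustration:

```python
def linear_order(burst_len: int = 8) -> list[int]:
    # Linear ordering modeled as ascending from quadword 0 (an
    # assumption implied by the text: the critical quadword may
    # arrive in the last beat).
    return list(range(burst_len))

def interleaved_order(critical_qw: int, burst_len: int = 8) -> list[int]:
    # Classic interleaved (toggle) rule: critical quadword first.
    return [critical_qw ^ i for i in range(burst_len)]

def beats_before_critical(order: list[int], critical_qw: int) -> int:
    # Beats the processor waits before the critical quadword arrives;
    # a proxy for the pipeline stalls discussed in the text.
    return order.index(critical_qw)

# Worst case for linear ordering: the critical quadword is the last one.
print(beats_before_critical(linear_order(), 7))        # 7
print(beats_before_critical(interleaved_order(7), 7))  # 0
```

Under this model, linear ordering wins when the critical quadword lies near the start of the cache line and loses badly when it lies near the end, which is precisely the motivation for selecting the transfer order dynamically per request.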
Consequently, it is desirable to provide a protocol mechanism which allows for data entity transfer ordering to be specified dynamically as part of a request phase for a cache line read. It is furthermore desirable to provide apparatus and methods that enable dynamic specification of transfer order, while remaining compatible with existing and legacy bus protocols. Furthermore, it is desirable to provide a technique for specifying a custom data entity transfer protocol that can be dynamically specified for a cache line or other type of transfer.