The present invention relates to the field of bus arbitration in computer systems. More specifically, the present invention relates to a system for and a method of re-ordering data transactions within a computer system.
Much of a computer system's functionality and usefulness to a user is derived from the functionality of its peripheral devices. For example, the speed and responsiveness of the graphics adapter is a major factor in a computer system's usefulness as an entertainment device. Similarly, the speed with which data files can be retrieved from a hard drive determines the computer system's usefulness as a database system. Hence, the rate at which data can be transferred often determines whether the computer system is suited for a particular purpose. The electronics industry has, over time, developed several types of high-performance peripheral bus and memory bus architectures, such as PCI (Peripheral Component Interconnect), AGP (Accelerated Graphics Port), etc., which are capable of transferring data at a very high rate.
Peripheral bus and memory bus architectures can be generally classified into three main categories: non-pipelined, pipelined and split-bus. Devices such as those conforming to the bus definition of AMBA, PCI, etc. perform data transactions in a non-pipelined manner. These devices generally share pins between address and data. In non-pipelined systems, a new data transaction will not start until the prior one has completed successfully. Devices that have separate address and data pins and can issue a new address cycle without waiting for the completion of the previous data phases perform data transactions in a pipelined manner. However, these devices impose the restriction that the data phases be completed in exactly the same order as the addresses were issued. Some exemplary devices that are capable of performing pipelined data transactions are Pentium® processors, PowerPC® 604 processors and MIPS® SH-2 processors, etc.
In split-bus devices, data transactions are run in a split transaction fashion where the requests for data transfer (or access requests) are "disconnected" from the data transfers. That is, in operation, a master device initiates a data transaction with an access request. Then, a memory controller or core logic responds to the access request by directing the corresponding data transfer at a later time. The fact that the access requests are separated from the data transfers allows split-bus devices to issue several access requests in a pipelined fashion while waiting for the data transfers to occur. In addition, data transactions may be completed out-of-order. That is, data transactions may be completed in an order different from the one in which the addresses were issued. These features significantly increase the performance of split-bus devices over non-pipelined and pipelined devices.
Current high-performance memory devices such as SDRAM (Synchronous Dynamic Random Access Memory), RAMBUS® RAM, etc. have different access latencies depending on whether or not a request is directed to a previously opened page within the memory. Given a certain number of pending requests, the maximum bandwidth achieved will depend on the order in which the cycles are issued to the memory. For instance, if the memory controller can re-order the transactions, maximum performance would be achieved by re-ordering the memory access transactions such that accesses to the same page are clustered together.
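The benefit of clustering accesses to the same page can be illustrated with a minimal sketch. It is not taken from any particular memory controller; the page size, the function names `cluster_by_page` and `page_switches`, and the use of the number of page changes as a stand-in for latency are all assumptions made for illustration.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # hypothetical page size in bytes, chosen only for illustration


def cluster_by_page(addresses, page_size=PAGE_SIZE):
    """Re-order pending addresses so that accesses to the same page are adjacent.

    Pages are visited in order of first appearance, and the original order
    of the accesses inside each page is preserved.
    """
    pages = OrderedDict()
    for addr in addresses:
        pages.setdefault(addr // page_size, []).append(addr)
    return [addr for group in pages.values() for addr in group]


def page_switches(addresses, page_size=PAGE_SIZE):
    """Count how often two consecutive accesses fall on different pages."""
    pages = [addr // page_size for addr in addresses]
    return sum(1 for prev, cur in zip(pages, pages[1:]) if prev != cur)


pending = [0x0000, 0x2000, 0x0010, 0x2010, 0x0020]
clustered = cluster_by_page(pending)
```

In this example the original order alternates between two pages on every access, while the clustered order changes page only once.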
In a split transaction interface where the data transactions can be re-ordered, certain events may occur that force the re-ordering to be limited to either before or after the event. These events dictate that any re-ordering should not cross over the synchronizing event. For example, transactions sampled prior to one such event must be completed before transactions sampled after the event are completed. By the same token, transactions sampled after the event must not be completed before transactions sampled before the event are completed. Examples of such synchronizing events are address-only cycles like sync or eieio in PowerPC®. The events could also be requests from agents like PCI that require snooping on the CPU (central processing unit) bus prior to being serviced by memory. For clarity, these events are called synchronizing events, synchronizing requests or alignment events herein.
In the prior art, the master device (e.g., the device initiating the data transaction) must stop issuing data transaction requests once a synchronizing event occurs. The master device may not resume issuing data transaction requests until all outstanding data transaction requests have been completed. Consequently, the data transfer rate is decreased by the occurrence of such synchronizing events. The impact on the data transfer rate can be severe if these synchronizing events occur frequently.
Therefore, what is needed is a system and method for handling synchronizing events without stalling the master devices such that maximum performance benefits can be gained.
Accordingly, the present invention provides an interface and process for handling out-of-order data transactions and synchronizing events while maximizing re-ordering to gain maximum performance benefits. The present invention is applicable to target devices that interface to master devices such that both masters and slaves are capable of handling the re-ordering of outstanding requests.
In one embodiment, the method of the present invention is applicable to a memory controller that receives memory access requests and synchronizing events generated by a processor (e.g., a central processing unit, or CPU). Upon receiving a first synchronizing event from the master device, the memory controller presents a first plurality of memory access requests to a memory access request arbiter in parallel for re-ordering. While the first plurality of memory access requests are being serviced by the memory access request arbiter, the memory controller continues to receive a second plurality of memory access requests generated by the processor. Significantly, the processor is not stalled while the first plurality of memory access requests are being serviced. After the first plurality of memory access requests have been completed, the memory controller processes the first synchronizing event. Thereafter, the second plurality of memory access requests are presented to the memory access request arbiter in parallel for re-ordering.
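The sequence of this embodiment can be sketched as a small software model. This is only an illustrative sketch under stated assumptions, not the claimed hardware: the class name `MemoryControllerModel`, the `SYNC` marker, and the use of address sorting as a stand-in for the arbiter's re-ordering policy are all hypothetical. The model also services a group even when no synchronizing event has yet been received, which is a simplification.

```python
from collections import deque

SYNC = "SYNC"  # hypothetical marker standing in for a synchronizing event


class MemoryControllerModel:
    """Toy model: requests are always accepted; re-ordering never crosses a SYNC."""

    def __init__(self, reorder=sorted):
        self.buffer = deque()    # pending requests and synchronizing events
        self.reorder = reorder   # assumed arbiter policy (here: sort by address)
        self.completed = []      # completion order seen by the memory

    def receive(self, item):
        """Accept a request or a synchronizing event without stalling the sender."""
        self.buffer.append(item)

    def service_next_group(self):
        """Present every request up to the next SYNC to the arbiter as one group,
        then process that SYNC if one is pending."""
        group = []
        while self.buffer and self.buffer[0] != SYNC:
            group.append(self.buffer.popleft())
        self.completed.extend(self.reorder(group))
        if self.buffer:
            self.completed.append(self.buffer.popleft())


mc = MemoryControllerModel()
# The second group (after SYNC) is accepted before the first group is serviced.
for item in (0x2000, 0x0000, SYNC, 0x3000, 0x1000):
    mc.receive(item)
mc.service_next_group()
first_pass = list(mc.completed)
mc.service_next_group()
```

After the first call only the first group and the synchronizing event have completed; the second group is re-ordered on its own during the second call.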
In one particular embodiment, an identifier (e.g., a synchronizing request number, or SRN) is added by the memory controller to each of the memory access requests. The identifier indicates a storage location of a preceding synchronizing event. The memory controller monitors that storage location such that, as soon as the preceding synchronizing request is processed, the memory access requests whose identifiers indicate the storage location can be presented to the arbiter. The same identifier is also inserted into another synchronizing event that follows the preceding synchronizing event. The memory controller monitors the storage location indicated by the identifier and all other storage locations containing memory access requests that are associated with the identifier. A synchronizing event, however, will be processed only when all preceding memory access requests are completed and when the preceding synchronizing event is processed. In essence, the memory access requests and the synchronizing events are self-tracking such that proper re-ordering boundaries can be maintained.
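The self-tracking behavior of the identifier can be sketched as follows. This is a simplified model under stated assumptions rather than the described circuit: the names `Entry`, `SrnBuffer`, `add`, `eligible` and `complete` are hypothetical, and the buffer is modeled as an append-only list whose indices play the role of storage locations, whereas a hardware buffer would reuse its locations.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    kind: str            # "mem" for a memory access request, "sync" for a synchronizing event
    srn: Optional[int]   # storage location of the preceding synchronizing event, if any
    done: bool = False


class SrnBuffer:
    def __init__(self):
        self.slots = []
        self.last_sync = None  # location of the most recently received sync event

    def add(self, kind):
        """Store an entry tagged with the location of the preceding sync event."""
        self.slots.append(Entry(kind, self.last_sync))
        slot = len(self.slots) - 1
        if kind == "sync":
            self.last_sync = slot
        return slot

    def eligible(self, slot):
        """A request may proceed once its preceding sync event is processed; a sync
        event additionally waits for every request carrying the same SRN."""
        entry = self.slots[slot]
        if entry.done:
            return False
        if entry.srn is not None and not self.slots[entry.srn].done:
            return False
        if entry.kind == "mem":
            return True
        return all(o.done for o in self.slots if o.kind == "mem" and o.srn == entry.srn)

    def complete(self, slot):
        if not self.eligible(slot):
            raise ValueError("entry would cross a re-ordering boundary")
        self.slots[slot].done = True


buf = SrnBuffer()
m0, m1, s0, m2, s1 = (buf.add(k) for k in ("mem", "mem", "sync", "mem", "sync"))
```

In this model `m0` and `m1` may complete in either order, `s0` becomes eligible only after both are done, and `m2` becomes eligible only after `s0` is processed.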
Embodiments of the present invention include the above and further include a computer system that includes a processor, a processor bus coupling the processor to a memory controller, and a memory bus coupling the memory controller to a memory. The memory controller of the present embodiment includes a memory access request arbiter for re-ordering memory access requests; a buffer that has a plurality of storage locations for storing memory access requests and synchronizing events; and a request screening unit that causes the buffer to present a first plurality of memory access requests to the memory access request arbiter in parallel upon receiving a first synchronizing event without stalling the processor.