A typical conventional computer system includes a central processing unit ("CPU"), a cache subsystem, and a bus interface unit ("BIU"). During operation, a read or write request from the CPU is first sent to the cache. If the cache contains the target data (i.e., on a cache hit), the cache directly services the request. Conversely, if the cache does not contain the target data (i.e., on a cache miss) or if the request is directed to an uncacheable memory address or an input/output ("I/O") address, the cache passes the request on to the BIU. When the BIU receives a read or write request, it submits the request to the external memory or I/O systems using a predefined bus protocol, and any results are returned to the CPU by way of the cache. Additionally, the cache services snoop requests from external agents, such as other processors, in order to perform cache-coherency operations.
One bus protocol used in modern computer systems is the Pentium® II bus protocol as defined in Volume 1 of the Pentium Pro Family Developer's Manual, which is published by Intel Corporation (Santa Clara, Calif.) and is herein incorporated by reference. In accordance with this protocol, the BIU communicates with the memory and I/O systems using several different read and write request transactions including: bus read line ("BRL"), bus read and invalidate line ("BRIL"), bus invalidate line ("BIL"), bus write line ("BWL"), bus read partial ("BRP"), bus write partial ("BWP"), I/O read ("IOR"), I/O write ("IOW"), and implicit write-back ("IWB"). Further, the BIU manages interrupted transactions that include deferred and retried transactions. A brief description of each of these transactions will now be given.
A bus read line transaction is requested when a new line is to be loaded into the cache. When a CPU read from a cacheable address misses the cache, the cache issues a BRL transaction to the BIU. In response, the BIU makes a read request to main memory for the number of bytes required to fill a cache line (e.g., 32 bytes). Because the CPU can process read transactions speculatively and out-of-order, BRLs do not have any ordering requirements either with respect to each other or with respect to other types of bus transactions.
A bus read and invalidate line transaction is initiated when a CPU write transaction to a cacheable address misses the cache. Like a BRL, a BRIL causes the BIU to read a line from external memory. Additionally, the addressed line is invalidated in all other caches (for external agents in the system) in which the line resides. Although in conventional systems memory writes must generally be kept in order, a BRIL does not directly influence the ordering of the CPU write transaction from which it was generated. Thus, BRILs do not have any ordering requirements either with respect to each other or with respect to other types of bus transactions. Similarly, a bus invalidate line transaction is initiated when a CPU write to a cacheable address hits a shared line in the cache. Such a shared line must be changed to the exclusive state before it can be modified by the CPU. The BIL transaction is used to invalidate the addressed line in all other caches in which the line resides, without reading any data from the external memory. BILs also do not have any ordering requirements either with respect to each other or with respect to other types of bus transactions.
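By way of illustration only, the selection among the BRL, BRIL, and BIL transactions described above can be sketched in Python as follows. The function name `line_transaction` and the string labels used for the line states are hypothetical and are not part of the bus protocol; the sketch assumes that the access is directed to a cacheable address.

```python
def line_transaction(is_write, line_state):
    """Pick the bus transaction for a CPU access to a cacheable address.

    line_state is None on a cache miss; otherwise it is one of the
    illustrative labels "shared", "exclusive", or "modified".
    Returns None when the cache can service the access without the bus.
    """
    if line_state is None:
        # A read miss loads a line; a write miss also invalidates other copies.
        return "BRIL" if is_write else "BRL"
    if is_write and line_state == "shared":
        # Write hit on a shared line: invalidate other copies, read no data.
        return "BIL"
    return None
```

For example, `line_transaction(True, "shared")` selects a BIL, whereas a read that hits a shared line requires no bus transaction at all.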
A bus write line transaction is generated when the cache writes a displaced cache line back to memory so that a new line can be loaded into the cache. A BWL is also generated when multiple CPU write transactions to uncacheable memory addresses are accumulated (i.e., write-combined) in the BIU. In a BWL, an entire line (e.g., 32 bytes) is written to the external memory. Like BRLs, BWLs do not have any ordering requirements either with respect to each other or with respect to other types of bus transactions.
The bus read partial and I/O read transactions are generated when the CPU issues a read transaction that is directed to an uncacheable memory address or an I/O address, respectively. When a BRP or an IOR is submitted to the bus by the BIU, one to eight bytes of data are read from the designated address. Similarly, the bus write partial and I/O write transactions are generated when the CPU issues a write transaction to an uncacheable memory address or an I/O address, respectively. The BWP and IOW transactions cause one to eight bytes of data to be written to the designated address. While the BIU must issue BRPs, BWPs, IORs, and IOWs to the bus in the order in which they are received from the CPU, these types of transactions do not have any ordering requirements with respect to BRLs, BRILs, BILs, and BWLs.
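The mapping from an uncacheable or I/O access to one of the four partial transaction types can be sketched as follows. This is a minimal illustration rather than a definition from the protocol; the function name `partial_transaction` and its parameters are assumptions introduced here.

```python
def partial_transaction(is_write, is_io, nbytes):
    """Pick the byte-oriented transaction for an uncacheable or I/O access.

    nbytes is the transfer size; the text above limits it to one to eight bytes.
    """
    if not 1 <= nbytes <= 8:
        raise ValueError("partial transactions move one to eight bytes")
    if is_io:
        return "IOW" if is_write else "IOR"
    return "BWP" if is_write else "BRP"
```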
A snoop implicit write-back transaction is generated when a transaction from an external agent "snoops" the cache and requires that modified lines matching the designated address be written back to main memory. While an IWB operates like a BWL, the Pentium® II bus protocol requires that the line of data for the IWB transaction be delivered as a result of the snooping transaction, rather than by newly generating an independent write transaction (thus, the "implicit" designation). That is, an IWB transaction is sent to the BIU by the cache as a result of an external snooping transaction that requests write-back of a modified cache line, instead of by a request originating from the CPU.
A "deferred" transaction is a transaction that is issued by the BIU and then suspended before completion. When a transaction will take a long time and thus block other transactions, a deferral may be ordered by the target of the transaction (e.g., external memory) in order to free the bus for other transactions. Then, when the target is ready to service the transaction that was deferred, the target issues a "Deferred Reply" in order to continue with the execution of the original transaction. Under the Pentium® II bus protocol, BRLs, BRILs, BILs, BRPs, and BWPs may be deferred by their targets. A "retried" transaction is a transaction that is aborted by its target (e.g., external memory) before the transaction is completed, in order to free the bus for other transactions. The aborted transaction is later reissued to the bus (i.e., retried) by the CPU. All of the transaction types may be retried.
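The deferral and retry rules stated above can be summarized in a short sketch. The names `DEFERRABLE`, `ALL_TYPES`, and `allowed_responses`, as well as the response labels, are hypothetical and serve only to restate the text in executable form.

```python
# Types the text lists as deferrable by their targets.
DEFERRABLE = frozenset({"BRL", "BRIL", "BIL", "BRP", "BWP"})

# All nine transaction types named in this description.
ALL_TYPES = frozenset(
    {"BRL", "BRIL", "BIL", "BWL", "BRP", "BWP", "IOR", "IOW", "IWB"}
)


def allowed_responses(kind):
    """Return the set of outcomes a target may give for a transaction type."""
    if kind not in ALL_TYPES:
        raise ValueError(f"unknown transaction type: {kind}")
    responses = {"complete", "retry"}  # every type may be retried
    if kind in DEFERRABLE:
        responses.add("defer")
    return responses
```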
When the BIU receives a read or write request, the transaction is buffered in the BIU. More specifically, the BIU consolidates and orders the received transactions, and then issues the transactions on the bus so as to increase the efficiency (i.e., throughput) of the bus. For example, in the BIU of a typical computer system, all cacheable, line-oriented transactions (BRLs, BRILs, BILs, and BWLs) are placed in a single cacheable request queue ("CRQ") and then issued to the bus at the maximum rate allowed (as defined by the bus protocol). While issuing such transactions at the maximum rate may result in reordering on the bus via the "retry" and "defer" mechanisms, this does not present a problem because the cacheable, line-oriented transaction types are not order dependent (i.e., they can be sent to the bus and their results returned to the CPU in any order).
Further, all uncacheable, byte-oriented transactions (BRPs, BWPs, IORs, and IOWs) are placed in a second buffer, which is known as the uncacheable request queue ("UCRQ"). Because the uncacheable transaction types must be maintained in the order received with respect to each other, these types of transactions are buffered in the separate UCRQ and then issued to the bus at a lower rate than the cacheable transactions to prevent reordering. However, because they do not have to be ordered with respect to cacheable transactions, the uncacheable transactions can be interspersed with cacheable transactions to increase the throughput of the bus. A third buffer, which is known as the snoop status queue ("SSQ"), is used to buffer all implicit write-back transactions. Because snooping transactions that generate IWBs cannot be reordered once issued on the bus, the buffered IWBs are submitted to the bus in the order in which the IWB-generating transactions were received by the BIU.
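The three-queue arrangement described above can be modeled with a minimal sketch such as the following. The class name `RequestQueues`, its method, and the use of an integer tag to identify each transaction are assumptions made for illustration; issue rates and bus arbitration are not modeled.

```python
from collections import deque

CACHEABLE = frozenset({"BRL", "BRIL", "BIL", "BWL"})
UNCACHEABLE = frozenset({"BRP", "BWP", "IOR", "IOW"})


class RequestQueues:
    """Hypothetical model of the CRQ, UCRQ, and SSQ of a conventional BIU."""

    def __init__(self):
        self.crq = deque()   # cacheable, line-oriented; order-independent
        self.ucrq = deque()  # uncacheable, byte-oriented; kept in arrival order
        self.ssq = deque()   # implicit write-backs; kept in snoop order

    def enqueue(self, kind, tag):
        """Buffer a transaction in the queue for its transaction class."""
        if kind in CACHEABLE:
            self.crq.append((kind, tag))
        elif kind in UNCACHEABLE:
            self.ucrq.append((kind, tag))
        elif kind == "IWB":
            self.ssq.append((kind, tag))
        else:
            raise ValueError(f"unknown transaction type: {kind}")
```

Because each queue is first-in, first-out, the relative order of the uncacheable transactions is preserved even when cacheable transactions are interspersed with them on the bus.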
After a transaction is issued to the bus, the transaction is removed from the CRQ or UCRQ and stored in an uncompleted transaction buffer, which is known as the in-order queue ("IOQ"). The contents of the IOQ are used to track active transactions through their various phases. Whenever an active transaction is deferred by its target, the transaction is considered temporarily completed. Thus, the deferred transaction is removed from the IOQ and placed in a deferred transaction buffer, which is random-access so that, when a Deferred Reply is received, the corresponding transaction can be reactivated. Similarly, whenever an active transaction is retried (i.e., aborted) by its target, the transaction is removed from the IOQ and placed in a retry buffer, which is known as the retried request queue ("RTRQ"). The retried transactions in the RTRQ are later reissued to the bus.
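The movement of an issued transaction among the IOQ, the deferred transaction buffer, and the RTRQ can be sketched as follows. The class `ActiveTracker` and its method names are hypothetical. The sketch assumes that each transaction carries a unique tag and that a retried transaction re-enters the IOQ when it is reissued; what happens to a transaction after its Deferred Reply is left to the caller, since the text above does not specify it.

```python
from collections import deque


class ActiveTracker:
    """Hypothetical model of the IOQ, deferred buffer, and RTRQ."""

    def __init__(self):
        self.ioq = deque()   # issued, uncompleted transactions in issue order
        self.deferred = {}   # random-access deferred buffer, keyed by tag
        self.rtrq = deque()  # retried transactions awaiting reissue

    def issue(self, tag, kind):
        self.ioq.append((tag, kind))

    def _take_from_ioq(self, tag):
        for txn in self.ioq:
            if txn[0] == tag:
                self.ioq.remove(txn)
                return txn  # return at once; the deque was just modified
        raise KeyError(tag)

    def defer(self, tag):
        """Target deferred the transaction: park it for a later Deferred Reply."""
        self.deferred[tag] = self._take_from_ioq(tag)

    def retry(self, tag):
        """Target aborted the transaction: queue it for reissue."""
        self.rtrq.append(self._take_from_ioq(tag))

    def deferred_reply(self, tag):
        """Look up the deferred transaction by tag and hand it back."""
        return self.deferred.pop(tag)

    def reissue_next_retried(self):
        """Reissue the oldest retried transaction (assumed to re-enter the IOQ)."""
        txn = self.rtrq.popleft()
        self.ioq.append(txn)
        return txn
```

A dictionary is used for the deferred buffer because a Deferred Reply may arrive for any outstanding entry, whereas the RTRQ is drained from the front.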
When a transaction issued by the BIU is completed, it is removed from the IOQ. Additionally, when certain transactions are completed (e.g., BRLs, BRILs, BILs, BRPs, and BWPs), the BIU must notify the cache (and CPU) that the requested data is available and/or that the next order-dependent operation can be scheduled. Therefore, completed transactions of these types are not only removed from the IOQ but also placed in another buffer, which is known as the completion status queue ("CSQ"). The completed transactions are then submitted to the cache (and CPU) one after another in the order of their completion.
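The completion path can be illustrated with the short sketch below. The function names `complete` and `drain` and the `(tag, kind)` representation of a transaction are assumptions; the set of notified types is taken from the examples given above and may not be exhaustive.

```python
from collections import deque

# Example types for which the cache (and CPU) must be notified on completion.
NOTIFY_TYPES = frozenset({"BRL", "BRIL", "BIL", "BRP", "BWP"})


def complete(ioq, csq, txn):
    """Retire a completed (tag, kind) transaction from the IOQ."""
    ioq.remove(txn)
    if txn[1] in NOTIFY_TYPES:
        csq.append(txn)  # the CSQ preserves completion order


def drain(csq):
    """Return pending completion notices in completion order and empty the CSQ."""
    notices = list(csq)
    csq.clear()
    return notices
```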
While the conventional bus interface unit described above can manage data transfer between the processor's cache and the external systems, it presents several drawbacks. First, each buffer in the conventional BIU holds the address, data, and control information for every stored transaction. Therefore, if completely separate buffers (i.e., CRQ, UCRQ, SSQ, etc.) are used for each of the transaction classes, each buffer must necessarily be limited to a small number of entries to keep the chip area reasonable. Such low-capacity buffers degrade the performance of the system under heavy loading conditions. Additionally, studies of typical program behavior have indicated that it is rare for many transactions of different classes to be active at the same time. In other words, transactions falling into a single transaction class usually predominate at a given point in program execution. Thus, if the number of entries in each of the buffers were increased to improve the overall performance of the system, the added capacity would be heavily underutilized most of the time.
Further, when a transaction of a deferrable transaction type is issued, the BIU cannot know whether or not it will be deferred. Thus, such a transaction cannot be issued until space for it has been allocated in the deferred transaction buffer. When a deferrable transaction is held up due to a full deferred transaction buffer, system performance is degraded. Additionally, the BIU of a modern computer system must be able to concurrently process many different types of transactions, and each type of transaction has its own special control requirements. Therefore, extremely complex circuitry is required to control data flow in a BIU having separate buffers that each contain the address, data, and control information for a large number of transactions.
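The issue-time constraint imposed by the deferred transaction buffer can be sketched as follows. The class `DeferredBuffer`, the function `can_issue`, and the capacity parameter are hypothetical; the sketch assumes that a slot is reserved at issue time and is released once the transaction either completes without deferral or receives its Deferred Reply.

```python
DEFERRABLE = frozenset({"BRL", "BRIL", "BIL", "BRP", "BWP"})


class DeferredBuffer:
    """Hypothetical fixed-capacity deferred transaction buffer."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.reserved = 0

    def try_reserve(self):
        """Reserve one slot; return False if the buffer is full."""
        if self.reserved >= self.capacity:
            return False
        self.reserved += 1
        return True

    def release(self):
        """Free a slot that is no longer needed."""
        if self.reserved == 0:
            raise RuntimeError("no slot is reserved")
        self.reserved -= 1


def can_issue(kind, buf):
    """A deferrable transaction may issue only if a slot can be reserved."""
    if kind not in DEFERRABLE:
        return True
    return buf.try_reserve()
```

With a one-entry buffer, a second deferrable transaction is held back until the first releases its slot, while a non-deferrable BWL is unaffected.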