A typical conventional computer system includes a central processing unit ("CPU"), a cache subsystem, and a bus interface unit ("BIU"). During operation, a read or write request from the CPU is first sent to the cache. If the cache contains the target data (i.e., on a cache hit), the cache directly services the request. Conversely, if the cache does not contain the target data (i.e., on a cache miss) or if the request is directed to an uncacheable memory address or an input/output ("I/O") address, the cache passes the request on to the BIU. When the BIU receives a read or write request, the request is submitted to the external memory or I/O systems using a predefined bus protocol, and any results are returned to the cache and, via the cache, to the CPU. Additionally, the cache services snoop requests from external agents such as other processors in order to perform cache-coherency operations.
One bus protocol used in modern computer systems is the Pentium® II bus protocol as defined in Volume 1 of the Pentium Pro Family Developer's Manual, which is published by Intel Corporation (Santa Clara, Calif.) and is herein incorporated by reference. In accordance with this protocol, the BIU communicates with the memory and I/O systems using several different read and write request transaction formats including: bus read line ("BRL"), bus read and invalidate line ("BRIL"), bus invalidate line ("BIL"), bus write line ("BWL"), bus read partial ("BRP"), bus write partial ("BWP"), I/O read ("IOR"), and I/O write ("IOW"). A brief description of each of these transactions will now be given.
A bus read line transaction is requested when a new line is to be loaded into the cache. When a CPU read from a cacheable address misses the cache, the cache issues a BRL transaction to the BIU. In response, the BIU makes a read request to main memory for the number of bytes required to fill a cache line (e.g., 32 bytes). Because the CPU can process read transactions speculatively and out-of-order, BRLs do not have any ordering requirements either with respect to each other or with respect to other types of bus transactions.
A bus read and invalidate line transaction is initiated when a CPU write transaction to a cacheable address misses the cache. Like a BRL, a BRIL causes the BIU to read a line from external memory. Additionally, the addressed line is invalidated in all other caches (for external agents in the system) in which the line resides. Although in conventional systems memory writes must generally be kept in order, a BRIL does not directly influence the ordering of the CPU write transaction from which it was generated. Thus, BRILs do not have any ordering requirements either with respect to each other or with respect to other types of bus transactions. Similarly, a bus invalidate line transaction is initiated when a CPU write to a cacheable address hits a shared line in the cache. Such a shared line must be changed to the exclusive state before it can be modified by the CPU. The BIL transaction is used to invalidate the addressed line in all other caches in which the line resides, without reading any data from the external memory. BILs also do not have any ordering requirements either with respect to each other or with respect to other types of bus transactions.
A bus write line transaction is generated when the cache writes a displaced cache line back to memory so that a new line can be loaded into the cache. A BWL is also generated when multiple CPU write transactions to uncacheable memory addresses are accumulated (i.e., write-combined) in the BIU. In a BWL, an entire line (e.g., 32 bytes) is written to the external memory. Like BRLs, BWLs do not have any ordering requirements either with respect to each other or with respect to other types of bus transactions.
The bus read partial and I/O read transactions are generated when the CPU issues a read transaction that is directed to an uncacheable memory address or an I/O address, respectively. When a BRP or an IOR is submitted to the bus by the BIU, one to eight bytes of data are read from the designated address. Similarly, the bus write partial and I/O write transactions are generated when the CPU issues a write transaction to an uncacheable memory address or an I/O address, respectively. The BWP and IOW transactions cause one to eight bytes of data to be written to the designated address. While the BIU must issue BRPs, BWPs, IORs, and IOWs to the bus in the order in which they are received from the CPU, these types of transactions do not have any ordering requirements with respect to BRLs, BRILs, BILs, and BWLs.
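The mapping from a CPU request to a bus transaction described above can be summarized in a short Python sketch. This is illustrative only: the names `AddrKind`, `LineState`, `select_bus_transaction`, and `must_issue_in_cpu_order` are hypothetical and do not come from the bus protocol. BWL is omitted from the selector because, per the description, it results from a line displacement or from write combining rather than from a single CPU request.

```python
from enum import Enum


class AddrKind(Enum):
    # Hypothetical labels for the three address classes named in the text.
    CACHEABLE = "cacheable"
    UNCACHEABLE = "uncacheable"
    IO = "io"


class LineState(Enum):
    # Hypothetical, simplified view of the addressed line in the local cache.
    MISS = "miss"            # line is not present in the cache
    SHARED = "shared"        # line is present and may reside in other caches
    EXCLUSIVE = "exclusive"  # line is present and owned by this cache


def select_bus_transaction(is_write, addr_kind, line_state=LineState.MISS):
    """Return the bus transaction for a CPU request, or None if no bus traffic is needed."""
    if addr_kind is AddrKind.IO:
        return "IOW" if is_write else "IOR"
    if addr_kind is AddrKind.UNCACHEABLE:
        return "BWP" if is_write else "BRP"
    # Cacheable address: a miss loads a line, and a write also invalidates other copies.
    if line_state is LineState.MISS:
        return "BRIL" if is_write else "BRL"
    # A write hit on a shared line only needs the other copies invalidated.
    if is_write and line_state is LineState.SHARED:
        return "BIL"
    return None  # the cache services the request directly


# Transactions that the text says must reach the bus in CPU issue order.
ORDERED_TRANSACTIONS = frozenset({"BRP", "BWP", "IOR", "IOW"})


def must_issue_in_cpu_order(transaction):
    """True for the partial and I/O transactions, False for the line-oriented ones."""
    return transaction in ORDERED_TRANSACTIONS
```

For instance, a cacheable write that misses the cache maps to a BRIL, which carries no ordering requirement, whereas an uncacheable write maps to a BWP, which must be issued in order.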
When the BIU receives a read or write request, the transaction is buffered in the BIU. More specifically, the BIU consolidates and orders the received transactions, and then issues the transactions on the bus so as to increase the efficiency (i.e., throughput) of the bus. The bus requests received by the BIU can be divided into two fundamental transaction classes: in-order transactions and speculative write-combine ("SWC") transactions. The BIU stores the in-order transactions (BRLs, BRILs, BILs, BWLs, IORs, IOWs, BRPs, and non-combinable BWPs) in one or more first-in, first-out ("FIFO") queues so that the transactions can be issued to the system bus in the order received. For example, in a typical BIU, a first FIFO queue is used to store cacheable, line-oriented transactions (e.g., BRLs and BWLs), and a second FIFO queue is used to store uncacheable, byte-oriented transactions (e.g., IORs and IOWs).
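The two-queue arrangement of the typical BIU can be sketched as follows. This is a minimal model, not an actual BIU design: the class name `InOrderQueues`, the assignment of BRILs and BILs to the line queue and of BRPs and BWPs to the partial queue, and the `prefer_line` arbitration flag are all assumptions made for illustration. The text names only BRLs and BWLs, and IORs and IOWs, as examples for the respective queues.

```python
from collections import deque

# Assumed split of the in-order transaction types across the two queues.
LINE_TRANSACTIONS = frozenset({"BRL", "BRIL", "BIL", "BWL"})
PARTIAL_TRANSACTIONS = frozenset({"BRP", "BWP", "IOR", "IOW"})


class InOrderQueues:
    """Two FIFO queues that each preserve the arrival order of their transactions."""

    def __init__(self):
        self.line_fifo = deque()     # cacheable, line-oriented transactions
        self.partial_fifo = deque()  # uncacheable, byte-oriented transactions

    def enqueue(self, kind, addr):
        """Append a transaction to the queue that matches its type."""
        if kind in LINE_TRANSACTIONS:
            self.line_fifo.append((kind, addr))
        elif kind in PARTIAL_TRANSACTIONS:
            self.partial_fifo.append((kind, addr))
        else:
            raise ValueError(f"unknown transaction type: {kind}")

    def issue_next(self, prefer_line=True):
        """Pop the oldest entry of the preferred queue, falling back to the other queue.

        The arbitration between the two queues is a hypothetical policy; the
        text only requires that order is preserved within each queue.
        """
        if prefer_line:
            candidates = (self.line_fifo, self.partial_fifo)
        else:
            candidates = (self.partial_fifo, self.line_fifo)
        for fifo in candidates:
            if fifo:
                return fifo.popleft()
        return None  # both queues are empty
```

Because each queue is drained strictly from its head, IORs and IOWs leave in the order they arrived regardless of how the line-oriented queue is serviced.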
The SWC transaction requests received by the BIU are stored in a "combining" buffer. In more detail, each SWC request is received by the BIU as a request for writing a relatively small amount of data to memory (e.g., one to eight consecutive bytes for a BWP). If the data bytes to be written by multiple SWC requests fall within the address boundaries of the same data line, the BIU can concatenate the data and issue a single write line request to the system bus. For this purpose, the BIU holds SWC requests in a special combining buffer while awaiting additional SWC requests that can be combined into the held transactions. Each "transaction" stored in a combining buffer is transferred to one of the in-order FIFO queues for issuance to the system bus only after the occurrence of a predefined triggering event (e.g., the combining buffer entry becomes full).
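The combining behavior described above can be illustrated with a small sketch. It assumes the 32-byte example line size, tracks a per-byte valid flag for each held line, and treats "entry becomes full" as the only triggering event. The class `CombiningBuffer` and its `write` method are hypothetical names, and the sketch deliberately places no limit on the number of entries.

```python
LINE_SIZE = 32  # bytes per line, the example size used in the text


class CombiningBuffer:
    """Holds partial writes per line and emits one line write when a line is complete."""

    def __init__(self):
        # line base address -> (line data, per-byte valid flags)
        self.entries = {}

    def write(self, addr, data):
        """Merge a 1-8 byte write into its line entry.

        Returns ("BWL", base, line_bytes) if this write completes the line,
        otherwise None while the entry keeps waiting for more writes.
        """
        if not 1 <= len(data) <= 8:
            raise ValueError("an SWC request carries one to eight bytes")
        base = addr - (addr % LINE_SIZE)
        offset = addr - base
        if offset + len(data) > LINE_SIZE:
            raise ValueError("write crosses a line boundary")
        if base not in self.entries:
            self.entries[base] = (bytearray(LINE_SIZE), [False] * LINE_SIZE)
        line, valid = self.entries[base]
        line[offset:offset + len(data)] = data
        for i in range(offset, offset + len(data)):
            valid[i] = True
        if all(valid):
            # Triggering event: the entry is full, so hand it on as one line write.
            del self.entries[base]
            return ("BWL", base, bytes(line))
        return None
```

Four consecutive 8-byte writes to the same line therefore result in a single line write instead of four partial writes on the bus.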
While combining SWC requests increases the efficiency of the bus, the conventional BIU uses completely independent buffers for in-order and SWC transactions. Because each buffer holds all of the address, data, and control information for stored transactions, the combining buffer must be limited to a small number of entries to keep the chip area reasonable. However, when a new SWC request falls outside the address ranges of current entries in a full combining buffer, an incomplete entry must be issued by the BIU to make room for the new request. Thus, the addition of a relatively small buffer that does not significantly increase cost can only produce a limited performance increase. On the other hand, if all incoming write requests are simply placed in an in-order FIFO buffer, each SWC request can only be combined with others that arrive before the original request passes through the FIFO buffer. In this case, an additional buffer is not needed, but the performance increase is limited by the relatively short window of opportunity for write combining multiple requests.
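The cost of a small combining buffer can be made concrete with a toy simulation. The sketch below is an assumption-laden model, not a measurement of any real BIU: the function `count_bus_writes` is hypothetical, it evicts the oldest entry when the buffer is full (the text does not specify which incomplete entry is issued), and it assumes that every request lies within a single 32-byte line.

```python
from collections import OrderedDict

LINE_SIZE = 32  # bytes per line, the example size used in the text


def count_bus_writes(requests, capacity):
    """Simulate a combining buffer with a fixed number of entries.

    requests: iterable of (addr, length) writes, each within one line.
    Returns (full_line_writes, forced_flushes), where a forced flush is an
    incomplete entry issued early to make room for a new line.
    """
    entries = OrderedDict()  # line base -> set of valid byte offsets, oldest first
    full_line_writes = 0
    forced_flushes = 0
    for addr, length in requests:
        base = addr - (addr % LINE_SIZE)
        if base not in entries:
            if len(entries) >= capacity:
                # Buffer is full and the new request matches no held line:
                # issue the oldest incomplete entry (assumed eviction policy).
                entries.popitem(last=False)
                forced_flushes += 1
            entries[base] = set()
        offset = addr - base
        entries[base].update(range(offset, offset + length))
        if len(entries[base]) == LINE_SIZE:
            del entries[base]
            full_line_writes += 1
    return full_line_writes, forced_flushes


# Three write streams interleaved round-robin in 8-byte chunks.
interleaved = [(base + 8 * chunk, 8)
               for chunk in range(4)
               for base in (0x000, 0x100, 0x200)]
```

Under these assumptions, `count_bus_writes(interleaved, capacity=3)` returns `(3, 0)`, while `count_bus_writes(interleaved, capacity=2)` returns `(0, 10)`: with one entry too few, every line is flushed before it can fill and no combined line write is ever produced.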