This invention is related to communication between computer buses, and more particularly to bus-to-bus bridge devices.
The development of computer bus architectures over the last thirty years has been influential in transitioning computers from research tools into practical, multi-purpose machines. Buses may now be found both within an integrated circuit (IC) processing unit and connecting processing units to other agents such as external memory and peripheral devices. The physical characteristics of a single bus, however, place a limit on the number of agents (including peripheral devices and processors) which may attach to it. Many modern applications of computer systems therefore rely on multiple-bus architectures having a number of physically separate buses to further expand their functionality.
Physically separate buses are often combined into a single logical bus using a bridge. A bridge may include hardware (digital hardwired circuitry), software (high or low level commands and instructions to be executed by one or more processors), firmware (software typically stored in different types of read-only memory), or combinations thereof, that monitor and control data traffic between at least two physically separate buses. The bridge interfaces one bus protocol to another to facilitate communication between agents on the different buses.
The bridge that couples two buses is typically configured to be transparent, so that the physically separate buses may be treated by the agents and the system as one bus. To achieve such a result, an address space is shared by agents on both buses. Requests (a read or a write) bearing an address range within the shared address space are generated by an initiator agent on an initiating bus. The bridge recognizes the address range and can forward the request to a target agent on a target bus. The bridge may thus be said to automatically perform the request on the target bus on behalf of the initiator agent.
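The forwarding decision described above may be sketched as follows. This is a minimal illustration only, not the decoding logic of any particular bridge; the names AddressWindow and route_request, and the use of a single inclusive base/limit pair, are assumptions introduced for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AddressWindow:
    """Hypothetical shared address range that the bridge forwards."""
    base: int
    limit: int  # inclusive upper bound (an assumption of this sketch)

    def claims(self, address: int) -> bool:
        # The bridge recognizes an address that falls inside its window.
        return self.base <= address <= self.limit


def route_request(window: AddressWindow, address: int) -> str:
    """Decide what the bridge does with a request seen on the initiating bus."""
    if window.claims(address):
        # The bridge performs the request on the target bus for the initiator.
        return "forward_to_target_bus"
    # Otherwise the request is left to agents on the initiating bus.
    return "ignore"


window = AddressWindow(base=0x1000, limit=0x1FFF)
print(route_request(window, 0x1800))  # address inside the window
print(route_request(window, 0x2000))  # address outside the window
```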
Different bus and bridge architectures abound in the current state of computer technology. An example of a modern computer bus is the Peripheral Component Interconnect (PCI) bus. The PCI bus is an industry standard, high performance, low latency system bus, generally defined by the PCI Special Interest Group (SIG) in PCI Local Bus Specification, Revision 2.1, Oct. 21, 1994. The PCI bus will be used throughout this disclosure to illustrate some of the principles behind and operation of the various embodiments of the invention. However, those principles may also be applied to other multiple-bus architectures.
The PCI SIG also maintains a bridge architecture described in PCI-to-PCI Bridge Architecture Specification, Revision 1.0, Apr. 5, 1994. The PCI bridge is also often referred to in this disclosure to illustrate some of the principles behind and operation of the various embodiments of the invention. However, those principles may also be applied to other bridge designs.
Transactions Using the Bridge
Transactions are defined here as complete transfers of data between an initiator and a target, where the initiator and target are on different physical buses coupled by a bridge. When forwarding data from one bus to another, bridges typically implement a number of data queues to hide the delay associated with requesting and obtaining access to the target bus for obtaining or forwarding the data. Each transaction is typically assigned a logical queue which is released when the transaction is completed.
The queue will typically be part of a memory or buffer that implements a First-In-First-Out (FIFO) data structure. The FIFO is a data structure from which items are taken out in the same order they were put in. It is also known as a "shelf", from the analogy with pushing items onto one end of a shelf so that they fall off the other. Typically, the FIFO may be written to and read from simultaneously. A FIFO in a bridge is useful for buffering a stream of data between an initiator and a target which are not synchronized, i.e., not sending and receiving at exactly the same rate.
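A bounded FIFO of the kind described above may be sketched as follows. The class name BridgeFifo and its capacity parameter are hypothetical, and the sketch models only the ordering and the full/empty conditions, not simultaneous hardware reads and writes.

```python
from collections import deque


class BridgeFifo:
    """Hypothetical bounded FIFO standing in for one bridge queue."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._items = deque()

    def is_full(self) -> bool:
        return len(self._items) >= self.capacity

    def is_empty(self) -> bool:
        return len(self._items) == 0

    def push(self, block) -> bool:
        """Store a block at the tail; return False if the queue is full."""
        if self.is_full():
            return False
        self._items.append(block)
        return True

    def pop(self):
        """Remove and return the oldest block, or None if the queue is empty."""
        if self.is_empty():
            return None
        return self._items.popleft()


fifo = BridgeFifo(capacity=2)
fifo.push("block A")
fifo.push("block B")
print(fifo.push("block C"))  # rejected because the queue is full
print(fifo.pop())            # the oldest block leaves first
```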
A transaction as defined herein involves a request from an initiator to read from or write to a given address range which is claimed by a target. If the request is accepted by the bridge, then the transaction begins and an appropriate access is started. An access typically includes an idle phase for setup, an address phase during which address information for the particular request is exchanged, and sometimes a data phase during which data is exchanged.
Alternatively, the request may be denied by the bridge. In that case, the bridge issues a termination known as a retry signal to the initiator. This may occur if the assigned bridge queue is full or has no data to transfer. A new request may also be denied if no free queue is available to be assigned because all of the queues are being used for other pending transactions. If the request is denied, the initiator may repeat the request to complete an ongoing transaction or attempt to start a new one.
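The accept-or-retry decision described above may be sketched as follows. The function respond and its parameters are hypothetical, and the sketch assumes that a full queue blocks a write while a queue with no data blocks a read, which is one reading of the conditions given above.

```python
from enum import Enum


class Response(Enum):
    ACCEPT = "accept"
    RETRY = "retry"


def respond(is_new_transaction: bool, free_queues: int,
            is_write: bool, queue_full: bool, queue_has_data: bool) -> Response:
    """Hypothetical bridge decision for one incoming request."""
    if is_new_transaction:
        # A new transaction needs a free queue to be assigned to it.
        return Response.ACCEPT if free_queues > 0 else Response.RETRY
    if is_write and queue_full:
        # Assumption: a full assigned queue cannot take more write data.
        return Response.RETRY
    if not is_write and not queue_has_data:
        # Assumption: an assigned queue with no data cannot satisfy a read.
        return Response.RETRY
    return Response.ACCEPT


# A new request arrives while every queue is in use by pending transactions.
print(respond(is_new_transaction=True, free_queues=0,
              is_write=True, queue_full=False, queue_has_data=False))
```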
Where the request is accepted and a first access is started, the access may be prematurely terminated by the initiator, the target, or the bridge, for various reasons. If this happens, the request may be repeated or a subsequent request may be issued by the initiator to complete the transaction and transfer all of the requested data. Splitting the transaction so that the desired data is transferred in multiple accesses, however, introduces increased overhead in the form of additional accesses having additional idle and address phases. The increased overhead can reduce throughput, where throughput is the amount of data transferred across the bridge per unit time, averaged over a given period. In contrast, latency is defined as the time needed to provide the initiator or target with the first data block of a multiple-block transaction. These two performance criteria will be used throughout this disclosure to help illustrate some of the advantages of the different embodiments of the invention. It would be desirable to have a technique that permits either an increase in throughput or a decrease in latency so that the bridge may be tuned to the particular application.
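The effect of splitting a transaction on throughput may be illustrated with a simple cost model. The clock counts below (one idle clock, one address clock, and one clock per data block) are assumptions chosen for the example and are not taken from any bus specification.

```python
def clocks_for_transaction(total_blocks: int, accesses: int,
                           idle_clocks: int = 1, address_clocks: int = 1,
                           clocks_per_block: int = 1) -> int:
    """Clocks needed to move total_blocks when the transfer is split into
    the given number of accesses (hypothetical cost model)."""
    # Every access pays its own idle and address phases.
    overhead = accesses * (idle_clocks + address_clocks)
    return overhead + total_blocks * clocks_per_block


def throughput(total_blocks: int, accesses: int) -> float:
    """Blocks transferred per clock under the same cost model."""
    return total_blocks / clocks_for_transaction(total_blocks, accesses)


# The same 16 blocks moved in a single access and in four accesses.
print(clocks_for_transaction(16, accesses=1))  # 18
print(clocks_for_transaction(16, accesses=4))  # 24
print(throughput(16, 1) > throughput(16, 4))   # True
```

Under these assumed costs, the split transfer needs six more clocks for the same data, which is the overhead of three additional idle and address phases.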
Write Transactions
The write transaction is typically performed as a posted write transaction in the PCI model. In such transactions, the initiator transfers data into a queue in the bridge after the bridge accepts the initiator's request. The bridge then requests control of the target bus and after receiving control forwards the data from the queue to the target. The transaction, however, is completed on the initiating bus before being completed on the target bus.
Write transactions include the typical memory write, and the memory write and invalidate (MWI). The two write transactions differ in that MWI must be for an integer number of cache memory lines, whereas the plain memory write can be used to write smaller amounts of data.
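The distinction between the two write commands may be sketched as follows. The function choose_write_command is hypothetical; in addition to the whole-line requirement stated above, the sketch assumes that an MWI transfer begins on a cache line boundary.

```python
def choose_write_command(start_address: int, length_bytes: int,
                         cache_line_bytes: int) -> str:
    """Pick a write command for a transfer (hypothetical helper)."""
    covers_whole_lines = (
        length_bytes > 0
        and length_bytes % cache_line_bytes == 0
        # Assumption of this sketch: the transfer starts on a line boundary.
        and start_address % cache_line_bytes == 0
    )
    if covers_whole_lines:
        return "memory_write_and_invalidate"
    # Smaller or unaligned amounts of data use the plain memory write.
    return "memory_write"


print(choose_write_command(0x1000, 64, cache_line_bytes=32))  # two whole lines
print(choose_write_command(0x1000, 20, cache_line_bytes=32))  # partial line
```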
On average, the buses on either side of the bridge will be kept busy for a much longer time with MWI transactions than with plain memory writes. This may unnecessarily tie up a target bus whose agents are not configured to respond to MWI requests. For example, the initiator may be a newer generation peripheral device that is plugged into an older multiple-bus computer system having older generation targets, where the initiator supports MWI but its target does not. To improve performance in such systems, the software in the initiator could be modified to not issue the MWI and instead use plain memory writes to perform transactions aimed at targets which do not support MWI. Such a change, however, will need to be implemented on each device and may present a cumbersome task for the system operator as many new devices are added over the lifetime of the system. Therefore, it would be desirable to have a technique for handling MWI requests in a multiple-bus computer system without having to modify the software in each new device that may be added to the system over its lifetime.
Read Transactions
In addition to write transactions, another area of performance optimization in bridges lies in read transactions. Read transactions across a bridge are more involved than write transactions in that a read transaction is typically performed as a delayed transaction rather than as a posted transaction. For example, in the PCI model, the PCI bridge in a delayed transaction latches the information required to complete the initial request, and the initiator is then signaled a retry. The bridge then performs the initial request over the target bus on behalf of the initiator. Any returning data or response from the target is stored in a bridge queue. The initiator must then repeat the original request to retrieve the data from the queue and complete the transaction.
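The delayed read sequence described above may be sketched as a small model. The class DelayedReadBridge, its method names, and the fixed blocks_per_read value are hypothetical; the target is modeled as a Python list indexed by block address, and only one outstanding read is tracked.

```python
class DelayedReadBridge:
    """Hypothetical model of a delayed read across a bridge."""

    def __init__(self, target_memory, blocks_per_read):
        self.target_memory = target_memory      # stands in for the target agent
        self.blocks_per_read = blocks_per_read  # assumed fixed read size
        self.latched = None                     # address of the pending request
        self.queue = []                         # data returned by the target

    def read_request(self, address):
        """Called by the initiator; returns a (status, data) pair."""
        if self.latched is None:
            # Latch the request information and signal a retry.
            self.latched = address
            return "retry", []
        if address == self.latched and self.queue:
            # The repeated request retrieves the data and completes.
            data, self.queue, self.latched = self.queue, [], None
            return "data", data
        return "retry", []

    def service_target_bus(self):
        """Perform the latched request on the target bus for the initiator."""
        if self.latched is not None and not self.queue:
            end = self.latched + self.blocks_per_read
            self.queue = self.target_memory[self.latched:end]


bridge = DelayedReadBridge(target_memory=list(range(100, 110)), blocks_per_read=4)
print(bridge.read_request(2))  # first request is latched and retried
bridge.service_target_bus()    # bridge reads from the target
print(bridge.read_request(2))  # repeated request receives the data
```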
With read transactions in a PCI system, the exact amount of data that the bridge reads over the target bus is not specified, but rather may depend on the particular PCI command type and whether the memory address space to be read from is prefetchable or not. While the initiator knows the exact amount of data it needs to read, it cannot specify this amount under the PCI model.
When the memory space is prefetchable, the bridge in response to a read request reads and stores data from the target up to a fixed and predetermined number of blocks, or until the assigned bridge queue is full. This speculative operation on behalf of the initiator is done in anticipation of any subsequent or repeated read requests from the initiator. Upon arrival of the repeated read request, the read data begins to flow to the initiator from the bridge queue, but can be stopped by the initiator at any time. Any data in the queue which the bridge had read from the target but which is not transferred to the initiator is then discarded.
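The prefetch-and-discard behavior described above may be sketched as follows. The function prefetch_and_deliver and its parameters are hypothetical; the target data is modeled as a list of blocks, and blocks_wanted stands for the point at which the initiator stops the transfer.

```python
def prefetch_and_deliver(target_data, prefetch_limit, queue_capacity, blocks_wanted):
    """Return (delivered, discarded) blocks for one prefetched read
    (hypothetical model)."""
    # The bridge reads up to a fixed number of blocks, or until the
    # assigned queue is full, whichever comes first.
    fetched = target_data[:min(prefetch_limit, queue_capacity)]
    # The initiator may stop the flow of data at any time.
    delivered = fetched[:blocks_wanted]
    # Whatever was read from the target but not transferred is discarded.
    discarded = fetched[blocks_wanted:]
    return delivered, discarded


delivered, discarded = prefetch_and_deliver(
    target_data=list(range(32)), prefetch_limit=8, queue_capacity=16, blocks_wanted=3)
print(delivered)  # [0, 1, 2]
print(discarded)  # [3, 4, 5, 6, 7]
```

In this example, five of the eight blocks read over the target bus are never transferred to the initiator.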
A drawback of this read prefetching scheme is that it wastes valuable bus time keeping the target bus busy. During any access, the initiating and/or target buses may be occupied or busy while data is transferred into or out of the queue in the bridge. While occupied, a bus normally cannot be used by other agents. If some of the prefetched data in a read access is subsequently discarded, the target bus was kept busy without resulting in any data transfer through the bridge. Therefore, an optimization scheme is desirable to help reduce the time during which the target bus is unnecessarily kept busy in this way.
Another area of bridge performance optimization lies in controlling the rate of data transfer between the initiator and the target. In PCI multiple-bus architectures, the initiator and target agents on either side of a bridge are allowed to "throttle" the rate of data flow between them. Throttling occurs when either agent requests and obtains wait states from the bridge while data for a transaction is being stored in the bridge. The wait states reduce the rate at which data is accepted into or removed from the bridge.
If, however, the data rates on either side of the bridge differ by too much in one direction for too long, then the bridge queue will become either full (blocking the initiator in a write transaction) or empty (not providing any data to the target), halting the simultaneous flow of data into and out of the bridge. It would therefore be desirable to have further techniques that allow the optimization of data flow through the bridge so as to improve throughput or latency (depending on the application of the bridge) and increase the likelihood of simultaneous flow.
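The effect of mismatched data rates on a bridge queue may be illustrated with a small simulation. The function simulate_queue and its clock-period parameters are hypothetical; the model assumes one block enters the queue every fill_period clocks and one block leaves every drain_period clocks, with the fill handled before the drain within a clock.

```python
def simulate_queue(capacity, fill_period, drain_period, clocks):
    """Count clocks on which the queue was full or empty when a transfer
    was attempted (hypothetical model of one bridge queue)."""
    level = 0
    stalls = {"full": 0, "empty": 0}
    for clock in range(1, clocks + 1):
        if clock % fill_period == 0:
            if level < capacity:
                level += 1            # a block is accepted into the queue
            else:
                stalls["full"] += 1   # the source side is blocked
        if clock % drain_period == 0:
            if level > 0:
                level -= 1            # a block is removed from the queue
            else:
                stalls["empty"] += 1  # the destination side gets no data
    return stalls


print(simulate_queue(capacity=4, fill_period=1, drain_period=2, clocks=100))
print(simulate_queue(capacity=4, fill_period=2, drain_period=1, clocks=100))
print(simulate_queue(capacity=4, fill_period=1, drain_period=1, clocks=100))
```

In this model, a faster source eventually fills the queue, a faster destination repeatedly finds it empty, and matched rates produce no stalls on either side.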