1. Field of the Invention
This invention relates to a computer and, more particularly, to a bus interface unit which can stall read and non-postable write cycles issued to a peripheral bus until the peripheral bus becomes available or a cycle occurs to system memory. If a cycle is issued to system memory, then the bus interface unit either defers or retries the prior cycles to the peripheral bus after the cycle to system memory completes.
2. Description of the Related Art
Modern computers are called upon to execute instructions and transfer data at increasingly higher rates. Many computers employ CPUs which operate at clocking rates exceeding several hundred MHz, and further have multiple busses connected between the CPUs and numerous input/output devices. The busses may have dissimilar protocols depending on which devices they link. For example, a CPU local bus connected directly to the CPU preferably transfers data at a faster rate than a peripheral bus connected to slower input/output devices. A mezzanine bus may be used to connect devices arranged between the CPU local bus and the peripheral bus. The peripheral bus can be classified as, for example, an industry standard architecture ("ISA") bus, an enhanced ISA ("EISA") bus or a microchannel bus. The mezzanine bus can be classified as, for example, a peripheral component interconnect ("PCI") bus to which higher speed input/output devices can be connected.
Coupled between the various busses are bus interface units. According to somewhat known terminology, the bus interface unit coupled between the CPU bus and the PCI bus is often termed the "north bridge". Similarly, the bus interface unit between the PCI bus and the peripheral bus is often termed the "south bridge".
The north bridge, henceforth termed a bus interface unit, serves to link specific busses within the hierarchical bus architecture. Preferably, the bus interface unit couples data, address and control signals forwarded between the CPU local bus, the PCI bus and the memory bus. Accordingly, the bus interface unit may include various buffers and/or controllers situated at the interface of each bus linked by the interface unit. In addition, the bus interface unit may receive data from a dedicated graphics bus, and therefore may include an advanced graphics port ("AGP"). As a host device, the bus interface unit may be called upon to support both the PCI portion of the AGP (or graphics-dedicated transfers associated with PCI, henceforth referred to as a graphics component interconnect, or "GCI"), as well as AGP extensions to the PCI protocol.
There are numerous tasks performed by the bus interface unit. For example, the bus interface unit must orchestrate timing differences between a faster CPU local bus and a slower mezzanine bus, such as a PCI bus. The bus interface unit should also give priority to certain types of transfers. For example, a cycle initiated by the CPU to memory must, in most instances, be completed quickly. If not, the processor-to-memory queue may not be optimally filled and instructions may not be expeditiously executed.
One mechanism for accommodating timing differences involves, for example, stalling cycles within the CPU local bus to allow the peripheral bus to catch up. This, however, penalizes CPU throughput and should be used only sparingly and judiciously. Stalling the CPU bus typically occurs during a particular transaction phase of the CPU bus pipeline. It is noted that modern CPUs utilize an extensive pipeline which can store multiple cycles of multiple transactions upon the CPU local bus. For example, a Pentium® Pro processor bus includes a decoupled, 12-stage super pipelined implementation. A transaction relating to a single bus request can sequentially pipeline through numerous phases: arbitration, request, error, snoop, response and data transfer.
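The ordering of the six transaction phases named above can be sketched as follows. This is an illustrative model only: the names `Phase` and `next_phase` are hypothetical and are not taken from any processor documentation.

```python
from enum import IntEnum


class Phase(IntEnum):
    """Transaction phases in the order a single bus request passes through them."""
    ARBITRATION = 0
    REQUEST = 1
    ERROR = 2
    SNOOP = 3
    RESPONSE = 4
    DATA_TRANSFER = 5


def next_phase(phase):
    """Return the phase that follows `phase`, or None after data transfer."""
    if phase is Phase.DATA_TRANSFER:
        return None
    return Phase(phase + 1)


def phases_before(phase):
    """Return the phases a transaction must traverse before reaching `phase`."""
    return [p for p in Phase if p < phase]
```

Under this model, `phases_before(Phase.SNOOP)` lists the arbitration, request and error phases, which are the phases that remain free to accept new cycles while a transaction is held in the snoop phase.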
Stalling the CPU local bus generally involves stalling one or more cycles in the snoop phase. This allows the earlier phases to continue receiving cycles, so that those cycles are available when they reach the snoop phase. If called upon, those cycles can be released in a timely fashion to the subsequent response and data transfer phases.
FIG. 3 illustrates a timing diagram of exemplary transaction phases of the Pentium® Pro processor bus. In the example shown, a cycle 8a of a first transaction 8 requires approximately three bus clock cycles to obtain mastership of the CPU local bus. Approximately two clock cycles later, cycle 8a proceeds from the arbitration phase to a request phase 8b. As shown, the cycle 8c begins in the error phase approximately three clocks after the request phase. Cycle 8d occurs approximately four clocks after the request phase or approximately three clocks after the previous transaction snoop cycle, whichever is later. The cumulative number of clock cycles needed to place a transaction within the snoop phase is shown to be approximately ten clock cycles, in the example provided. Of course, as transaction 8 progresses to the snoop phase, a cycle 9d of another transaction 9 can subsequently arrive in the snoop phase as well.
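The "whichever is later" rule for entering the snoop phase can be expressed as a short calculation. The sketch below treats the approximate figures from the example (four clocks after the request phase, three clocks after the previous snoop cycle) as exact constants, which is a simplifying assumption; the function and constant names are hypothetical.

```python
# Approximate figures taken from the example above, treated here as exact.
REQUEST_TO_SNOOP_CLOCKS = 4  # snoop follows the request phase by about four clocks
SNOOP_TO_SNOOP_CLOCKS = 3    # and follows the previous snoop by about three clocks


def snoop_clock(request_clock, previous_snoop_clock=None):
    """Return the clock at which a transaction's snoop cycle can occur.

    request_clock: clock at which the transaction entered the request phase.
    previous_snoop_clock: clock of the preceding transaction's snoop cycle,
        or None if no earlier transaction is in flight.
    """
    earliest = request_clock + REQUEST_TO_SNOOP_CLOCKS
    if previous_snoop_clock is None:
        return earliest
    # The later of the two constraints governs.
    return max(earliest, previous_snoop_clock + SNOOP_TO_SNOOP_CLOCKS)
```

For instance, if transaction 9 enters its request phase shortly after transaction 8, the spacing from the previous snoop cycle governs; if it enters much later, its own request phase governs.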
If the first transaction 8 is initiated from the CPU to a peripheral device as its final destination, then it may be necessary to delay the transaction in the snoop phase to allow the peripheral bus to clear and/or data upon the peripheral bus to become available. For example, if a transaction preceding the first transaction 8 is a non-postable write to the peripheral device, then it is necessary that the peripheral device and the peripheral bus become available before data of transaction 8 is presented upon the bus. Alternatively, if transaction 8 is a read transaction, it is necessary that the data to be read from the peripheral device be present on the peripheral bus before the local CPU bus can transfer that data during the data transfer phase. For at least these reasons, cycles within the CPU bus destined for a slower peripheral bus must occasionally be stalled in the snoop phase of the CPU bus until the peripheral bus clears and/or data therein is available.
Stalling the CPU bus at the snoop phase is typically done for a fixed number of clock cycles. That is, historical differences between the peripheral bus (and peripheral device) speed and the CPU bus speed indicate that the peripheral bus, or data on the peripheral bus, will become available some time after a transaction is completed on the CPU bus. The next transaction to the peripheral is then stalled for a fixed amount of time mandated by the historically derived differences in the bus speeds. Thus, regardless of the destinations of the subsequent transactions, the current transaction is stalled for a fixed number of clock cycles to allow the peripheral bus to clear. This, unfortunately, penalizes throughput of all subsequent cycles (including memory cycles).
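The throughput cost of a fixed stall can be illustrated with a simple in-order model. The sketch below is an assumption-laden illustration, not a description of any actual bus: the stall length of eight clocks, the one-clock service time and the name `completion_clocks` are all hypothetical.

```python
def completion_clocks(destinations, service_clocks=1, fixed_stall=8):
    """Return the clock at which each in-order transaction completes.

    destinations: list of "memory" or "peripheral", in program order.
    service_clocks: hypothetical clocks needed to service any transaction.
    fixed_stall: hypothetical fixed stall applied to each peripheral transaction.
    """
    clock = 0
    done = []
    for destination in destinations:
        if destination == "peripheral":
            # The stall is applied whether or not the peripheral bus
            # actually needs this long to clear.
            clock += fixed_stall
        clock += service_clocks
        done.append(clock)
    return done
```

With these assumed values, a peripheral transaction followed by two memory transactions completes at clocks 9, 10 and 11, whereas the two memory transactions alone would complete at clocks 1 and 2. The memory cycles inherit the full stall even though they never use the peripheral bus.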
In an attempt to immediately service transactions to local memory (i.e., system memory of substantially contiguous semiconductor memory space), many conventional techniques allow memory cycles to be completed through the CPU bus ahead of cycles to peripheral devices. This involves a technique known as cycle "deferral", in which preceding, slower peripheral-destined cycles are deferred and faster, memory-destined cycles are allowed to be drawn from the in-order queue of the pipeline.
Referring to FIG. 3 and the two-transaction example shown, deferral of first transaction 8 may occur at the snoop phase by tagging transaction 8 and allowing the second transaction 9 to proceed as cycles 9e and 9f within respective response and data transfer phases. In this manner, priority is given to a transaction which must be quickly serviced over that of another transaction which need not be transferred as quickly, possibly due to the slower nature of its destination device. Accordingly, the example shown in FIG. 3 illustrates a first transaction 8 destined for a slower peripheral device coupled to either a mezzanine bus or a peripheral bus, whereas the second transaction 9 is destined for semiconductor memory.
An unfortunate result of the deferral technique is that the transaction being deferred at the snoop phase must be re-initiated at a later time beginning at the arbitration phase. As shown in FIG. 3, the ten cycles needed to place the transaction in the snoop phase must be repeated, for a penalty of approximately ten CPU bus cycles. Many conventional pipeline schemes immediately defer any cycles destined for a peripheral device. The destination is detected at the snoop phase by the snoop agent. When the snoop agent is a peripheral device, that agent will return a signal causing the snoop phase to initiate a DEFER# signal. DEFER# removes the transaction from the in-order queue by generating the appropriate response. All peripheral-destined transactions will then be automatically deferred in favor of memory-destined transactions. Thus, there may be multiple transactions being deferred unless the snoop agent indicates the peripheral bus and/or device is available. In this manner, each transaction must undergo a ten clock cycle penalty if the peripheral bus, peripheral device, or data is not available when snoop occurs. When the peripheral bus, device or data later becomes available, the peripheral-destined transaction will proceed through the remaining phases, beginning again with the arbitration phase.
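The conventional deferral behavior described above, together with its cost, can be sketched as follows. This is a simplified illustration under stated assumptions: the ten-clock figure is the approximate value from the FIG. 3 example, and the function name, tuple layout and transaction labels are hypothetical.

```python
from collections import deque

# Approximate clocks needed to walk a transaction from arbitration to snoop,
# per the FIG. 3 example; treated here as an exact constant.
CLOCKS_TO_REACH_SNOOP = 10


def run_snoop_phase(in_order_queue, peripheral_available):
    """Model conventional deferral at the snoop phase.

    in_order_queue: list of (name, destination) pairs in program order, where
        destination is "memory" or "peripheral".
    peripheral_available: whether the peripheral bus, device and data are ready.

    Returns (completed, deferred, penalty_clocks).
    """
    completed, deferred = [], []
    queue = deque(in_order_queue)
    while queue:
        name, destination = queue.popleft()
        if destination == "peripheral" and not peripheral_available:
            # DEFER# removes the transaction from the in-order queue; it must
            # later restart from the arbitration phase.
            deferred.append(name)
        else:
            completed.append(name)
    penalty_clocks = len(deferred) * CLOCKS_TO_REACH_SNOOP
    return completed, deferred, penalty_clocks
```

Applied to the two-transaction example, `run_snoop_phase([("T8", "peripheral"), ("T9", "memory")], False)` lets the memory-destined transaction complete while the peripheral-destined one is deferred at a cost of ten clocks. The penalty grows linearly with the number of deferred transactions, which is the cost the following paragraph seeks to avoid.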
The aforementioned conventional algorithm pays a rather large penalty for each deferral, and that penalty may not be necessary. It would be more beneficial to simply stall the transaction at the snoop phase, but to do so only under certain conditions. Likewise, deferral of peripheral-destined transactions should be minimized and made dependent on the timing and type of transactions subsequently arriving in the pipeline.