1. Field of the Invention
This invention relates generally to communication between devices on different buses of a computer system, and, more particularly, to a method and apparatus for promoting memory read commands and advantageously prefetching data to reduce bus latency.
2. Description of the Related Art
Computer systems of the PC type typically employ an expansion bus to handle various data transfers and transactions related to I/O and disk access. The expansion bus is separate from the system bus or from the bus to which the processor is connected, but is coupled to the system bus by a bridge circuit.
A variety of expansion bus architectures have been used in the art, including the ISA (Industry Standard Architecture) expansion bus, an 8-MHz, 16-bit device, and the EISA (Extension to ISA) bus, a 32-bit bus clocked at 8 MHz. As performance requirements increased, with faster processors and memory, and increased video bandwidth needs, high performance bus standards were developed. These standards included the Micro Channel architecture, a 10-MHz, 32-bit bus; an enhanced Micro Channel, using a 64-bit data width and 64-bit data streaming; and the VESA (Video Electronics Standards Association) bus, a 33-MHz, 32-bit local bus specifically adapted for a 486 processor.
More recently, the PCI (Peripheral Component Interconnect) bus standard was proposed by Intel Corporation as a longer-term expansion bus standard specifically addressing burst transfers. The original PCI bus standard has been revised several times, with the current standard being Revision 2.1, available from the PCI Special Interest Group, located in Portland, Oregon. The PCI Specification, Rev. 2.1, is incorporated herein by reference in its entirety. The PCI bus provides for 32-bit or 64-bit transfers at 33 or 66 MHz. It can be populated with adapters requiring fast access to each other and/or to system memory, and that can be accessed by the host processor at speeds approaching that of the processor's native bus speed. A 64-bit, 66-MHz PCI bus has a theoretical maximum transfer rate of 528 MByte/sec. All read and write transfers over the bus may be burst transfers. The length of the burst may be negotiated between initiator and target devices, and may be any length.
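The 528 MByte/sec figure follows from the bus width and the nominal clock rate, assuming one transfer per clock. The short sketch below reproduces that arithmetic; the function name is chosen here for illustration only and is not part of the PCI specification.

```python
def peak_transfer_rate_mbytes(bus_width_bits: int, clock_mhz: int) -> int:
    """Theoretical peak rate in MByte/sec, assuming one transfer per clock."""
    bytes_per_transfer = bus_width_bits // 8
    return bytes_per_transfer * clock_mhz


# 64-bit bus at a nominal 66 MHz: 8 bytes x 66 million transfers per second.
print(peak_transfer_rate_mbytes(64, 66))
# 32-bit bus at a nominal 33 MHz, for comparison.
print(peak_transfer_rate_mbytes(32, 33))
```

The figure is a ceiling only; address phases, turnaround cycles, and wait states reduce the rate actually achieved.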
A CPU operates at a much faster clock rate and data access rate than most of the resources it accesses via a bus. In earlier processors, such as those commonly available when the ISA bus and EISA bus were designed, this delay in reading data from a resource on the bus was handled by inserting wait states. When a processor requested data that was not immediately available due to a slow memory or disk access, the processor merely marked time using wait states, doing no useful work, until the data finally became available. To make use of this delay time, a processor such as the Pentium Pro (P6), offered by Intel Corporation, provides a pipelined bus that allows multiple transactions to be pending on the bus at one time, rather than requiring one transaction to be finished before starting another. Also, the P6 bus allows split transactions, i.e., a request for data may be separated from the delivery of the data by other transactions on the bus. The P6 processor uses a technique referred to as "deferred transaction" to accomplish the split on the bus. In a deferred transaction, a processor sends out a read request, for example, and the target sends back a "defer" response, meaning that the target will send the data onto the bus, on its own initiative, when the data becomes available.
The PCI bus specification as set forth above does not provide for split transactions. There is no mechanism for issuing a "deferred transaction" signal, nor for generating the deferred data initiative. Accordingly, while a P6 processor can communicate with resources such as main memory that are on the processor bus itself using deferred transactions, this technique is not used when communicating with disk drives, network resources, compatibility devices, etc., on an expansion bus.
The PCI bus specification, however, provides a protocol for issuing delayed transactions. Delayed transactions use a retry protocol to implement efficient processing of the transactions. If an initiator initiates a request to a target and the target cannot provide the data quickly enough, a retry command is issued. The retry command directs the initiator to retry or "ask again" for the data at a later time. In delayed transaction protocol, the target does not simply sit idly by, awaiting the renewed request. Instead, the target initially records certain information, such as the address and command type associated with the initiator's request, and begins to assemble the requested information in anticipation of a retry request from the initiator. When the request is retried, the information can be quickly provided without unnecessarily tying up the system's buses.
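The retry behavior described above can be modeled in a few lines. The following is a minimal sketch, not a description of any actual bridge: the class name `DelayedTarget`, its methods, and the dictionary standing in for the slow data source are all hypothetical.

```python
class DelayedTarget:
    """Toy model of a target that answers reads with a delayed transaction."""

    def __init__(self, memory):
        self.memory = memory   # hypothetical slow data source: address -> data
        self.pending = {}      # latched (address, command) -> data, or None if not yet fetched

    def request(self, address, command):
        """Return ("RETRY", None) until the latched request has been fetched."""
        key = (address, command)
        if key not in self.pending:
            self.pending[key] = None       # record address and command type
            return ("RETRY", None)
        data = self.pending[key]
        if data is None:
            return ("RETRY", None)         # data still being assembled
        del self.pending[key]              # request completes; free the slot
        return ("DATA", data)

    def complete_fetches(self):
        """Stand-in for the background fetch performed between retries."""
        for address, command in list(self.pending):
            if self.pending[(address, command)] is None:
                self.pending[(address, command)] = self.memory[address]


target = DelayedTarget({0x100: 0xAB})
print(target.request(0x100, "MR"))   # first attempt is told to retry
target.complete_fetches()            # target assembles the data meanwhile
print(target.request(0x100, "MR"))   # the retried request is served from the buffer
```

The point of the model is that the bus is released between the two requests, so other agents can use it while the data is assembled.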
Differentiated commands are used in accordance with the PCI specification to indicate, or at least hint at, the amount of data required by the initiator. A memory read (MR) command does not provide any immediate indication as to the length of the intended read. The read is terminated based on logic signals driven on the bus by the initiator. A memory read line (MRL) command, on the other hand, indicates that the initiator intends to read at least one cache line (e.g., 32 bytes) of data. A memory read multiple command (MRM) indicates that the initiator is likely to read more than one cache line of data. Based on the command received, the bridge prefetches data and stores it in a buffer in anticipation of the retried transaction. The amount of data prefetched depends on the amount the initiator is likely to require. Efficiency is highest when the amount of prefetched data most closely matches the amount of data required.
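As a rough illustration of command-based prefetching, the sketch below maps each read command to a prefetch size. The 32-byte cache line comes from the example above; the 4-byte data phase (a 32-bit bus) and the two-line prefetch for MRM are assumptions made here for illustration, since the actual amounts are implementation choices.

```python
CACHE_LINE_BYTES = 32   # example cache line size from the text
DATA_PHASE_BYTES = 4    # assumption: one data phase on a 32-bit bus
MRM_PREFETCH_LINES = 2  # hypothetical: how many lines to prefetch for MRM


def prefetch_bytes(command: str) -> int:
    """Return how many bytes a bridge might prefetch for a read command."""
    if command == "MR":
        return DATA_PHASE_BYTES                       # length unknown: be conservative
    if command == "MRL":
        return CACHE_LINE_BYTES                       # at least one cache line
    if command == "MRM":
        return MRM_PREFETCH_LINES * CACHE_LINE_BYTES  # likely more than one line
    raise ValueError(f"not a memory read command: {command}")


for cmd in ("MR", "MRL", "MRM"):
    print(cmd, prefetch_bytes(cmd))
```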
Prefetching in response to MRL and MRM commands is relatively uncomplicated, because, by the very nature of the command, the bridge knows to prefetch at least one, and likely more than one, cache line. The amount of data required by an initiator of an MR command, on the other hand, is not readily apparent. Initiators may issue MR commands even if they know they will require multiple data phases. For example, the PCI specification recommends, but does not require, that initiators use an MRL or an MRM command only if the starting address lies on a cache line boundary. Accordingly, a device following this recommendation would issue one or more MR commands until a cache line boundary is encountered, and would then issue the appropriate MRL or MRM command. Also, some devices, due to their vintage or their simplicity, are not equipped to issue MRL or MRM commands, and use MR commands exclusively.
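The following sketch shows how an initiator that follows the recommendation might break up a read that starts off a cache line boundary. It is an illustration under stated assumptions: the function name is hypothetical, each leading MR command is assumed to cover a single 4-byte data phase, and the choice between MRL and MRM is reduced to whether more than one cache line remains.

```python
def plan_read_commands(start, length, line_bytes=32, phase_bytes=4):
    """Return the (command, address) sequence for a read of `length` bytes."""
    commands = []
    addr, end = start, start + length
    # Issue plain MR commands until a cache line boundary is reached.
    while addr < end and addr % line_bytes != 0:
        commands.append(("MR", addr))
        addr += phase_bytes
    # From the boundary onward, use the command that hints at the length.
    if addr < end:
        remaining = end - addr
        commands.append(("MRM" if remaining > line_bytes else "MRL", addr))
    return commands


# A 72-byte read starting 8 bytes before a cache line boundary.
print(plan_read_commands(0x18, 72))
```

A bridge that sees only the two leading MR commands has no way to tell that a much longer read is under way.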
To illustrate the difficulties of anticipating the amount of data required by the initiator of an MR command, FIGS. 1A through 1D provide timing diagrams of exemplary MR transactions on a PCI bus. For clarity, only those PCI control signals useful in illustrating the examples are shown. The PCI bus uses shared address/data (AD) lines and shared command/byte enable (C/BE#) lines. In accordance with the PCI specification, a turnaround cycle is required on all signals that may be driven by more than one agent. In the case of the AD lines, the initiator drives the address and the target drives the data. The turnaround cycle is used to avoid contention when one agent stops driving a signal and another agent begins driving the signal. A turnaround cycle is indicated on the timing diagrams as two arrows pointing at each other's tail.
FIG. 1A illustrates an MR command in which the initiator requires multiple data phases to complete the transaction. In this illustration, the target and initiator reside on the same PCI bus, and the target is ready to supply the data when requested. The initiator asserts a FRAME# signal before the rising edge of a first clock cycle (CLK1) to indicate that valid address and command bits are present on the AD lines and the C/BE# lines, respectively. During a third cycle, CLK3, the initiator asserts the IRDY# signal to indicate that it is ready to receive data. The target also asserts the TRDY# signal at CLK3 (i.e., after the turnaround cycle) to signal that valid data is present on the AD lines. In accordance with the PCI specification, the initiator must deassert FRAME# before the last data phase. Because the FRAME# signal remains asserted at CLK3, the target knows that more data is required. Data transfer continues between the initiator and target during cycles CLK4 and CLK5. The initiator deasserts the FRAME# signal before CLK5 to indicate that Data3 is the last data phase. The initiator continues to assert the IRDY# signal until after the last data phase has been completed.
FIG. 1B illustrates an MR command in which the initiator requires only one data phase to complete the transaction. Again, the initiator asserts the FRAME# signal before the rising edge of the first clock cycle (CLK1) to indicate that valid address and command bits are present on the AD lines and the C/BE# lines, respectively. During the third cycle, CLK3, the initiator asserts the IRDY# signal to indicate that it is ready to receive data. The target asserts the TRDY# signal at CLK3 (i.e., after the turnaround cycle) to signal that valid data is present on the AD lines. Because the initiator must deassert FRAME# before the last data phase, the FRAME# signal is deasserted before CLK3. The target then knows that no more data is required. The initiator continues to assert the IRDY# signal during the transfer of the data at CLK3, and deasserts it thereafter.
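The two timing diagrams can be restated as per-clock samples of FRAME#, IRDY#, and TRDY#. The sketch below is a simplified model written for illustration: the `Sample` type and function name are hypothetical, `True` means the (active-low) signal is asserted, and only the three signals discussed above are modeled.

```python
from typing import NamedTuple


class Sample(NamedTuple):
    frame: bool  # FRAME# asserted at this clock edge
    irdy: bool   # IRDY# asserted
    trdy: bool   # TRDY# asserted


def count_data_phases(samples):
    """Count transfers; data moves when IRDY# and TRDY# are both asserted."""
    count = 0
    for s in samples:
        if s.irdy and s.trdy:
            count += 1
            if not s.frame:   # FRAME# deasserted marks the last data phase
                break
    return count


# CLK1 (address), CLK2 (turnaround), then data phases, as in FIGS. 1A and 1B.
fig_1a = [Sample(True, False, False), Sample(True, False, False),
          Sample(True, True, True), Sample(True, True, True),
          Sample(False, True, True)]
fig_1b = [Sample(True, False, False), Sample(True, False, False),
          Sample(False, True, True)]
print(count_data_phases(fig_1a), count_data_phases(fig_1b))
```

In both traces the first two clocks look identical, which is why the target cannot distinguish the two cases at the start of the transaction.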
From the examples of FIGS. 1A and 1B, it is clear that the amount of data required by the initiator may not be known until well into the transaction. FIGS. 1A and 1B illustrate MR transactions between devices on the same PCI bus. FIGS. 1C and 1D illustrate an MR transaction where the target resides on a different PCI bus than the initiator, and is subordinate to a bridge device.
As shown in FIG. 1C, the initiator asserts the FRAME# signal before the rising edge of the first clock cycle (CLK1) to indicate that valid address and command bits are present on the AD lines and the C/BE# lines, respectively. The bridge claims the transaction and, because no data is readily available, forces a retry by asserting the STOP# signal during CLK2. In response to the STOP# signal, the initiator deasserts the FRAME# signal before CLK3. The bridge then deasserts STOP# at CLK4. The bridge, not knowing how much data the initiator requires, conservatively assumes the transaction is a single data phase transaction and retrieves the data.
At some later time, as shown in FIG. 1D, the initiator retries the request. Again, the initiator asserts the FRAME# signal before the rising edge of the first clock cycle (CLK1) to indicate that valid address and command bits are present on the AD lines and the C/BE# lines, respectively. The bridge, now in possession of the data, allows the transaction to proceed. During the third cycle, CLK3, the initiator asserts the IRDY# signal to indicate that it is ready to receive data. The bridge asserts the TRDY# signal at CLK3 to signal that valid data is present on the AD lines. The bridge also asserts the STOP# signal at CLK3 to indicate it cannot provide any further data. Even though the initiator desired more than one data phase to complete the transaction, as indicated by the FRAME# signal being asserted during the transfer of Data1, the transaction is terminated.
The initiator is then forced to issue a new transaction, in accordance with FIG. 1C, for the next data phase. The cycle of FIGS. 1C and 1D repeats until the initiator has received its requested data. The situation of FIGS. 1C and 1D illustrates an inefficiency introduced by the use of an MR command. It may take many such exchanges to complete the data transfer, thus increasing the number of tenancies (i.e., exchanges between an initiator and a target) on the bus. Also, the initiator, bridge, and target must compete for bus time with other devices on their respective buses, thus increasing the total number of cycles required to complete the transaction beyond those required just to complete the evolutions of FIGS. 1C and 1D.
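A back-of-the-envelope count makes the cost concrete. The sketch below assumes, as in FIGS. 1C and 1D, that each round trip costs two tenancies on the initiator's bus (the attempt that is retried and the retry that delivers data). The function name is hypothetical, and arbitration delays and traffic on the target's bus are ignored.

```python
import math


def bus_tenancies(data_phases: int, phases_prefetched_per_round_trip: int) -> int:
    """Tenancies on the initiator's bus needed to deliver `data_phases`."""
    round_trips = math.ceil(data_phases / phases_prefetched_per_round_trip)
    return 2 * round_trips   # one retried attempt plus one delivering attempt


# Eight data phases with single-phase prefetch, as in FIGS. 1C and 1D.
print(bus_tenancies(8, 1))
# The same read if the bridge had prefetched all eight phases at once.
print(bus_tenancies(8, 8))
```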
Techniques have been developed in the art to attempt to increase the efficiency of MR transactions traversing bridges. One such technique involves storing an MR promotion bit for each of the devices subordinate to a bridge in the private configuration space of the bridge. If the bit is asserted, MR commands are automatically promoted, and multiple data phases of data are prefetched. The decision on whether to set the promotion bit depends on knowledge of the device being accessed. Certain devices have undesirable read "side effects." For example, an address might refer to a first-in-first-out (FIFO) register. A read to a FIFO increments the pointer of the FIFO to the next slot. If the prefetching conducted in response to the assertion of the promotion bit hits the address of the FIFO, it would increment, and a subsequent read targeting the FIFO would retrieve the wrong data, possibly causing undesirable operation or a deadlock condition. Memory regions with such undesirable side effects are referred to as non-speculative regions, and memory regions where prefetching is allowable are referred to as speculative memory regions.
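The FIFO hazard can be demonstrated with a toy model. Everything in the sketch below is hypothetical (the `FifoRegister` class, the `bridge_read` function, and the four-phase prefetch depth); it assumes the initiator consumes only the first data phase and the bridge discards the rest of its buffer.

```python
from collections import deque


class FifoRegister:
    """Device register whose reads have a side effect: each read pops an entry."""

    def __init__(self, items):
        self._queue = deque(items)

    def read(self):
        return self._queue.popleft()


def bridge_read(device, promote: bool, phases_if_promoted: int = 4):
    """Serve one single-phase MR; with promotion the bridge prefetches extra phases."""
    phases = phases_if_promoted if promote else 1
    buffer = [device.read() for _ in range(phases)]
    return buffer[0]   # initiator takes one phase; the remainder is discarded


promoted = FifoRegister(range(8))
print(bridge_read(promoted, promote=True), bridge_read(promoted, promote=True))

unpromoted = FifoRegister(range(8))
print(bridge_read(unpromoted, promote=False), bridge_read(unpromoted, promote=False))
```

With promotion the second read returns entry 4 instead of entry 1, because the discarded prefetch already advanced the FIFO. This is why the promotion bit must stay clear for non-speculative regions.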
The present invention is directed to overcoming, or at least reducing the effects of, one or more of the problems set forth above.
One aspect of the present invention is seen in a device for providing data. The device includes a data source, a bus interface, a data buffer, and control logic. The bus interface is coupled to a plurality of control lines of a bus and adapted to receive a read request targeting the data source. The control logic is adapted to determine if the read request requires multiple data phases to complete based on the control lines, and to retrieve at least two data phases of data from the data source and store them in the data buffer in response to the read request requiring multiple data phases to complete.
Another aspect of the present invention is seen in a method for retrieving data. The method includes receiving a read request on a bus. The bus includes a plurality of control lines. It is determined if the read request requires multiple data phases to complete based on the control lines. At least two data phases of data are retrieved from a data source in response to the read request requiring multiple data phases to complete. The at least two data phases of data are stored in a data buffer.
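One possible reading of the determination step is sketched below. The summary above does not say which control lines are examined or when; the sketch assumes, based on FIGS. 1A and 1B, that FRAME# is sampled once IRDY# is asserted. The function names and the eight-phase prefetch depth are hypothetical.

```python
from typing import Optional


def needs_multiple_phases(frame_asserted: bool, irdy_asserted: bool) -> Optional[bool]:
    """Return True/False once the length hint is available, else None."""
    if not irdy_asserted:
        return None            # initiator not yet ready: nothing can be inferred
    return frame_asserted      # FRAME# still asserted means more data phases follow


def phases_to_prefetch(frame_asserted: bool, irdy_asserted: bool,
                       multi_phase_count: int = 8) -> int:
    """Number of data phases to retrieve into the data buffer."""
    if needs_multiple_phases(frame_asserted, irdy_asserted):
        return multi_phase_count   # hypothetical depth; at least two phases
    return 1                       # single phase, or undetermined: stay conservative


print(phases_to_prefetch(frame_asserted=True, irdy_asserted=True))
print(phases_to_prefetch(frame_asserted=False, irdy_asserted=True))
```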