The invention relates generally to computer systems, and more particularly to techniques for improving peripheral device to main memory throughput using a delayed read pipeline.
Personal computer (PC) systems generally employ an expansion bus to handle various data transactions related to input-output devices such as magnetic and/or optical mass storage units and network interface controllers. Typically, expansion buses are coupled to the system bus (to which one or more processors/central processing units are connected) by a bridge circuit.
In the past, many PCs employed an expansion bus operated in conformance with the Industry Standard Architecture (ISA) standard. The ISA standard defines a 16-bit bus having a maximum transfer rate of 8.33 MBytes/sec. The subsequently defined extended ISA (EISA) bus uses a 32-bit data path to provide a peak transfer rate of 33 MBytes/sec, four times that of the ISA bus. As the operating speed of system components continues to increase, however, ISA, EISA and their descendant bus standards (e.g., the Micro Channel and VESA bus standards) have been unable to provide the necessary operational bandwidth.
One solution to the low bandwidth capacity of earlier expansion bus architectures is embodied in the Peripheral Component Interconnect (PCI) standard. The PCI standard defines both 32-bit and 64-bit data transfer protocols operating at either 33 MHz or 66 MHz. A key feature of the PCI standard is that it supports burst operations on a transaction-by-transaction basis, where the length of each burst may be negotiated between the device initiating the transfer (a master) and the device receiving the transfer request (a target). As a result, expansion buses operated in conformance with the PCI specification support transfer rates of 132 MBytes/sec for a 32-bit bus operating at 33 MHz, 264 MBytes/sec for a 32-bit bus operating at 66 MHz or a 64-bit bus operating at 33 MHz, and 528 MBytes/sec for a 64-bit bus operating at 66 MHz. (The current version of the PCI specification (rev. 2.2) is available from the PCI Special Interest Group, 2575 NE. Kathryn Street #17, Hillsboro, Oreg. 97124.)
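By way of illustration only, the peak rates quoted above follow from multiplying the bus width in bytes by the nominal clock rate, on the assumption of one data transfer per clock during a burst. The helper name `peak_transfer_rate` below is hypothetical and is not part of the PCI specification.

```python
def peak_transfer_rate(bus_width_bits: int, clock_mhz: int) -> int:
    """Return the peak burst rate in MBytes/sec.

    Assumes one data transfer per clock, so the rate is simply
    (bytes per transfer) x (nominal clock in MHz).
    """
    if bus_width_bits % 8 != 0:
        raise ValueError("bus width must be a whole number of bytes")
    return (bus_width_bits // 8) * clock_mhz


# The four width/clock combinations named in the text.
for width, clock in [(32, 33), (32, 66), (64, 33), (64, 66)]:
    rate = peak_transfer_rate(width, clock)
    print(f"{width}-bit bus at {clock} MHz: {rate} MBytes/sec")
```

These are peak figures; sustained throughput is lower whenever arbitration, address phases, or target latency consume bus clocks.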
While expansion bus operating speeds have continued to increase, many target devices are still not able to respond to master-initiated data requests in a timely fashion (i.e., within the latency period required by the bus). In computer systems employing ISA and EISA expansion buses, for example, the delay in reading data from a slow target device was handled by wait states. That is, when a target (e.g., system memory) could not immediately provide the data requested by a master (e.g., a processor), the target simply marked time using wait states until the data became available. The use of wait states in this manner prevented other devices from accessing the bus. Thus, expansion bus bandwidth was effectively limited by the slowest responding device on the bus. To avoid the use of wait states, the PCI standard allows the use of delayed transactions. In a delayed transaction, a data request is temporally separated from the delivery of the requested data by other transactions. Wait states are not used: while the originating master waits for the target device to provide the requested information, other bus masters are allowed to use the bus. In accordance with the PCI specification, a delayed transaction progresses to completion in three phases: (1) request by the master; (2) completion of the request by the target; and (3) completion of the transaction by the master.
During phase one, the master generates a transaction on the bus while the target decodes the address, latches the information required to complete the access and terminates the request with a Retry. ("Retry" refers to the condition where a target device issues a transaction termination request before any data is transferred. This condition may occur, for example, because the target device is unable to meet the bus latency requirement, is currently locked by another master, or there is a conflict for an internal resource. Target devices indicate Retry by asserting STOP# and not asserting TRDY# on the initial data phase of a transaction.) The latched request information is referred to as a Delayed Request. The master initiating the Retried transaction must reissue its request until the request completes.
During phase two, the target independently completes the request using the latched information from the Delayed Request. If the Delayed Request corresponds to a read operation, the target obtains the requested read data and completion status. If the Delayed Request corresponds to a write transaction, the target delivers the write data and obtains the completion status. Completing the Delayed Request produces a Delayed Completion (consisting of the latched information of the Delayed Request, the completion status and, possibly, data). The target stores the Delayed Completion until the master repeats its initial request.
During phase three, the master successfully rearbitrates for the bus and reissues the original request. The target decodes the request and gives the master the completion status (and data if the transaction is a read transaction). At this point, the Delayed Completion is retired and the transaction has completed.
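The three phases described above can be sketched in software as follows. This is a simplified illustration that models read transactions only; the class and method names (`Target`, `request`, `complete_pending`) are hypothetical, and bus signals such as STOP# and TRDY# are abstracted into a `RETRY` return value.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

RETRY = "Retry"  # stands in for STOP# asserted with TRDY# not asserted


@dataclass
class DelayedCompletion:
    status: str
    data: Optional[int]


class Target:
    """Hypothetical slow target that services reads as delayed transactions."""

    def __init__(self, memory: Dict[int, int]) -> None:
        self.memory = memory
        self.delayed_request: Optional[Tuple[str, int]] = None
        self.delayed_completion: Optional[DelayedCompletion] = None

    def request(self, command: str, address: int):
        key = (command, address)
        # Phase three: the repeated request matches a stored Delayed
        # Completion, so hand it over and retire it.
        if self.delayed_completion is not None and self.delayed_request == key:
            completion = self.delayed_completion
            self.delayed_request = None
            self.delayed_completion = None
            return completion
        # Phase one: latch the request if none is latched, then Retry.
        if self.delayed_request is None:
            self.delayed_request = key
        return RETRY

    def complete_pending(self) -> None:
        # Phase two: the target completes the latched request on its own
        # and stores the result as a Delayed Completion.
        if self.delayed_request is not None and self.delayed_completion is None:
            _, address = self.delayed_request
            self.delayed_completion = DelayedCompletion("OK", self.memory[address])


target = Target({0x100: 0xCAFE})
first = target.request("read", 0x100)   # phase one: terminated with Retry
second = target.request("read", 0x100)  # data not ready yet: Retry again
target.complete_pending()               # phase two
third = target.request("read", 0x100)   # phase three: data delivered
print(first, second, hex(third.data))
```

Between the Retried attempts the bus is free for other masters, which is the point of separating the request from the delivery of the data.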
In conventional PCI bus to system memory control devices (e.g., the 440GX chip from Intel Corporation), only a single PCI device to system memory delayed read transaction may be accepted for processing at a time. Consequently, each delayed read by a PCI device to system memory incurs the full memory access latency. It would therefore be beneficial to provide a mechanism to reduce the read latency associated with PCI device to system memory read operations, thereby improving bandwidth utilization of the PCI bus.
In one embodiment, the invention provides a method to operate a computer system bridge circuit. The method includes enqueueing multiple delayed read requests to system memory, wherein each delayed read request is associated with a different expansion bus device. The method may also include forwarding a second enqueued read request to the system memory before receiving a response to a first forwarded enqueued read request. The method may further include arbitrating to an expansion bus device (having an enqueued delayed read request) only after read data is received from the system memory in response to a forwarded read request.
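For illustration only, the method summarized above may be modeled in software as shown below. This is a behavioral sketch under stated assumptions, not the claimed circuit: the class `BridgeSketch`, its method names, and the in-order memory response are all hypothetical simplifications.

```python
from typing import Dict, List, Optional


class BridgeSketch:
    """Hypothetical behavioral model of a bridge that pipelines delayed reads."""

    def __init__(self, memory: Dict[int, int]) -> None:
        self.memory = memory
        self.queue: Dict[str, int] = {}   # device -> address, in arrival order
        self.in_flight: List[str] = []    # devices whose reads were forwarded
        self.ready: Dict[str, int] = {}   # device -> read data from memory

    def enqueue(self, device: str, address: int) -> bool:
        """Accept a delayed read; each entry belongs to a different device."""
        if device in self.queue:
            return False
        self.queue[device] = address
        return True

    def forward_next(self) -> Optional[str]:
        """Forward the oldest unforwarded read without waiting for
        responses to reads that were forwarded earlier."""
        for device in self.queue:
            if device not in self.in_flight and device not in self.ready:
                self.in_flight.append(device)
                return device
        return None

    def memory_responds(self) -> Optional[str]:
        """Assumed in-order memory: data returns for the oldest in-flight read."""
        if not self.in_flight:
            return None
        device = self.in_flight.pop(0)
        self.ready[device] = self.memory[self.queue[device]]
        return device

    def arbitrate(self) -> Optional[str]:
        """Grant the bus only to a device whose read data has arrived."""
        for device in self.queue:
            if device in self.ready:
                return device
        return None

    def complete(self, device: str) -> int:
        """The granted device repeats its read; retire the queue entry."""
        data = self.ready.pop(device)
        del self.queue[device]
        return data


bridge = BridgeSketch({0x10: 111, 0x20: 222})
bridge.enqueue("dev_a", 0x10)
bridge.enqueue("dev_b", 0x20)
bridge.forward_next()          # dev_a's read goes to memory
bridge.forward_next()          # dev_b's read follows before dev_a's data returns
print(bridge.arbitrate())      # no data yet, so no device is granted the bus
bridge.memory_responds()
winner = bridge.arbitrate()    # dev_a now has data waiting
print(winner, bridge.complete(winner))
```

Because the second read is already in flight while the first is being serviced, its memory latency overlaps with the first rather than adding to it, and a device is granted the bus only when its retry can complete.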
In another embodiment, the invention provides a computer system including a system memory, an expansion bus, a plurality of devices coupled to the expansion bus, and a bridge circuit having a queue and a control circuit. The control circuit is adapted to enqueue a plurality of delayed read requests to the system memory from the expansion bus devices (each enqueued delayed read request being associated with a different expansion bus device). The control circuit may be further adapted to transmit a plurality of read requests to the system memory (each read request corresponding to an enqueued delayed read request), and to receive read data from the system memory in response to the transmitted read requests. The bridge circuit may further include an arbiter circuit adapted to arbitrate to the expansion bus device associated with a given delayed read request only after the read data for that request is received from the system memory.