This invention relates to prefetching data for peripheral component interconnect devices.
A common computer task is the fetching of data by a data-consuming device (such as a peripheral card) from a place where the data is stored (such as a memory). Typically the consuming device is not connected directly to the memory, but reaches it indirectly through a bridge, a bus such as a peripheral component interconnect (PCI) bus, and a memory controller.
In a simple case, when a consuming device needs data that is stored at a location in a region of the memory, the consuming device requests the data from the bridge, the bridge fetches the data through the bus and the memory controller, and the data is returned through the bus and the bridge to the consuming device. A delay (called latency) thus occurs between the time when the request is made and the time when the data arrives back at the consuming device.
Often, a data-consuming device will make a series of requests for data from successive locations in a single region of memory. The cumulative latency associated with the successive requests imposes a significant performance loss on the computer system.
In a common technique for reducing the latency loss, when a consuming device asks for data, the bridge fetches not only the requested data but also other data that is stored in the same memory region, based on the speculation that the consuming device may ask for the additional data in later requests. The fetching of data that has not yet been requested is called prefetching. If the consuming device requests the additional, prefetched data, the request can be served immediately from the bridge, eliminating much of the latency that would otherwise occur if requests had to be made to memory.
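The prefetching behavior described above can be illustrated with a minimal sketch. It is a toy model, not an implementation of any actual bridge: the class name `ToyBridge`, the fixed `prefetch_len`, and the single-buffer replacement policy are all assumptions made for illustration.

```python
class ToyBridge:
    """Hypothetical bridge model: serves reads from one buffer and
    goes to memory only on a miss (all names here are illustrative)."""

    def __init__(self, memory: bytes, prefetch_len: int):
        self.memory = memory
        self.prefetch_len = prefetch_len  # extra bytes fetched per miss (assumed fixed)
        self.buf_start = 0                # memory address of the first buffered byte
        self.buf = b""                    # currently buffered data
        self.memory_fetches = 0           # how many times memory was accessed

    def read(self, addr: int, length: int) -> bytes:
        end = addr + length
        hit = self.buf_start <= addr and end <= self.buf_start + len(self.buf)
        if not hit:
            # Miss: fetch the requested bytes plus speculative extra bytes
            # from the same region, replacing the old buffer contents.
            self.memory_fetches += 1
            self.buf_start = addr
            self.buf = self.memory[addr:end + self.prefetch_len]
        offset = addr - self.buf_start
        return self.buf[offset:offset + length]


# A device reading 32 bytes sequentially, 4 bytes per request.
memory = bytes(range(64))
bridge = ToyBridge(memory, prefetch_len=12)
chunks = [bridge.read(addr, 4) for addr in range(0, 32, 4)]
print("memory fetches with prefetching:", bridge.memory_fetches)
```

In this sketch, eight sequential requests cause only two memory accesses, because each miss brings in 12 bytes beyond the 4 that were requested. With `prefetch_len=0` the same access pattern would go to memory on every request.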
Prefetching works well if just the right amount of data is prefetched. Prefetching more data than the consuming device will actually use (called overshoot) wastes communication bandwidth because the prefetched data will be thrown away, and can, in fact, increase latency due to increased contention for memory.
On the other hand, if too little data is prefetched (called undershoot), the bridge cannot provide all the data the consuming device requests, and the consuming device again incurs the latency of a memory access. When the bridge does not have the data requested by the consuming device, the bridge disconnects the PCI transaction and the consuming device must later retry it. This disconnect-retry cycle may repeat many times before the bridge gets the requested data from memory; in effect, the consuming device polls the bridge by repeatedly retrying until the bridge has the necessary data. Because a delay elapses between the time the bridge receives the data from memory and the time the consuming device retries and finds the data, each disconnect adds latency due to polling overhead in addition to the latency for the bridge to acquire the data. Good performance therefore depends on minimizing the number of disconnects.
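The polling overhead of the disconnect-retry cycle can be made concrete with a small calculation. The sketch below rests on simplifying assumptions that do not come from the text: the first attempt occurs at cycle 0 and is disconnected, the device retries at a fixed `retry_interval`, and the data becomes available in the bridge exactly `memory_latency` cycles after the first attempt. The function name and parameters are hypothetical.

```python
import math


def observed_latency(memory_latency: int, retry_interval: int) -> tuple:
    """Return (disconnects, cycle at which the device finally gets its data).

    Assumed model: attempts happen at cycles 0, retry_interval,
    2 * retry_interval, ... and the first attempt made at or after
    memory_latency succeeds; every earlier attempt is disconnected.
    """
    disconnects = math.ceil(memory_latency / retry_interval)
    return disconnects, disconnects * retry_interval


disconnects, arrival = observed_latency(memory_latency=100, retry_interval=30)
polling_overhead = arrival - 100  # cycles the data sat in the bridge unclaimed
print(disconnects, arrival, polling_overhead)
```

Under these assumptions, a memory latency of 100 cycles and a retry interval of 30 cycles give four disconnects, and the device receives its data at cycle 120, so 20 cycles are lost to polling on top of the memory latency itself.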
Unfortunately, the bridge does not know in advance how much data the consuming device will request. It would therefore be useful to provide a prefetching algorithm that, on one hand, minimizes the number of disconnects triggered by a lack of data in the prefetching bridge and, on the other hand, minimizes overshoot, in which more data is prefetched than is actually used.
The two goals conflict, however, in that minimizing disconnects is achieved by aggressively prefetching plenty of data so that the consuming device never runs out, while minimizing overshoot is achieved by prefetching less data (zero data in the extreme case, which assures overshoot will never happen).
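The conflict between the two goals can be seen by sweeping a fixed prefetch amount against a known consumption length. This is only an illustrative model, not the algorithm of the invention: it assumes the device consumes `lines_used` consecutive lines, that every miss in the bridge fetches one requested line plus `prefetch_lines` extra lines, and that every miss causes at least one disconnect. The function name `tradeoff` is hypothetical.

```python
import math


def tradeoff(lines_used: int, prefetch_lines: int) -> tuple:
    """Return (misses, overshoot) for a fixed prefetch amount.

    misses    -- times the bridge lacked the data and had to go to memory
    overshoot -- lines fetched from memory that the device never consumed
    """
    per_fetch = 1 + prefetch_lines                 # requested line + prefetched lines
    misses = math.ceil(lines_used / per_fetch)
    overshoot = misses * per_fetch - lines_used
    return misses, overshoot


# The device actually consumes 10 lines; try increasingly aggressive prefetching.
for p in (0, 1, 3, 7, 15):
    misses, overshoot = tradeoff(10, p)
    print(f"prefetch={p:2d}  misses={misses:2d}  overshoot={overshoot}")
```

In this model, prefetching nothing yields zero overshoot but a miss on every line, while prefetching 15 lines yields a single miss but six wasted lines. Neither extreme is satisfactory, which is the balance a practical algorithm has to strike.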
The algorithm of the invention balances the two conflicting requirements.