This is a continuation-in-part of a co-pending United States patent application entitled “Method and Apparatus for Prefetching Data from System Memory to a Central Processing Unit” (Ser. No. 08/287,704) filed on 08/09/94, now abandoned, which is a continuation of a United States patent application entitled “Method and Apparatus for Prefetching Data from System Memory” (Ser. No. 07/900,142) filed on 06/07/92, now abandoned.
1. Field of the Invention
The present invention relates to a method and system for reading data from a memory device through a prefetching technique.
2. Description of Related Art
It is commonly known that computer architectures include a microprocessor that reads data from and writes data to system memory, which usually includes dynamic random access memory (“DRAM”). DRAM is used in system memory because it provides an inexpensive means of obtaining a large memory space. Typically, a computer system may have a number of DRAM chips, each having a plurality of addressable memory locations.
Many microprocessors read data from system memory in multiple byte blocks. A multiple byte access to memory is usually slow relative to the speed of the processor, causing the processor to wait for the data. To reduce this access time, some computer architectures incorporate various levels of cache, which provide smaller yet faster blocks of addressable memory. When the processor generates a read request, the request is first sent to a cache. If the processor determines that the cache does not contain the requested data (i.e., a cache miss), the read request is sent to system memory. The data is retrieved from the system memory, and thereafter written to the processor and possibly the cache for subsequent use.
To reduce the cache “miss” rates, some computer systems include prefetch algorithms. When the processor reads data, the data associated with the successive addresses is also fetched and stored in the cache. For example, if the processor requests addresses A0-A7, addresses A8-A15 will also be fetched from the system memory. The prefetch algorithm increases the “hit” rate of the subsequent read request from the processor.
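The sequential prefetch described above can be illustrated with a minimal sketch. The class name `PrefetchingCache`, the block size of eight addresses, and the hit and miss counters are assumptions made for illustration only and do not describe any particular system.

```python
BLOCK = 8  # addresses per block, e.g. A0-A7 (assumed for illustration)


class PrefetchingCache:
    """Hypothetical cache that also fetches the next block on every miss."""

    def __init__(self, memory):
        self.memory = memory   # backing store: a list indexed by address
        self.lines = {}        # block number -> list of data words
        self.hits = 0
        self.misses = 0

    def _fetch_block(self, block):
        start = block * BLOCK
        if start < len(self.memory):
            self.lines[block] = self.memory[start:start + BLOCK]

    def read(self, addr):
        block = addr // BLOCK
        if block in self.lines:
            self.hits += 1
        else:
            self.misses += 1
            self._fetch_block(block)      # demand fetch (A0-A7)
            self._fetch_block(block + 1)  # prefetch of successive addresses (A8-A15)
        return self.lines[block][addr % BLOCK]


memory = list(range(100, 132))   # 32 addresses of sample data
cache = PrefetchingCache(memory)
first = cache.read(0)    # miss: fetches A0-A7 and prefetches A8-A15
second = cache.read(8)   # hit: A8 was brought in by the prefetch
```

In this sketch the read of address A8 is served from the cache only because the earlier miss on A0 also fetched the successive block.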
Such a prefetch method is disclosed in the publication by Norman J. Jouppi, “IMPROVING DIRECT-MAPPED CACHE PERFORMANCE BY THE ADDITION OF A SMALL FULLY-ASSOCIATIVE CACHE AND PREFETCH BUFFERS”, The 17th Annual International Symposium on Computer Architecture, May 28-31, 1990, pages 364-373. The system disclosed by Jouppi teaches the use of a stream buffer between the first level (L1) and second level (L2) caches of the CPU. When there is a cache miss in the L1 cache, the data is fetched from the L2 cache. When fetching from the L2 cache, the system also fetches successive addresses and stores the additional data in the stream buffer. When the CPU generates a subsequent read, the request is supplied to both the L1 cache and the stream buffer. If the stream buffer contains the addresses requested, the data is sent to the processor.
The addition of the stream buffer therefore improves the hit rate without polluting the L1 cache. If neither the stream buffer nor the L1 cache has the addresses, the data is fetched from the L2 cache along with a prefetch that replaces the data within the stream buffer. The stream buffer of the Jouppi system has a first in first out (“FIFO”) queue, so that if the requested data is not in the top line of the buffer, the data cannot be retrieved. The requested data is then fetched from the second level cache. The stream buffer will be flushed and restarted at the missed address.
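The FIFO behavior of such a stream buffer can be sketched as follows. This is a simplified model under stated assumptions, not the Jouppi implementation: the names `StreamBuffer` and `access`, the buffer depth of four lines, and the choice to restart the buffer at the line following the missed line (the missed line itself being fetched on demand) are all hypothetical.

```python
from collections import deque


class StreamBuffer:
    """Hypothetical FIFO stream buffer; only the head entry is compared."""

    def __init__(self, depth=4):
        self.depth = depth
        self.fifo = deque()    # prefetched line addresses, head at the left
        self.next_line = None  # next line address to prefetch

    def _refill(self):
        while len(self.fifo) < self.depth:
            self.fifo.append(self.next_line)
            self.next_line += 1

    def restart(self, missed_line):
        # Flush the buffer and prefetch the lines that follow the missed line.
        self.fifo.clear()
        self.next_line = missed_line + 1
        self._refill()

    def lookup(self, line):
        # A request hits only if it matches the top line of the FIFO.
        if self.fifo and self.fifo[0] == line:
            self.fifo.popleft()
            self._refill()
            return True
        return False


def access(l1, stream_buffer, line):
    """Return where a read of `line` is satisfied in this simplified model."""
    if line in l1:
        return "L1 hit"
    if stream_buffer.lookup(line):
        return "stream hit"
    l1.add(line)                  # demand fetch from the L2 cache
    stream_buffer.restart(line)   # flush and restart past the missed line
    return "L2 fetch"


l1 = set()
sb = StreamBuffer(depth=4)
results = [access(l1, sb, n) for n in (10, 11, 13, 10)]
```

Here the read of line 13 goes to the L2 cache even though line 13 sits in the buffer, because line 12 occupies the top of the FIFO; the buffer is then flushed and restarted.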
Although the Jouppi concept improves the internal performance of multilevel cache systems, it does not solve the inherent latency problems between the CPU and system memory. Prefetches have not been desirable between a CPU and system memory because the extra time needed to read the additional data slows down the processor. The increased hit rate would not compensate for the delay in memory reads, thereby resulting in an inefficient system. It would therefore be desirable to have a system that would provide an efficient way of prefetching data from system memory.
Adapted for a computer system including a central processing unit (“CPU”), system memory and a bus, a bus interface unit is coupled between the CPU and the bus to obtain requested information and prefetch information from the system memory. The bus interface unit receives a first read request for information associated with a first address of system memory. The bus interface unit produces and places a request packet requesting the information and prefetch information associated with speculative addresses onto the bus to be read by system memory. Thereafter, the system memory provides the requested information and the prefetch information to the bus interface unit along the bus. The information is transmitted to the CPU. The prefetch information may be transmitted to the CPU depending on the nature of a subsequent request by the CPU.
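The request flow summarized above may be modeled, purely for illustration, by the following sketch. The names `RequestPacket`, `SystemMemory` and `BusInterfaceUnit`, the 32-byte line size, the choice of the next sequential line as the speculative address, and the replacement of older prefetch information on each new packet are assumptions and are not taken from the description.

```python
from dataclasses import dataclass

LINE = 32  # bytes per line (hypothetical)


@dataclass
class RequestPacket:
    address: int               # address requested by the CPU
    prefetch_addresses: tuple  # speculative addresses to prefetch


class SystemMemory:
    """Hypothetical memory that answers one packet with all requested lines."""

    def __init__(self):
        self.packets = []  # record of the packets placed on the bus

    def service(self, packet):
        self.packets.append(packet)
        addresses = (packet.address,) + packet.prefetch_addresses
        return {a: f"data@{a:#x}" for a in addresses}


class BusInterfaceUnit:
    def __init__(self, memory, prefetch_count=1):
        self.memory = memory
        self.prefetch_count = prefetch_count
        self.prefetched = {}  # prefetch information held for later requests

    def read(self, address):
        # A subsequent request that matches held prefetch information is
        # answered without another access to system memory.
        if address in self.prefetched:
            return self.prefetched.pop(address)
        speculative = tuple(address + LINE * i
                            for i in range(1, self.prefetch_count + 1))
        reply = self.memory.service(RequestPacket(address, speculative))
        data = reply.pop(address)  # requested information goes to the CPU
        self.prefetched = reply    # prefetch information is retained
        return data


mem = SystemMemory()
biu = BusInterfaceUnit(mem)
a = biu.read(0x100)  # one packet requests 0x100 and, speculatively, 0x120
b = biu.read(0x120)  # satisfied from the held prefetch information
```

In this model a single request packet carries both the requested address and the speculative addresses, so the second read completes without a second bus transaction.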