1. Field of the Invention
The present invention relates to computer memory systems and, more particularly, to enhancements to computer memory systems in order to speed the operation of a computer.
2. Description of Related Art
Traditional computer architectures require frequent exchanges of information between one or more processing units and the computer's memory subsystem. A processor retrieves information from the memory subsystem by passing the address that identifies a particular memory location to a memory controller in the memory subsystem. The memory usually holds both the instructions that the processor is to execute and the data upon which the processor is to execute those instructions. The instructions that the processor executes are frequently referred to as code, and reading a line of code from the memory subsystem is referred to as a code-read. The information to which the processor applies the instructions is usually referred to as data, and reading data is referred to as a data-read.
One of the parameters that limits the speed of operation of a computer is the time required to extract code and data from the memory subsystem and supply them to the processor. The memory subsystem and the processor or processors of the computer usually exchange information over a memory bus. A conventional memory subsystem comprises a block of dynamic random access memory (DRAM) arranged in an array, together with control circuitry that translates addresses into control signals for reading and writing particular locations in the memory array. The processor supplies addresses and control signals to the memory subsystem via the memory bus. On a read, the DRAM controller translates the address into a memory location. The controller then activates the memory reading circuitry of the DRAM array and causes the information from that location to be passed through the DRAM datapath, a collection of buffers, to the memory data bus, and back to the processor. Each access to a memory location in the DRAM requires a significant amount of time, and this access time is one of the speed-limiting characteristics of computer architectures.
One enhancement that has found widespread application in the microcomputer field is the introduction of cache memories. The theory of a cache is that a system attains a higher speed by using a small portion of very fast memory along with the slower main memory of the memory subsystem. When a particular memory location is addressed by the processor on a read, the entire block of memory surrounding that location is copied to the cache, which is a much faster memory. Statistically, subsequent memory requests by the processor are likely to be to consecutive locations in the memory. This is particularly true for code-reads, which are highly sequential and very rarely overwritten. Subsequent reads that fall within the copied block therefore do not have to access the main memory, and retrievals from the cache supply the information to the processor relatively quickly.
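The block-fill behavior described above can be illustrated with a minimal sketch. It is a toy model, not a description of any particular cache: the block size `LINE_SIZE`, the class `BlockCache`, and the helper `run_sequential` are hypothetical names chosen for illustration, and the cache is assumed to have unlimited capacity so that no block is ever evicted.

```python
# Illustrative model only. LINE_SIZE, BlockCache and run_sequential are
# hypothetical names; the cache is assumed never to evict a block.
LINE_SIZE = 16  # bytes per block (assumed value)


class BlockCache:
    """Toy cache that copies the whole block containing a missed address."""

    def __init__(self):
        self.blocks = set()  # block numbers currently held in the cache
        self.hits = 0
        self.misses = 0

    def read(self, address):
        block = address // LINE_SIZE
        if block in self.blocks:
            # The block was copied in by an earlier miss; no main-memory access.
            self.hits += 1
            return "hit"
        # Miss: the entire surrounding block is fetched from main memory.
        self.misses += 1
        self.blocks.add(block)
        return "miss"


def run_sequential(start, count, step=4):
    """Issue `count` reads at consecutive addresses and return (hits, misses)."""
    cache = BlockCache()
    for i in range(count):
        cache.read(start + i * step)
    return cache.hits, cache.misses
```

Under these assumptions, a run of sequential 4-byte reads misses only on the first access to each 16-byte block, which is the effect that makes sequential code-reads well suited to caching.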
A further enhancement to systems which have external cache memories is the introduction of a read-prefetch buffer. On a read-miss in the cache, in addition to fetching the desired address, the next sequential memory location is fetched from the main memory and placed in the read-prefetch buffer of the cache, because there is a likelihood that it will be the next addressed block of data.
In the enhanced cache system with read-prefetching, a read-miss in the cache causes the missed line to be fetched from the memory subsystem. In addition to fetching the missed line, the cache also prefetches the next line from the memory subsystem into the cache's read-prefetch buffer. If the next read from the cache is to the memory location which has been prefetched (a prefetch hit), the cache reads the line from the read-prefetch buffer and does not have to read the line from the memory subsystem. Since the line can be read from the prefetch buffer much faster than from the memory subsystem, read response time for prefetch hits is greatly reduced.
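The read-prefetch procedure described above can be sketched as follows. This is a simplified model under stated assumptions: the class name `PrefetchingCache` and its counters are hypothetical, the cache holds a single read-prefetch buffer and never evicts lines, and a prefetch hit is assumed not to trigger a further prefetch (the description above does not specify that case).

```python
# Illustrative model only. PrefetchingCache and its fields are hypothetical.
# Assumptions: one read-prefetch buffer, no eviction of cached lines, and
# no new prefetch is started on a prefetch hit.
class PrefetchingCache:
    def __init__(self):
        self.lines = set()         # line numbers held in the cache proper
        self.prefetch_line = None  # line number held in the read-prefetch buffer
        self.memory_reads = 0      # demand fetches from the memory subsystem
        self.prefetches = 0        # prefetch transfers from the memory subsystem
        self.prefetch_hits = 0     # reads satisfied from the prefetch buffer

    def read_line(self, line):
        if line in self.lines:
            return "cache hit"
        if line == self.prefetch_line:
            # Prefetch hit: the line comes from the buffer, not from memory.
            self.prefetch_hits += 1
            self.lines.add(line)
            self.prefetch_line = None
            return "prefetch hit"
        # Read-miss: fetch the missed line, then prefetch the next line.
        # Any unused line already in the buffer is discarded.
        self.memory_reads += 1
        self.lines.add(line)
        self.prefetch_line = line + 1
        self.prefetches += 1
        return "read miss"
```

In this model, reading lines 0, 1, 2, 3 in order yields a read miss, a prefetch hit, a read miss, and a prefetch hit, so only half of the sequential reads wait on a demand fetch from the memory subsystem.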
There are a number of disadvantages in utilizing the above-described prefetching method. First, the approach is not useful for systems without a cache, yet caches are expensive and low-cost computer systems are usually built without one. Second, following a read-miss in the cache, the next cache line is prefetched from the memory subsystem. If another miss occurs in the cache while this line is being prefetched, and that miss is not to the line being prefetched, then this second miss is delayed until the prefetch finishes, since there is no easy way to abort the prefetch. This delay, referred to as a prefetch penalty, can be high. Third, prefetching to the cache results in increased utilization of the memory bus. If the cache line fetched into the prefetch buffer is not used by the cache before it is discarded, then the time spent on the bus prefetching the line is wasted, and the added traffic prevents other agents from getting onto the bus. This increase in bus utilization is a function of the hit rate in the prefetch buffer. Typically, with the cache, this hit rate is low, and more than one prefetch buffer has to be implemented in order to get a good hit rate. These buffers take up additional area on the chip, resulting in higher costs. Also, with an increased number of prefetch buffers, buffer management becomes more expensive.
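The two quantitative disadvantages above, wasted bus time and the prefetch penalty, can be expressed as simple arithmetic. The following is a hedged sketch: the function names and the cycle-based timing model are assumptions made for illustration and are not taken from any specific system.

```python
# Illustrative arithmetic only. Function names and the cycle-count model
# are assumptions, not measurements of any real memory subsystem.
def wasted_bus_fraction(prefetches, prefetch_hits):
    """Fraction of prefetch bus transfers whose line was discarded unused."""
    if prefetches == 0:
        return 0.0
    return (prefetches - prefetch_hits) / prefetches


def prefetch_penalty(prefetch_cycles, cycles_elapsed):
    """Cycles a second miss must wait for an in-flight prefetch to finish.

    prefetch_cycles: total duration of the prefetch transfer.
    cycles_elapsed:  cycles already completed when the second miss arrives.
    """
    return max(0, prefetch_cycles - cycles_elapsed)
```

For example, under this model a prefetch buffer that is hit on 3 of 10 prefetches wastes 70 percent of its prefetch transfers, and a second miss arriving 3 cycles into an 8-cycle prefetch that cannot be aborted waits 5 cycles before its own fetch can begin.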
Therefore, it is an object of the present invention to provide an efficient prefetching approach which does not suffer the traditional disadvantages of the above-described method while still being implementable in a low-cost computer system.