Conventional computer systems utilize memory systems that provide data to the central processing unit (CPU) in response to load instructions and that store data in response to store instructions. The cost per computation for the CPU has decreased much faster than the cost per byte of memory. In addition, as computational tasks have become more complex, the size of the main computer memory has dramatically increased. As a result, providing a main memory that operates at the same speed as the CPU has become economically impractical.
To avoid the high cost of providing a main memory that operates at CPU computational speeds, many systems utilize cache memories. A cache is a high speed buffer used to store the most recently used data. When load instructions are issued to the cache, the cache checks its buffer to determine if the data is present. If the data is already present in the cache, the cache returns the data to the CPU. If the data is not present, the cache must load the data from the main memory. Since the main memory is much slower than the cache, this results in a significant delay in the program execution. Each time the cache loads data from the main memory, some of the data stored in the cache must be eliminated to make room for the new data.
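The load path described above can be illustrated with a minimal sketch. The class name ToyCache, the fully associative organization, and the least-recently-used replacement rule are assumptions made for illustration only; the text does not specify how the entry to be eliminated is chosen.

```python
from collections import OrderedDict


class ToyCache:
    """Hypothetical fully associative cache with LRU replacement (an assumption)."""

    def __init__(self, capacity, main_memory):
        self.capacity = capacity          # number of entries the buffer can hold
        self.main_memory = main_memory    # dict: address -> value (the slow memory)
        self.lines = OrderedDict()        # address -> value, oldest entry first
        self.hits = 0
        self.misses = 0

    def load(self, address):
        if address in self.lines:
            # Data already present: return it without touching main memory.
            self.hits += 1
            self.lines.move_to_end(address)
            return self.lines[address]
        # Data not present: fetch from main memory (the slow path).
        self.misses += 1
        value = self.main_memory[address]
        if len(self.lines) >= self.capacity:
            # Make room by eliminating the least recently used entry.
            self.lines.popitem(last=False)
        self.lines[address] = value
        return value


memory = {addr: addr * 10 for addr in range(8)}
cache = ToyCache(capacity=2, main_memory=memory)
values = [cache.load(a) for a in (0, 1, 0, 2, 1)]
print(cache.hits, cache.misses)  # 1 hit, 4 misses for this access pattern
```

In this run the final load of address 1 misses because address 1 was eliminated to make room for address 2, which is the displacement effect the paragraph describes.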
Similarly, store instructions are also issued to the cache. If the data for the address specified in the store instruction is already in the cache, the cache updates the data to reflect the values specified in the store instruction. If the data is not present, the cache makes an entry for the address specified in the store instruction and notes the data to be stored at that address. In the case of a "write-through" cache, the data is also sent immediately to the main memory so that the main memory always has a correct copy of the data. In non-write-through cache systems, commonly referred to as "write-back" caches, the data entry in the cache is marked to indicate that it differs from the value stored at the address in question in the main memory. When the data entry is replaced during a subsequent operation, the entry so marked is written to the main memory prior to being replaced.
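The two store policies can be contrasted in a short sketch. The class name StoreCache, the write_through flag, the per-entry dirty marker, and the oldest-first replacement order are hypothetical choices made for illustration, not details taken from the text.

```python
from collections import OrderedDict


class StoreCache:
    """Hypothetical cache model covering only the store path."""

    def __init__(self, capacity, main_memory, write_through):
        self.capacity = capacity
        self.main_memory = main_memory      # dict: address -> value
        self.write_through = write_through  # True: write-through, False: write-back
        self.lines = OrderedDict()          # address -> [value, dirty]

    def store(self, address, value):
        if address not in self.lines and len(self.lines) >= self.capacity:
            self._evict()
        # In a write-back cache the entry is marked as differing from main memory.
        dirty = not self.write_through
        self.lines[address] = [value, dirty]
        self.lines.move_to_end(address)
        if self.write_through:
            # Write-through: main memory is updated immediately.
            self.main_memory[address] = value

    def _evict(self):
        victim, (value, dirty) = self.lines.popitem(last=False)
        if dirty:
            # A marked entry is written to main memory before being replaced.
            self.main_memory[victim] = value


wt_memory = {0: 0, 1: 0}
wt = StoreCache(capacity=1, main_memory=wt_memory, write_through=True)
wt.store(0, 7)                 # main memory sees the value at once

wb_memory = {0: 0, 1: 0}
wb = StoreCache(capacity=1, main_memory=wb_memory, write_through=False)
wb.store(0, 7)
stale = wb_memory[0]           # main memory still holds the old value here
wb.store(1, 9)                 # replaces the entry for address 0, forcing the write
```

After the second store to the write-back cache, main memory holds the value 7 at address 0, while address 1 remains out of date until its entry is in turn replaced.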
To be effective, the data in the cache must be utilized, on average, a number of times before it is displaced from the cache by new data entering from the main memory in response to load instructions that cannot be satisfied by the data already in the cache. Each time data is acquired from the main memory, the CPU must wait. If the data is used several times while it is in the cache, this delay is amortized over several load instructions; hence, the average delay per load instruction is substantially reduced. No such reduction occurs if the data is used only once.
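The amortization argument can be made concrete with a small calculation. The function name and the cycle counts below (100 cycles for a main memory access, 1 cycle for a cache access) are illustrative assumptions and are not figures from the text; the model simply charges the first use the main memory delay and every later use the cache delay.

```python
def average_load_delay(uses, miss_cycles, hit_cycles):
    """Average delay per load when one main memory fetch serves `uses` loads.

    The first load pays the main memory delay; each of the remaining
    (uses - 1) loads is satisfied by the cache.
    """
    if uses < 1:
        raise ValueError("data must be used at least once")
    return (miss_cycles + (uses - 1) * hit_cycles) / uses


# Hypothetical timings: 100-cycle main memory access, 1-cycle cache access.
once = average_load_delay(1, miss_cycles=100, hit_cycles=1)
four_times = average_load_delay(4, miss_cycles=100, hit_cycles=1)
ten_times = average_load_delay(10, miss_cycles=100, hit_cycles=1)
print(once, four_times, ten_times)
```

Under these assumed timings, data used once costs the full 100 cycles per load, while data used four times averages 25.75 cycles per load.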
In fact, utilizing the cache for transferring data that is to be used only once actually degrades system performance. As noted above, each time a new data word is moved into the cache from main memory, a data word stored in the cache must be eliminated. Some of the data words that are eliminated would have been used again had they not been displaced by the load instruction for the data word that was to be used only once. Hence, passing data words that are to be used only once through the cache degrades cache performance. This degradation can be reduced by increasing the size of the cache; however, this solution substantially increases the cost of the cache.
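The displacement effect can be observed in a small simulation. The sketch below assumes a two-entry cache with least-recently-used replacement and a hypothetical access trace in which two reused addresses are interleaved with addresses that are loaded only once; the bypass flag, which keeps single-use data out of the cache, is included purely as a baseline for comparison.

```python
from collections import OrderedDict


def count_hot_misses(trace, capacity, bypass):
    """Count misses on reused ("hot") addresses for a toy LRU cache.

    trace: iterable of (address, reused) pairs.
    bypass: if True, single-use data is never placed in the cache.
    """
    lines = OrderedDict()
    hot_misses = 0
    for address, reused in trace:
        if address in lines:
            lines.move_to_end(address)   # hit: refresh recency
            continue
        if reused:
            hot_misses += 1              # a reused word had to be fetched again
        elif bypass:
            continue                     # single-use word skips the cache
        if len(lines) >= capacity:
            lines.popitem(last=False)    # eliminate the least recently used word
        lines[address] = True
    return hot_misses


# Four rounds: two reused addresses, then two fresh single-use addresses.
trace = []
for round_number in range(4):
    trace.append(("A", True))
    trace.append(("B", True))
    trace.append((f"s{2 * round_number}", False))
    trace.append((f"s{2 * round_number + 1}", False))

polluted = count_hot_misses(trace, capacity=2, bypass=False)
baseline = count_hot_misses(trace, capacity=2, bypass=True)
print(polluted, baseline)  # 8 misses on reused data versus 2
```

With the single-use words passing through the cache, every access to the reused addresses misses; with them kept out, only the initial two fetches miss.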
The time delay encountered in retrieving data from the main memory is often dominated by the latency time of the main memory. The latency time of the memory system is defined to be the number of cycles between the initiation of a load operation and the point at which the data for the load is returned from the memory and is available for use. One method for avoiding this delay is to issue the load instruction sufficiently far in advance of the need for the data to allow the memory time to retrieve the data and have it ready when needed. However, this solution results in two problems. First, an intervening store instruction directed to the same memory address can result in erroneous data being returned to the CPU.
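The first problem can be reproduced with a toy timeline. The sketch assumes, for illustration only, that a load samples the memory contents at the cycle in which it is issued and delivers that value after a fixed latency, with no check for later stores to the same address; the function name and the program encoding are hypothetical.

```python
def run(program, latency):
    """Execute one operation per cycle and report each delivered load.

    program: list of (op, address, value) tuples with op in
             {"store", "load", "nop"}.
    Returns a list of (address, value_returned, value_in_memory_at_delivery).
    """
    memory = {}
    pending = []      # (ready_cycle, address, value_sampled_at_issue)
    delivered = []
    for cycle, (op, address, value) in enumerate(program):
        if op == "store":
            memory[address] = value
        elif op == "load":
            # Assumption: the value is sampled now and arrives `latency` cycles later.
            pending.append((cycle + latency, address, memory.get(address)))
        still_pending = []
        for ready, addr, sampled in pending:
            if ready <= cycle:
                delivered.append((addr, sampled, memory.get(addr)))
            else:
                still_pending.append((ready, addr, sampled))
        pending = still_pending
    return delivered


program = [
    ("store", 0x10, 1),    # cycle 0: initial value
    ("load", 0x10, None),  # cycle 1: load issued early to hide the latency
    ("store", 0x10, 2),    # cycle 2: intervening store to the same address
    ("nop", None, None),   # cycle 3: load data arrives
]
result = run(program, latency=2)
print(result)  # [(16, 1, 2)]: the load returns 1 although memory now holds 2
```

The value delivered to the CPU predates the intervening store, so a program that expected to read the stored value would receive erroneous data.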
Second, if the data being retrieved will be used only once, accessing the data through the cache can actually reduce the efficiency of the cache. As noted above, the delay incurred in acquiring data from the main memory is amortized only if the data is used several times while it resides in the cache; no such reduction occurs if the data is used only once. Furthermore, the data displaced by the incoming data may need to be reloaded into the cache in response to a subsequent load instruction. When this occurs, the system is further penalized by the delay in reloading the displaced data.
Broadly, it is the object of the present invention to provide an improved computer memory system.
It is a further object of the present invention to provide a memory system in which the data that is to be used only a few times need not pass through the cache memory.
It is a still further object of the present invention to provide a cache memory system in which the delays resulting from the latency time of the main memory are substantially reduced compared to prior art systems.
These and other objects of the present invention will become apparent to those skilled in the art from the following detailed description of the present invention and the accompanying drawings.