1. Field of the Invention
This invention relates generally to digital computer systems having cache RAM memories and, more specifically, to a digital computer system having an improved cache controller for a CPU with address pipelining and method therefor.
2. Description of the Related Art
Many digital computer systems employ second level (L2) cache memories in order to improve system performance. The L2 cache, well known in the art, is a relatively small and fast memory device that is loaded with information, either instruction code or data, from a slave device such as the main memory (RAM) or other external memory device. Using the property of locality of reference, this information has a statistically good chance of being needed by the CPU in a future cycle. Among the various types of cache memories, direct-mapped cache memories are relatively small in terms of real estate and inexpensive and are therefore more desirable than other types of caches, such as the larger and more costly set-associative cache memory, for improving the system performance of lower cost microprocessor based products, such as limited purpose PC systems, palmtops, personal organizers and other similar products.
In order to map the cache to the slave device, such as the main memory, a direct-mapped cache divides the address in the main memory that the CPU needs to access into three fields. The tag field constitutes a set number of the most significant bits of the address. The other two fields, the block and entry fields, constitute the remaining bits of the address and together are called the index. The block represents the general area in memory within which the requested word of information is found, and the entry represents the specific word being accessed. The number of bits in the index field represents the number of address bits required to access the cache memory. Each word in the cache consists of the data or code word and its associated tag. There is also typically an additional bit, associated with each entry (word) field in the cache, called a valid bit, to indicate whether or not the word contains valid data. When the CPU generates a memory request, the index field (block and entry) is used as the address to access the cache. The tag field of the address is compared with the tag in the word read from the cache. If the two tags match and the valid bit indicates that the information in the entry (word) is valid, there is a "hit". The cache then supplies two flags to the cache controller, one indicating there is a tag match (Tmatch) and the other indicating there is a valid match (Vmatch), and the desired word is read from the cache. If there is no match, because of either a tag miss (Tmiss) or a valid miss (Vmiss), there is a "miss" and the required word is read from main memory. The entire block of words (sequential addresses) within which the required word resides in main memory is then brought into the cache, replacing the previous block of information.
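By way of illustration only, the lookup described above may be sketched in software as follows. The field widths (ENTRY_BITS, BLOCK_BITS), the names Line, split_address and DirectMappedCache, and the use of a Python list to stand in for main memory are all assumptions made for this sketch; they are not taken from any particular cache device.

```python
from dataclasses import dataclass

# Hypothetical field widths, chosen only for illustration.
ENTRY_BITS = 2                       # 4 words per block
BLOCK_BITS = 4                       # 16 blocks in the cache
INDEX_BITS = BLOCK_BITS + ENTRY_BITS  # index = block field + entry field
INDEX_MASK = (1 << INDEX_BITS) - 1


@dataclass
class Line:
    """One cache word: the stored tag, its valid bit, and the data word."""
    tag: int = 0
    valid: bool = False
    word: int = 0


def split_address(addr):
    """Divide an address into its (tag, block, entry) fields."""
    entry = addr & ((1 << ENTRY_BITS) - 1)
    block = (addr >> ENTRY_BITS) & ((1 << BLOCK_BITS) - 1)
    tag = addr >> INDEX_BITS
    return tag, block, entry


class DirectMappedCache:
    def __init__(self):
        self.lines = [Line() for _ in range(1 << INDEX_BITS)]

    def lookup(self, addr):
        """Return (tmatch, vmatch, word); word is None unless both flags are set."""
        tag, _, _ = split_address(addr)
        line = self.lines[addr & INDEX_MASK]   # the index addresses the cache
        tmatch = line.tag == tag
        vmatch = line.valid
        return tmatch, vmatch, (line.word if tmatch and vmatch else None)

    def fill_block(self, addr, memory):
        """Load the whole block containing addr, replacing the previous block."""
        base = addr & ~((1 << ENTRY_BITS) - 1)  # first word of the block
        tag = addr >> INDEX_BITS
        for offset in range(1 << ENTRY_BITS):
            a = base + offset
            self.lines[a & INDEX_MASK] = Line(tag, True, memory[a])

    def read(self, addr, memory):
        """Return (word, hit). On a miss, read main memory and fill the block."""
        tmatch, vmatch, word = self.lookup(addr)
        if tmatch and vmatch:
            return word, True
        self.fill_block(addr, memory)
        return memory[addr], False
```

In this sketch, a first read of a given address misses and fills its block, a second read of the same address or of a neighboring word in the same block hits, and a read of a different address sharing the same index replaces the block so that the original address misses again.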
Although direct-mapped cache memories improve CPU performance whenever a hit occurs, the cache hit ratio, or efficiency, of direct-mapped caches is relatively low when compared with other types of caches. For example, the hit ratio of direct-mapped caches is about 70% as compared to the 90%+ hit ratio of other, larger and more sophisticated caches. However, because direct-mapped caches are relatively inexpensive and small, and therefore desirable for many applications, there existed a definite need to further improve the efficiency of systems designed with direct-mapped caches.
Some central processing units (CPUs), such as Intel's 386 microprocessor, offer a feature called address pipelining whereby the CPU presents address and status information for the upcoming CPU cycle before the current cycle has completed. In the past, address pipelining was successfully employed to improve CPU performance for non-memory read cycles, non-cacheable memory read cycles and memory read cycles with no cache present. The use of CPU address pipelining was a potentially attractive solution for improving the performance of cacheable read cycles as well, and could, therefore, improve overall performance in a system designed with direct-mapped caching. However, if address pipelining was used to perform an early search (read) of the cache for a potential hit for the "pipelined" cycle before the current cycle completed its operation, an inherent problem would exist. Specifically, there are a number of conditions where the cache would return the wrong flags for the pipelined cycle simply because the current cycle had not yet completed its operation. For example, suppose the current CPU cycle is executing a read operation to memory address "x" in main memory. The location in the cache that corresponds to the index field of the address is checked to determine if there is a hit. In this case there is a miss, because of either a Tmiss or a Vmiss. Therefore the CPU must retrieve the required word from the slave memory and the cache must be loaded with a new block containing the required word. In the meantime, however, the CPU puts out an early pipelined address which happens to be the same address as that of the current cycle in progress. The cache controller issues a read command to the cache. Although this next read should properly be a hit in the cache, the cache will return flags that indicate an erroneous miss because this read of the cache is out of cycle (i.e., too early) and the cache has not yet been loaded with the new block from the current cycle. Unnecessarily and inefficiently, the CPU will then read and retrieve the word for this pipelined cycle from the relatively slow slave memory.
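The erroneous miss described above may be illustrated by the following minimal sketch. It is an assumption-laden model, not a description of any actual controller: the names TinyCache and run_cycles are hypothetical, the cache is reduced to a table of tags for valid entries, and a fill loads a single entry rather than a whole block.

```python
class TinyCache:
    """Hypothetical stand-in for the cache tag RAM: index -> tag of a valid entry."""

    def __init__(self, index_bits=6):
        self.index_bits = index_bits
        self.tags = {}   # an absent index models an entry whose valid bit is clear

    def _index(self, addr):
        return addr & ((1 << self.index_bits) - 1)

    def is_hit(self, addr):
        """True only when the entry is valid and its tag matches."""
        return self.tags.get(self._index(addr)) == addr >> self.index_bits

    def fill(self, addr):
        """Model the cache update performed when a missed cycle completes."""
        self.tags[self._index(addr)] = addr >> self.index_bits


def run_cycles(cache, current_addr, pipelined_addr):
    """Return (current_hit, early_pipelined_hit, in_order_pipelined_hit)."""
    current_hit = cache.is_hit(current_addr)

    # Early search: issued while the current cycle is still in progress,
    # so the cache has not yet been loaded from the current cycle.
    early_hit = cache.is_hit(pipelined_addr)

    if not current_hit:
        cache.fill(current_addr)   # the current cycle completes

    # The same search, performed only after the current cycle has completed.
    in_order_hit = cache.is_hit(pipelined_addr)
    return current_hit, early_hit, in_order_hit


# Current cycle and pipelined cycle both address "x" in an empty cache.
x = 0x145
print(run_cycles(TinyCache(), x, x))   # (False, False, True)
```

The middle flag is the early, out-of-cycle result: it reports a miss even though the same search made after the current cycle completes reports a hit.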
Therefore, there existed a need to provide a digital computer system having a "smart" cache controller that would enable the system to take advantage of CPU address pipelining while minimizing the performance impact of a pipelined cache read miss in a system with a relatively low hit ratio, such as one using a direct-mapped cache.