1. Field of the Invention
The present invention relates to memory architectures and, in particular, to a cache memory architecture.
2. Background Information
The speed with which a processor can access data is critical to its performance. At the same time, providing uniformly fast memory access can be cost-prohibitive. To get around this problem, computer architectures have relied on a mix of fast, less dense memory and slower bulk memory. In fact, many computer architectures have a multilevel memory architecture in which an attempt is first made to find information in the fastest memory. If the information is not in that memory, a check is made at the next fastest memory. This process continues down through the memory hierarchy until the information sought is found. One critical component in such a memory hierarchy is a cache memory.
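The search through the memory hierarchy described above can be sketched as follows. This is a minimal illustration only: the function name `find_in_hierarchy`, the level names, and the use of dictionaries to stand in for memories are all assumptions made for the example, not part of any described architecture.

```python
def find_in_hierarchy(levels, address):
    """Search memory levels from fastest to slowest.

    `levels` is a list of (name, contents) pairs ordered fastest first;
    `contents` is a dict standing in for one memory (an assumption).
    Returns (level name, data, number of levels checked).
    """
    for checked, (name, contents) in enumerate(levels, start=1):
        if address in contents:
            return name, contents[address], checked
    # The slowest level is assumed to hold everything, so a miss at
    # every level is treated as an error in this sketch.
    raise KeyError(address)


# Hypothetical two-level hierarchy: a small cache in front of bulk memory.
levels = [
    ("cache", {0x10: "a"}),
    ("main memory", {0x10: "a", 0x20: "b"}),
]
```

With these hypothetical contents, a request for address 0x10 is satisfied by the first level checked, while a request for 0x20 falls through to the second.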
Cache memories rely on the principle of locality to increase the likelihood that a processor will find the information it is looking for in the cache memory. To do this, cache memories typically store contiguous blocks of data. In addition, the cache memory stores a tag which is compared to an address to determine whether the information the processor is seeking is present in the cache memory. Finally, the cache memory may contain status information or error-correcting codes (ECC). Cache memories are usually constructed from higher speed memory devices such as static random access memory (SRAM).
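The tag comparison described above can be illustrated with a short sketch. It assumes a direct-mapped organization, a 32-byte block, and 256 lines; none of these parameters comes from the text, and the class and function names are hypothetical.

```python
# Hypothetical parameters, chosen only for illustration.
BLOCK_BYTES = 32   # bytes per contiguous block (cache line)
NUM_LINES = 256    # lines in the assumed direct-mapped cache


def split_address(addr):
    """Split a byte address into (tag, line index, byte offset)."""
    offset = addr % BLOCK_BYTES
    index = (addr // BLOCK_BYTES) % NUM_LINES
    tag = addr // (BLOCK_BYTES * NUM_LINES)
    return tag, index, offset


class DirectMappedCache:
    """Stores only tags; data, status, and ECC are omitted in this sketch."""

    def __init__(self):
        self.tags = [None] * NUM_LINES  # None marks an empty line

    def lookup(self, addr):
        """Return True when the stored tag matches the address tag (a hit)."""
        tag, index, _ = split_address(addr)
        return self.tags[index] == tag

    def fill(self, addr):
        """Record that the block containing addr is now in the cache."""
        tag, index, _ = split_address(addr)
        self.tags[index] = tag
```

Because a whole block shares one tag, filling the line for one address also produces hits for its neighbors in the same block, which is how the cache exploits locality.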
In the case where the processor operates on longwords (i.e., four 16-bit words), processor-cache interfaces described to date use a 64-bit bus for data and an additional bus for the tag. The tag bus width varies, but has nominally been 16 bits, for a total of 80 bits. The problem with such an approach is that if the cache block (also called a cache line) size is four times the data bus width, a line transfer takes four bus cycles while the tag is needed in only one of them, so no useful information appears on the tag bus for three out of every four bus cycles. This is a waste of bus bandwidth which can adversely affect processor performance.
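The bandwidth loss can be made concrete with a little arithmetic. The sketch below uses the nominal figures given above (64-bit data bus, 16-bit tag bus, four bus cycles per cache line) and assumes the tag carries useful information in exactly one of those cycles; the function names are hypothetical.

```python
from fractions import Fraction


def bus_utilization(data_bits=64, tag_bits=16, cycles_per_line=4):
    """Fraction of interface bits that carry useful information
    during one cache line transfer, assuming the tag is useful
    in only one of the cycles."""
    total_bits = (data_bits + tag_bits) * cycles_per_line
    useful_bits = data_bits * cycles_per_line + tag_bits
    return Fraction(useful_bits, total_bits)


def idle_tag_fraction(cycles_per_line=4):
    """Fraction of cycles in which the tag bus carries nothing useful."""
    return Fraction(cycles_per_line - 1, cycles_per_line)
```

Under these assumptions, a four-cycle transfer moves 320 bits across the 80-bit interface, of which 272 are useful, so the interface is 17/20 utilized and the tag bus is idle in 3/4 of the cycles.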
In addition, the typical cache memory transfers a cache line as a contiguous block of data, starting at the first word in the cache line and proceeding through to the last. This method of transferring cache lines does not take into account the fact that the processor may have no need for the first word in the cache line and that, therefore, it must wait a number of cycles until the word it is looking for is transferred.
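The cost of strictly sequential transfer can be sketched as follows. The example assumes one word is transferred per bus cycle and, for comparison only, a wrap-around ordering that starts at the requested word; that alternative is a common illustration and is not asserted to be the ordering of any particular design. All function names are hypothetical.

```python
def sequential_order(words_per_line):
    """Transfer order that always starts at the first word of the line."""
    return list(range(words_per_line))


def wrapped_order(words_per_line, requested):
    """Assumed alternative: start at the requested word and wrap around."""
    return [(requested + i) % words_per_line for i in range(words_per_line)]


def cycles_until(order, requested):
    """Cycles until the requested word arrives, at one word per cycle."""
    return order.index(requested) + 1
```

For a four-word line in which the processor wants the third word (index 2), sequential transfer delivers it in the third cycle, whereas the assumed wrap-around order delivers it in the first.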
What is needed is a system and method for storing and retrieving cache data that increases utilization of the bandwidth available at the processor-cache interface. In addition, what is needed is a new SRAM architecture that not only increases processor-cache interface bandwidth utilization but can also be used with a number of different data bus widths. Finally, what is needed is a method of ordering the data transferred from cache memory to the processor that increases the likelihood that useful data is transferred in the first transfer cycle.