Throughout the development of computer systems, a primary emphasis has been on increasing the speed of such systems and their ability to handle larger and more complicated programs while reducing their cost. In order to increase the ability of a computer system, it is necessary to both increase the size of the random access memory (RAM) so that larger programs may be utilized by the computer system and to increase the speed at which access to that RAM is afforded. The straightforward method of increasing access speed is to use components which operate more quickly. However, such rapidly-operating components are more expensive than slower memory components.
Because of the cost of providing high-speed RAM, advanced computer systems have used high-speed caching arrangements to increase the operational speed of the memory system. A caching arrangement provides a small amount of especially fast memory in addition to the regular RAM. As commands are issued and data is utilized, the information is fetched from RAM and stored in this cache memory. As each new read or write command is issued, the system looks to the cache memory to determine whether the information is stored there. If the information is available in the cache memory, access to the RAM is not required and the command may be processed or the data accessed much more rapidly. If the information is not available in the cache memory, the data is copied from main memory into the cache memory, where it can be accessed and remains ready for later use by the system. In well-designed memory systems, the information sought lies in the cache memory over 90% of the time on average. Consequently, use of the cache memory substantially speeds the overall operation of the memory utilized in a computer system.
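The hit-or-miss lookup described above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the text: a direct-mapped organization, one word per cache line, an eight-line cache, and the names DirectMappedCache, read, and hit_rate.

```python
class DirectMappedCache:
    """Toy direct-mapped cache in front of a slower main memory (illustrative only)."""

    def __init__(self, num_lines=8):
        self.num_lines = num_lines
        self.tags = [None] * num_lines  # address currently held by each line
        self.data = [None] * num_lines  # cached copy of the word at that address
        self.hits = 0
        self.misses = 0

    def read(self, address, main_memory):
        index = address % self.num_lines  # each address maps to exactly one line
        if self.tags[index] == address:
            # Hit: the word is served from the cache, main memory is not touched.
            self.hits += 1
            return self.data[index]
        # Miss: copy the word from main memory and keep it for later use.
        self.misses += 1
        value = main_memory[address]
        self.tags[index] = address
        self.data[index] = value
        return value

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


# Hypothetical workload: the same four words are read ten times over.
main_memory = list(range(100, 164))
cache = DirectMappedCache(num_lines=8)
for _ in range(10):
    for address in range(4):
        cache.read(address, main_memory)
```

In this hypothetical workload only the first pass over the four words misses, so 36 of the 40 reads are served from the cache. Real hit rates depend on the program's access pattern and on the cache organization.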
To further enhance the speed of operation of computer systems, it has been found desirable to place a small amount of extremely fast cache memory directly on the processor chip. For example, it may be useful to provide such a small, fast cache memory consisting of 8 Kbytes of memory directly on the chip with the other elements of a CPU. Such an arrangement is capable of greatly increasing the speed of operation of the system for information which is used repeatedly by various processes.
Typically, static memories, such as static random access memories (SRAMs), are used for cache memories. These memories are fabricated from a plurality of bistable circuits (flip-flops), each forming a cell for storing one bit of data. In the memory array, bit lines are used to read data from the cells. These bit lines are also used during writing, by driving the lines to the desired state as determined by the incoming data.
Another approach for achieving high performance in microprocessors is to execute multiple instructions per clock, an approach typically referred to as superscalar execution. In order to effectively execute multiple instructions per clock, the microprocessor must prevent the availability of operands from becoming the bottleneck during the execution stages. For instance, a bottleneck may occur where multiple operands must be obtained from the on-chip data cache in the same clock. Thus, a microprocessor needs a data cache which is able to accommodate multiple simultaneous data references per clock. In the prior art, a multi-ported scheme is normally employed to accommodate multiple data references. Under such a scheme, a multi-ported memory cell having as many ports as the number of simultaneous data references is used.
Dual-ported static memory cells are also known in the prior art. An example of one such cell is shown in U.S. Pat. No. 4,823,314. The dual-ported memory cell is often used to accommodate multiple data references to a memory. However, the dual-ported RAM cell requires two more transistors than a six-transistor single-ported SRAM cell, as well as two pairs of bit lines and two word lines. Furthermore, the transistors for the cross-coupled inverters in the dual-ported cell need to be larger to drive the larger load of the extra pair of bit lines. As a result, the dual-ported SRAM cell is much larger than a single-ported cell, approximately 1.7 times the size. Moreover, the dual-ported scheme requires twice the number of sense amplifiers that a single-ported scheme would require. The dual-ported non-interleaved scheme would also consume twice as much power as a typical single-ported cache, because twice the number of sense amplifiers would be enabled during an access to the cache.
Another method utilized to improve memory access times is through optimizing the memory organization. Memories can be organized in banks such that reading or writing multiple words can occur at one time, instead of just a single word. These banks are typically one word wide so that the width of the bus and the cache need not change. Thus, by sending addresses to multiple banks, the cache can read multiple references simultaneously. Banks are also valuable on write operations. While back-to-back writes would normally have to wait for earlier write operations to be completed, a multiple bank organization allows multiple writes per clock, provided that these writes are not destined for the same bank.
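The bank organization described above can be sketched as follows. The sketch assumes low-order interleaving (consecutive words fall in consecutive banks), a 4-byte word, and at most one access per bank per clock. The names WORD_BYTES, bank_of, and issue_in_one_clock are hypothetical and are not drawn from the text.

```python
WORD_BYTES = 4  # assumed word size; banks are one word wide


def bank_of(address, num_banks):
    """Return the bank holding the word at this byte address (low-order interleaving)."""
    return (address // WORD_BYTES) % num_banks


def issue_in_one_clock(addresses, num_banks):
    """Split references into those serviced this clock and those deferred.

    Assumes each bank can service one reference per clock; a later reference
    to a bank that is already in use must wait for a following clock.
    """
    busy_banks = set()
    serviced, deferred = [], []
    for address in addresses:
        bank = bank_of(address, num_banks)
        if bank in busy_banks:
            deferred.append(address)   # bank conflict
        else:
            busy_banks.add(bank)
            serviced.append(address)
    return serviced, deferred
```

With two banks, references to byte addresses 0x100 and 0x104 fall in different banks and can both be serviced in one clock, whereas 0x100 and 0x108 map to the same bank, so the second reference is deferred.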
In the prior art, multiple memory controllers are employed to control access to the cache memory banks, with one memory controller associated with each memory bank. The multiple memory controllers allow the banks to operate independently. For instance, an input device may use one controller and its memory, the cache may use another, and a vector unit may use a third. The number of controllers actually used is usually small. In such a cache memory system, if the number of controllers, and hence of banks, is low, the possibility of a conflict between memory references competing for the same bank is greater. Therefore, to reduce the chance of conflicts, many banks are needed.
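The relationship between the number of banks and the chance of a conflict can be made concrete under a simplifying assumption that is not stated in the text: each simultaneous reference is equally likely to fall in any bank, independently of the others. Real access patterns are not uniform, so the figures are only indicative. The function name conflict_probability is hypothetical.

```python
from fractions import Fraction


def conflict_probability(num_banks, num_refs):
    """Probability that at least two of num_refs references target the same bank.

    Assumes references are independent and uniformly distributed over the banks.
    """
    if num_refs > num_banks:
        return Fraction(1)  # more references than banks: a conflict is certain
    no_conflict = Fraction(1)
    for i in range(num_refs):
        # The i-th reference must avoid the i banks already taken.
        no_conflict *= Fraction(num_banks - i, num_banks)
    return 1 - no_conflict
```

Under this assumption, two simultaneous references conflict with probability 1/2 when there are two banks but only 1/8 when there are eight, which is consistent with the observation that many banks are needed to keep conflicts rare.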
As will be seen, the present invention provides a single-ported cache memory which can accommodate multiple data references per clock cycle in a microprocessor. The multiple data accesses per clock are made possible because the data cache of the present invention is interleaved.