This invention relates in general to memory caching systems and, more specifically, to an apparatus and methods for allowing buffering of commands for a cache memory.
Processors are clocked at ever increasing frequencies to increase performance of the systems in which they are embedded. Today, these frequencies are approaching one gigahertz. Although the clock frequency of the processors is increasing, some memory has not kept pace with this evolution.
There are two major categories of memory, namely, static random access memory (SRAM) and dynamic random access memory (DRAM). SRAM can operate at speeds approaching one gigahertz, but DRAM only operates at speeds approaching two hundred megahertz. With this in mind, designers could use SRAM in order to have memory operate at the same clock frequency as the processor; however, SRAM is much more costly than DRAM. This cost differential is attributable to the fact that an SRAM memory cell takes about eight transistors to implement, while a DRAM memory cell only takes one. Accordingly, most processing systems have far more DRAM than SRAM.
To achieve speeds with DRAM which approach SRAM speeds, memory cache circuits are used. Memory caches use a small SRAM which is mapped to a larger DRAM that is typically located outside the processor. Memory caches work under the principle that most read or write operations are fulfilled by the cache and do not require a time-intensive access to external memory. Even for moderately sized memory caches, hit rates are near ninety-nine percent.
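As a rough illustration of why a high hit rate matters, the average access time can be estimated as a weighted average of the hit and miss times. The symbols and figures below are assumptions made only for this illustration: h is the hit rate (0.99, as stated above), t_hit is taken as 1 ns (one cycle at about one gigahertz) and t_miss as 5 ns (one cycle at about two hundred megahertz). Real DRAM accesses usually take several cycles, so the result is optimistic.

```latex
\begin{aligned}
t_{\text{avg}} &= h \, t_{\text{hit}} + (1 - h) \, t_{\text{miss}} \\
               &= 0.99 \times 1\,\text{ns} + 0.01 \times 5\,\text{ns} \\
               &= 1.04\,\text{ns}
\end{aligned}
```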
Although most processors have an on-chip cache, there is further need for improving cache architectures. One common problem in cache architectures arises when a write operation is immediately followed by a read operation. The write operation to a data memory in the cache is subdivided into two parts: checking a tag memory for a hit and writing to the data memory when there is a hit. The read operation from data memory is also subdivided into two parts: checking tag memory for a hit and reading the appropriate set from the data memory when there is a hit. To speed execution of the read operation, both parts are executed simultaneously and, once a hit is determined, the proper data is selected from the set which has already been read. In this way, the read operation can execute in one clock cycle while the write operation takes two clock cycles to execute its two parts.
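The two-part read and write operations described above can be sketched in software. This is a minimal sketch that assumes a set-associative layout in which an index selects a set and a tag identifies a way within it. The function names, the index/tag split and the list-based memories are hypothetical and are not taken from the specification. Software cannot show the hardware parallelism, so the comments mark which steps would overlap in a single clock cycle.

```python
def cache_read(tag_memory, data_memory, index, tag):
    """Read: the tag check and the read of the whole set overlap in hardware."""
    stored_tags = tag_memory[index]      # part 1: check tag memory
    candidate_set = data_memory[index]   # part 2: read the set (same cycle)
    for way, stored_tag in enumerate(stored_tags):
        if stored_tag == tag:
            # Hit: select the proper data from the set already read.
            return candidate_set[way]
    return None  # miss: the data must come from external memory


def cache_write(tag_memory, data_memory, index, tag, value):
    """Write: the data memory is written only after a hit is known."""
    for way, stored_tag in enumerate(tag_memory[index]):  # part 1 (first cycle)
        if stored_tag == tag:
            data_memory[index][way] = value               # part 2 (second cycle)
            return True
    return False  # miss: nothing is written to the cache


# One set (index 0) with two ways, holding tags 0xA and 0xB.
tag_memory = [[0xA, 0xB]]
data_memory = [[111, 222]]
hit_value = cache_read(tag_memory, data_memory, 0, 0xB)
write_hit = cache_write(tag_memory, data_memory, 0, 0xA, 999)
```

The write cannot overlap its two parts because the way to be written is unknown until the tag check finishes, which is why it needs the second cycle.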
In conventional cache architectures, only a single access of data memory is possible at a time. When the write operation is immediately followed by a read operation, the write to the data memory in the second clock cycle clashes with the read from data memory of the subsequent read operation. In Table I, this clash occurs in cycle n+1 and is characterized by both write and read operations attempting to access the data memory at the same time, which is not possible. To avoid this problem, some conventional processors stall execution so that the write operation can complete before starting the read operation, as shown in Table II. Those skilled in the art appreciate that stalling the processor reduces performance of the system because the two pipelined operations require three cycles to complete.
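The timing of the clash and of the stall can be reproduced with a small cycle-counting model. This is only a sketch under stated assumptions: cycle n is numbered 0, the data memory has a single port, a write uses that port one cycle after it is issued, and a read uses it in its issue cycle. All function names are hypothetical.

```python
def data_memory_cycle(op, issue_cycle):
    """Cycle in which an operation needs the single data-memory port."""
    # A write checks the tag first and writes data one cycle later;
    # a read checks the tag and reads data in the same cycle.
    return issue_cycle + 1 if op == "write" else issue_cycle


def find_clashes(ops):
    """Issue one operation per cycle with no stalls; return clashing cycles."""
    used = set()
    clashes = []
    for issue_cycle, op in enumerate(ops):
        cycle = data_memory_cycle(op, issue_cycle)
        if cycle in used:
            clashes.append(cycle)
        used.add(cycle)
    return clashes


def schedule_with_stalls(ops):
    """Delay each operation until its data-memory cycle is free."""
    used = set()
    issue_cycles = []
    cycle = 0
    for op in ops:
        while data_memory_cycle(op, cycle) in used:
            cycle += 1  # stall the processor for one cycle
        used.add(data_memory_cycle(op, cycle))
        issue_cycles.append(cycle)
        cycle += 1
    total_cycles = max(used) + 1 if used else 0
    return issue_cycles, total_cycles


clashes = find_clashes(["write", "read"])
issue_cycles, total_cycles = schedule_with_stalls(["write", "read"])
```

With cycle n taken as 0, `clashes` contains cycle 1 (that is, n+1), and the stalled schedule issues the read in cycle 2 for a total of three cycles.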
Some have solved the back-to-back write-before-read problem by increasing the speed of the cache. If the cache runs at a frequency twice as fast as the frequency of the processor, the write operation can be completed in a single clock cycle of the processor. This technique is effective, but as processor clock frequencies approach one gigahertz, conventional techniques cannot run the cache at twice that frequency. Accordingly, new techniques are needed to solve the back-to-back write-before-read problem.
According to the invention, disclosed are an apparatus and methods which allow for processing back-to-back write and read operations without stalling the processor. In one embodiment, a cache memory subsystem buffers write operations between a central processing unit (CPU) and the cache memory subsystem. Included in the cache memory subsystem are a tag memory, a data memory and a store buffer. The store buffer is coupled to both the data memory and the tag memory. Additionally, the store buffer stores a write operation.
In another embodiment, a process for storing information in a memory cache is disclosed. The process includes receiving a write operation and queuing the write operation while other operations are performed. At a later time, the write operation is executed. The write operation may be queued in a store buffer, for example.
In yet another embodiment, a process for performing back-to-back cache operations is disclosed. In one step, a write operation is received and queued. A read operation is received and executed in other steps. After queuing, the write operation is executed.
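The queuing behavior summarized in these embodiments can be sketched as a small software model. This is a minimal sketch, not the disclosed circuit: the class and method names are hypothetical, a dictionary stands in for the data memory, and the tag memory is omitted. It also assumes that a read first checks the queued writes for a matching address so that it never returns stale data; the summary above does not state how that case is handled.

```python
from collections import deque


class BufferedCache:
    """Toy model of a cache whose writes are queued in a store buffer."""

    def __init__(self):
        self.data_memory = {}        # stands in for the cache data memory
        self.store_buffer = deque()  # queued (address, value) writes

    def write(self, address, value):
        # Queue the write instead of accessing data memory right away,
        # so a read in the next cycle finds the data memory free.
        self.store_buffer.append((address, value))

    def read(self, address):
        # Assumption: the newest queued write to this address wins.
        for queued_address, value in reversed(self.store_buffer):
            if queued_address == address:
                return value
        return self.data_memory.get(address)

    def drain_one(self):
        # Execute the oldest queued write at a later time, for example
        # in a cycle where the data memory is otherwise idle.
        if not self.store_buffer:
            return False
        address, value = self.store_buffer.popleft()
        self.data_memory[address] = value
        return True


cache = BufferedCache()
cache.data_memory[0x10] = 1
cache.write(0x20, 7)                  # write is received and queued
read_value = cache.read(0x10)         # read executes without waiting
pending = 0x20 in cache.data_memory   # the write has not executed yet
cache.drain_one()                     # the queued write executes later
```

In this model the read completes while the write is still queued, which mirrors the ordering in the back-to-back embodiment: queue the write, execute the read, then execute the write.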