The present invention relates to a method and/or architecture for pipelined processors and, more particularly, to a method and/or architecture for reading from and writing to a cache memory.
In a pipelined central processing unit (CPU), it is highly desirable that all operations for a data cache memory be performed in the same pipeline stage. This allows loads and stores to flow in the pipeline without losing performance due to resource contention. It is also highly desirable to use synchronous (i.e., clocked) random access memory (RAM) in the cache memory to avoid problems associated with asynchronous RAMs. The combination of synchronous RAMs and a pipelined CPU results in two timing problems that need to be solved.
The first problem is a write data timing problem. Ideally, write data items should be transferred at the same point in the pipeline as read data items. In synchronous RAMs, read data items become valid within a propagation time delay after the RAM is clocked. However, write data items and write enable signals must be stable during a set-up time before the RAM is clocked.
The second problem is a write enable timing problem. There are several reasons why the timing of a write enable signal needs to be one cycle later than the natural timing before the clock for synchronous RAMs. In systems where all or a part of a physical address is used as a data tag in the cache memory, a memory management unit operation must be performed during a cache write operation to convert a virtual address into the physical address. This conversion should be performed in parallel with a tag RAM access so that the data tag and a stored tag can be compared. When the memory management unit (MMU) flags an MMU exception, stores to the cache memory must be inhibited. Furthermore, in set-associative cache memories having two or more ways, access to the tag RAM is required to determine which associative set of the cache memory should receive the write data. Only the associative set that produces a cache-hit, if any, should receive the write data.
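The write enable conditions described above may be illustrated by the following minimal sketch. It is a behavioral model only, not the circuit itself; the function name `way_write_enables` and its parameters are hypothetical, and an invalid tag entry is assumed to be represented by `None`.

```python
from typing import List, Optional


def way_write_enables(
    stored_tags: List[Optional[int]],  # one tag per associative set at the indexed line (assumed layout)
    physical_tag: int,                 # data tag produced by the MMU translation
    mmu_exception: bool,               # True when the MMU flags an exception
    is_store: bool,                    # True when the current operation is a store
) -> List[bool]:
    """Return one write enable per associative set.

    A set is enabled only when the operation is a store, the MMU raised
    no exception, and the stored tag of that set matches the data tag.
    """
    if not is_store or mmu_exception:
        # Stores are inhibited on an MMU exception; loads never write.
        return [False] * len(stored_tags)
    return [tag is not None and tag == physical_tag for tag in stored_tags]
```

Because the enables depend on both the tag comparison and the MMU result, they cannot be known until after the tag RAM access completes, which is why the write enable is available later than a synchronous RAM would naturally require it.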
It would be desirable to implement a mechanism and method of operation for a cache memory design to handle write data items and write enables one cycle later than the natural timing of synchronous RAMS.
The present invention concerns a circuit comprising a cache memory, a memory management unit and a logic circuit. The cache memory may be configured as a plurality of associative sets. The memory management unit may be configured to determine a data tag for an address of a data item. The logic circuit may be configured to (i) determine a selected set from the plurality of associative sets that produces a cache-hit for the data tag, (ii) buffer the address and the data item during a cycle, and (iii) present the data item to the cache memory for storing in the selected set during a subsequent cycle.
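The three functions of the logic circuit may be sketched as a cycle-level behavioral model. This is a simplified illustration under stated assumptions, not the claimed circuit: the class names `BufferedCache`, `Way` and `PendingStore`, the 4-bit index, the single-entry buffer and the `fill` helper are all hypothetical, and address translation is assumed to have already produced a physical address.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Way:
    """One associative set: a tag array and a data array keyed by line index."""
    tags: Dict[int, int] = field(default_factory=dict)
    data: Dict[int, int] = field(default_factory=dict)


@dataclass
class PendingStore:
    """Address, data item and selected set held outside the cache for one cycle."""
    way: int
    address: int
    data: int


class BufferedCache:
    def __init__(self, num_ways: int = 2, index_bits: int = 4) -> None:
        self.ways: List[Way] = [Way() for _ in range(num_ways)]
        self.index_bits = index_bits  # assumed index width
        self.pending: Optional[PendingStore] = None

    def _split(self, address: int) -> Tuple[int, int]:
        """Split a physical address into (tag, index)."""
        index = address & ((1 << self.index_bits) - 1)
        return address >> self.index_bits, index

    def fill(self, way: int, address: int, data: int) -> None:
        """Helper for the example: install a line directly in one set."""
        tag, index = self._split(address)
        self.ways[way].tags[index] = tag
        self.ways[way].data[index] = data

    def cycle(self, store: Optional[Tuple[int, int]] = None) -> None:
        """Advance one clock cycle; `store` is (physical_address, data) or None."""
        # (iii) Present the item buffered during the previous cycle to the
        # data array of the selected set.
        if self.pending is not None:
            _, index = self._split(self.pending.address)
            self.ways[self.pending.way].data[index] = self.pending.data
            self.pending = None
        if store is not None:
            address, data = store
            tag, index = self._split(address)
            # (i) Determine the set that produces a cache-hit for the data tag.
            for number, way in enumerate(self.ways):
                if way.tags.get(index) == tag:
                    # (ii) Buffer the address and the data item for this cycle.
                    self.pending = PendingStore(number, address, data)
                    break
```

In this model, a store issued in one call to `cycle` leaves the data array unchanged and reaches the selected set on the next call, so two stores issued in consecutive cycles each complete one cycle after issue without stalling.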
The objects, features and advantages of the present invention include providing a method and architecture for a cache memory buffering mechanism that may (i) simplify timing contentions between write set-up timing requirements and read propagation delay requirements; (ii) present a data item in the memory stage of a pipelined processor after initiating a load operation to the cache memory for that data item within the memory stage; (iii) accommodate back-to-back store operations to the cache memory without delaying or stalling the pipeline by sequentially buffering both store operations outside the cache memory; and/or (iv) accommodate back-to-back store operations to the cache memory without delaying or stalling the pipeline by buffering only the second store operation outside the cache memory.