1. Field of Invention
This invention relates generally to microprocessors and specifically to maintaining data coherency in microprocessors.
2. Description of Related Art
Modern computer systems utilize a hierarchy of memory elements in order to realize an optimum balance between the speed, size, and cost of computer memory. These computer systems typically employ a primary memory such as dynamic random access memory (DRAM) and a larger, but much slower, secondary memory such as a magnetic storage device or hard disk. A small, fast cache memory such as static random access memory (SRAM) is typically provided between the central processing unit (CPU) and primary memory. This fast cache memory increases the data bandwidth of the computer system by storing information most recently needed by the CPU. In this manner, information most recently requested during execution of a computer program may be rapidly provided to the CPU from the cache memory, thereby eliminating the need to access the much slower primary and secondary memories. Although fast, the cache memory is very expensive and is therefore typically small to minimize costs.
FIG. 1 illustrates a well-known general computer system 100 having a central processing unit (CPU) 102 including CPU execution units 104, an internal (e.g., level 1 (L1)) cache memory 106, an external cache controller 108, and a primary memory controller 110. Typically, internal cache 106 is divided into an instruction cache, in which the most recently requested instructions are stored, and a data cache, in which the most recently requested data is stored. External cache controller 108 is coupled to and controls an external (e.g., level 2 (L2)) cache memory 109, and memory controller 110 is coupled to and controls primary memory 112. Although not shown for simplicity, memory controller 110 may include a write queue to store pending write requests for primary memory 112 and a read queue to store pending read requests for primary memory 112. CPU 102 is also coupled to a system bus 114, which in turn is coupled to a secondary memory 116 via an input/output (I/O) controller 118, to a monitor 120 via I/O controller 122, and to a network connection 124 via I/O controller 126.
During execution of a computer program, the computer program instructs CPU 102 to fetch instructions by incrementing a program counter within CPU 102. In response thereto, CPU 102 fetches the instructions identified by the program counter. If a fetched instruction requests data, an address request specifying the location of that data is issued. CPU 102 first searches internal cache 106 for the specified data. If the specified data is found in internal cache 106 (a cache hit), that data is immediately provided to CPU execution units 104 for processing. If, on the other hand, the specified data is not found in internal cache 106 (a cache miss), external cache 109 is then searched.
If the specified data is not found in external cache 109, the address request is provided to memory controller 110, which in turn queues the address request in its memory read queue. The memory read queue provides the read request to primary memory 112, which in turn searches for the requested data. In response thereto, primary memory 112 returns the requested data to CPU execution units 104 for processing. Primary memory 112 also returns the corresponding line of data to internal cache 106 so that subsequent address requests identifying other information in the data line will result in an internal cache hit, thereby allowing the data to be returned from internal cache 106 rather than from the much slower primary memory. In this manner, latencies associated with accessing primary memory may be hidden, thereby increasing the data bandwidth of CPU 102.
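The lookup order described above can be sketched in Python as follows. This is a minimal illustration only, not the implementation of computer system 100. The names `load`, `line_base`, and `LINE_SIZE`, the dictionary-based caches, and the four-word line size are assumptions introduced here. The behavior on an external cache hit (returning the data without filling the internal cache) is also assumed, since the description does not specify it.

```python
from collections import deque

LINE_SIZE = 4  # words per cache line (assumed value for illustration)


def line_base(addr):
    """Return the address of the first word of the line containing addr."""
    return addr - (addr % LINE_SIZE)


def load(addr, l1, l2, memory, read_queue):
    """Search internal cache, then external cache, then primary memory.

    l1 and l2 map a line base address to a list of LINE_SIZE words.
    memory is a list indexed by word address.
    Returns (value, source), where source names the level that supplied it.
    """
    base = line_base(addr)
    offset = addr - base

    if base in l1:                      # internal cache hit
        return l1[base][offset], "L1"
    if base in l2:                      # external cache hit (fill policy assumed)
        return l2[base][offset], "L2"

    # Miss in both caches: queue the read request, then service it in order.
    read_queue.append(base)
    serviced = read_queue.popleft()
    line = [memory[serviced + i] for i in range(LINE_SIZE)]

    # The whole line is returned to the internal cache so that later
    # requests to neighboring words in the same line hit there.
    l1[serviced] = line
    return line[offset], "memory"


if __name__ == "__main__":
    memory = list(range(100, 116))      # word at address n holds 100 + n
    l1, l2, read_queue = {}, {}, deque()
    print(load(5, l1, l2, memory, read_queue))  # first access misses both caches
    print(load(6, l1, l2, memory, read_queue))  # same line, now in internal cache
```

The second call touches a different word of the same line, so it is served from the internal cache without a further read request to primary memory.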
Data stored in lines of internal cache 106 may be modified by CPU execution units 104 in response to the instructions of the computer program and, therefore, may not always be consistent with the original copy stored in primary memory 112. Typically, modified data stored in a line of internal cache 106 is not written back to primary memory 112 until the cache line is needed for storing new data retrieved from primary memory. During a well-known cache replacement operation, a line of internal cache 106 is selected to store the new data. If the cache line to be replaced has not been modified, and thus is consistent with the original copy in primary memory 112, the cache line is simply overwritten. On the other hand, if the cache line has been modified, and is thus no longer consistent with the original copy in primary memory 112, the cache line is written back to primary memory 112 during a well-known writeback operation. During writeback, a write request identifying the modified cache data is provided to primary memory controller 110, which in turn stores the write request in its write queue. The write queue then forwards the write request to primary memory 112, which in turn updates the identified address with the modified data. In this manner, data coherency is maintained.
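The replacement and writeback behavior described above can be sketched as follows. This is a simplified illustration under stated assumptions. The `Line` class, its `dirty` flag, and the functions `replace_line` and `drain_write_queue` are hypothetical names introduced here. Victim selection is left to the caller, and primary memory is modeled as a dictionary keyed by line address.

```python
from collections import deque


class Line:
    """One cache line: an address tag, its data, and a modified (dirty) flag."""

    def __init__(self, tag, data):
        self.tag = tag
        self.data = data
        self.dirty = False


def replace_line(cache, victim_index, new_tag, new_data, write_queue):
    """Replace the line at victim_index with newly fetched data.

    A modified victim is queued for writeback first; an unmodified victim
    already matches primary memory and is simply overwritten.
    """
    victim = cache[victim_index]
    if victim is not None and victim.dirty:
        write_queue.append((victim.tag, victim.data))
    cache[victim_index] = Line(new_tag, new_data)


def drain_write_queue(write_queue, memory):
    """Forward queued write requests to primary memory in order."""
    while write_queue:
        tag, data = write_queue.popleft()
        memory[tag] = data


if __name__ == "__main__":
    memory = {0: "old", 8: "x"}
    cache = [Line(0, "old")]
    write_queue = deque()

    # The CPU modifies the cached copy; primary memory is now stale.
    cache[0].data = "new"
    cache[0].dirty = True

    # The line is needed for new data, so the modified copy is written back.
    replace_line(cache, 0, 8, memory[8], write_queue)
    drain_write_queue(write_queue, memory)
    print(memory[0])
```

Deferring the write until replacement keeps repeated modifications of the same line inside the cache, at the cost of primary memory holding a stale copy in the meantime.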
The read and write queues, which may be implemented as a single queue, typically forward their respective read and write requests to primary memory 112 in the same order in which they were issued by CPU 102 in order to maintain proper ordering, which in turn ensures data coherency. Thus, in dispatching requests to primary memory 112, write requests are intertwined with read requests, as determined by the execution order of their corresponding instructions. Because maintaining a constant execution flow in the CPU pipeline is dependent upon the prompt return of fetched instructions and the data requested by the fetched instructions, i.e., upon the prompt servicing of read requests, servicing write requests to primary memory may undesirably stall the execution of instructions in the pipeline by delaying the dispatch of read requests to primary memory.
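The effect of in-order dispatch on read requests can be illustrated with a small sketch. It assumes, purely for illustration, that all requests are already queued and that each request is represented only by its kind, "R" or "W". The function name `reads_delayed_by_writes` is hypothetical.

```python
def reads_delayed_by_writes(requests):
    """For each read in an in-order queue, count the writes dispatched before it.

    requests is a sequence of "R" and "W" in issue order. Because dispatch
    follows issue order, every earlier write must be serviced before a read
    can be dispatched.
    """
    delays = []
    writes_ahead = 0
    for kind in requests:
        if kind == "W":
            writes_ahead += 1
        elif kind == "R":
            delays.append(writes_ahead)
    return delays


if __name__ == "__main__":
    # The second read waits behind two writes, the third behind three.
    print(reads_delayed_by_writes(["R", "W", "W", "R", "W", "R"]))
```

Each count stands for write servicing time that a pending read, and therefore the pipeline waiting on it, must absorb under strict in-order dispatch.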
Further, because each instance in which a write request is dispatched to primary memory 112 after a read request requires primary memory 112 to switch from a read operation to a write operation, and vice versa, the intertwining of read and write requests dispatched to primary memory 112 may result in a significant number of switches between primary memory read and write operations. The delays associated with switching between primary memory read and write operations may reduce the performance of CPU 102.
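The switching cost described above can be made concrete by counting transitions in a request stream. The sketch below is illustrative only. The function name `count_switches` and the two sample sequences are assumptions, and the grouped sequence is included solely for comparison, not as a description of any particular dispatch scheme.

```python
def count_switches(requests):
    """Count how often consecutive requests change between read and write."""
    return sum(1 for prev, cur in zip(requests, requests[1:]) if prev != cur)


if __name__ == "__main__":
    interleaved = ["R", "W", "R", "W", "R", "W"]
    grouped = ["R", "R", "R", "W", "W", "W"]
    print(count_switches(interleaved))  # 5 switches
    print(count_switches(grouped))      # 1 switch
```

Both sequences contain the same three reads and three writes, so the difference in switch count comes entirely from the order in which the requests reach primary memory.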
Accordingly, it would be desirable to dispatch read and write requests to primary memory in a manner that minimizes pipeline execution stalls and minimizes the frequency with which primary memory switches between read and write operations.