1. Field of the Invention
The invention relates to a cache controller unit architecture for high performance microprocessor systems which use a write-back cache memory. In particular, an operating method for the cache controller unit (CCU) is provided to improve cache performance during dirty cache line write-back.
2. Description of Related Art
Modern high performance microprocessor systems usually employ a hierarchy of different memories. At the level closest to the central processing unit (CPU) core is cache memory, usually comprising high speed static random access memory (SRAM). The cache memory is usually on-chip with the CPU core so that it can operate at the same clock speed as the CPU. At the lower level is main memory, which comprises the entire physical memory space seen by the CPU. Main memory typically resides off-chip and is slower but cheaper, e.g. dynamic random access memory (DRAM). The cache memory holds a subset of the memory locations in the main memory. When the data address accessed by the CPU is in cache (a hit), the access goes to cache directly, so the CPU can process data without stalling. However, when the data address accessed by the CPU is not in cache (a miss), the access must go to the main memory, which usually takes a long time. In this case, the CPU must stall until data is returned from the main memory.
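The cost of a miss described above can be illustrated with the standard average-memory-access-time estimate. The following is a minimal sketch only; the function name and the cycle counts are hypothetical and are not taken from the system described here.

```python
def average_access_time(hit_time, miss_rate, miss_penalty):
    """Average access time in cycles: every access pays the hit time,
    and the fraction that misses additionally pays the miss penalty."""
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be between 0 and 1")
    return hit_time + miss_rate * miss_penalty


# Hypothetical figures: 1-cycle cache hit, 40-cycle main memory penalty.
mostly_hits = average_access_time(hit_time=1, miss_rate=0.125, miss_penalty=40)
always_miss = average_access_time(hit_time=1, miss_rate=1.0, miss_penalty=40)
```

Under these assumed figures, even a modest miss rate raises the average access time to several times the hit time, which is why the stall incurred on a miss matters.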
In a microprocessor system, the main memory may be accessed by a number of sources besides the CPU, for example input-output (IO) devices or direct memory access (DMA) masters. Cache coherency must therefore be maintained: the main memory should contain the same copy of data as the cache. There are two approaches to this. One approach is a write-through (WT) cache, in which, when the CPU writes to data that is in the cache, it also writes the same data to the main memory, so the main memory always contains the same copy of data as the cache. The WT cache is easier to design and maintains cache coherency better, but it always writes to the slower main memory, impacting CPU performance. The other approach is a write-back (WB) cache, in which, when the data is in the cache, the CPU writes the data to cache memory only, and the modified data is not written to the main memory until some time later, when cache coherency requires it. One situation in which a modified or “dirty” cache line must be written to the main memory is when a read miss causes the dirty cache line to be replaced. In this case, the dirty line must be read out from cache and put into the main memory before it is replaced by a new cache line. This incurs two serial operations: writing the dirty cache line from cache to the main memory, and reading the new cache line from the main memory to cache. The CPU must stall for the duration of both operations, reducing performance.
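The difference between the two policies can be sketched by counting main memory writes. This is an illustrative model only, not the described hardware: all class and function names are hypothetical, each cache line is assumed to hold a single word, and a write miss is assumed to go straight to main memory.

```python
class MainMemory:
    """Backing store that counts how often it is written."""
    def __init__(self):
        self.data = {}
        self.writes = 0

    def write(self, addr, value):
        self.data[addr] = value
        self.writes += 1


class WriteThroughCache:
    def __init__(self, mem):
        self.mem = mem
        self.lines = {}  # addr -> value (assumed one word per line)

    def write(self, addr, value):
        if addr in self.lines:
            self.lines[addr] = value
        self.mem.write(addr, value)  # every write also reaches main memory

    def evict(self, addr):
        self.lines.pop(addr, None)   # main memory is already up to date


class WriteBackCache:
    def __init__(self, mem):
        self.mem = mem
        self.lines = {}
        self.dirty = set()

    def write(self, addr, value):
        if addr in self.lines:
            self.lines[addr] = value
            self.dirty.add(addr)         # main memory is now stale
        else:
            self.mem.write(addr, value)  # assumption: no write-allocate

    def evict(self, addr):
        if addr in self.dirty:           # dirty line must be written back
            self.mem.write(addr, self.lines[addr])
            self.dirty.discard(addr)
        self.lines.pop(addr, None)


def count_memory_writes(cache_cls, n_writes=4, addr=0x100):
    """Write n_writes times to one cached line, evict it, count memory writes."""
    mem = MainMemory()
    cache = cache_cls(mem)
    cache.lines[addr] = 0  # pretend the line was filled earlier
    for i in range(n_writes):
        cache.write(addr, i)
    cache.evict(addr)
    return mem.writes, mem.data.get(addr)
```

In this model, four writes to the same cached line cost four main memory writes under write-through but only one, at eviction, under write-back; both leave the last value written in main memory.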
One common solution to this serial transfer problem is to use a write buffer or register between cache and main memory to temporarily store the dirty line. FIG. 1 is a block diagram of a typical microprocessor system with cache memory. FIG. 2 is a flowchart of the operation of FIG. 1. In FIG. 1, the system includes a CPU 11, a tag memory 12, a cache memory 13, a cache control unit (CCU) 14, a main memory 15, a memory controller (MEMC) 16, a bus interface unit (BIU) 17 with a write buffer 171, and a system bus 18. As shown in FIGS. 1 and 2, when CPU 11 wants to access data in main memory 15, it issues a read/write command and an address to CCU 14 (S1). CCU 14 checks whether the address exists in cache memory 13 (S2) by comparing the address with the content of tag memory 12. Tag memory 12 contains the upper address bits for each cache line in the cache memory and may also contain control bits for each cache line, such as a valid bit, indicating that the data in the cache line is valid, and a dirty bit, indicating that the data in the cache line has been modified. If the address hits, the data is read from cache memory 13 to CPU 11 (for a read operation) or written to cache memory 13 from CPU 11 (for a write operation) (S3). If the address misses, the required data is in main memory 15, and CCU 14 must redirect the access to BIU 17, which is responsible for accessing the devices connected to system bus 18, in particular MEMC 16, which is used to access main memory 15. BIU 17 usually contains a write buffer 171 to hold a dirty line that is to be written to main memory 15 to maintain cache coherency. To redirect the access, CCU 14 issues a fill request to BIU 17 (S4) and checks whether the line to be replaced is dirty or clean (S5). If the line to be replaced is dirty, CCU 14 must wait until write buffer 171 is available or empty (S6). Next, CCU 14 puts the dirty line into write buffer 171 (S7) and waits for the first requested word to become available from BIU 17 (S8) so that CPU 11 can continue operation (S9).
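The flow of steps S1 to S9 can be traced with a short sketch. It is a simplified reading of the flowchart, not the CCU logic itself: the function name and its parameters are hypothetical, and the sketch assumes that a clean replaced line skips steps S6 and S7 and proceeds directly to S8.

```python
def ccu_access_steps(hit, victim_dirty=False, write_buffer_empty=True):
    """Return (steps, stalled) for one CPU access.

    steps   -- the flowchart steps visited, in order
    stalled -- True if the CCU has to wait at S6 for the write buffer
    """
    steps = ["S1", "S2"]      # CPU issues command/address; tag compare
    if hit:
        steps.append("S3")    # read from or write to cache memory
        return steps, False

    steps += ["S4", "S5"]     # fill request to BIU; dirty/clean check
    stalled = False
    if victim_dirty:
        steps.append("S6")    # wait until the write buffer is empty
        stalled = not write_buffer_empty
        steps.append("S7")    # move the dirty line into the write buffer
    steps += ["S8", "S9"]     # wait for first requested word; CPU continues
    return steps, stalled


hit_trace = ccu_access_steps(hit=True)
clean_miss_trace = ccu_access_steps(hit=False, victim_dirty=False)
dirty_miss_trace = ccu_access_steps(hit=False, victim_dirty=True,
                                    write_buffer_empty=False)
```

Only the last case, a miss that replaces a dirty line while the write buffer is still occupied, reports a stall at S6.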
However, this operation may create a worst-case condition, as shown in FIG. 3. Suppose a miss occurs (solid line A of FIG. 3) while write buffer 171 is not empty and BIU 17 has not yet started writing the contents of write buffer 171 to the main memory. Since the fill request always has higher priority in BIU 17, BIU 17 services the fill request first, so CCU 14 must continue to wait at step S6 even after the miss line has been filled (dotted line B of FIG. 3). In this case, CCU 14 must wait until the miss line has been filled and write buffer 171 has also been emptied (solid line C of FIG. 3), impacting CPU performance.
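The worst case can be put into rough numbers with a small timing sketch. The model is an assumption made for illustration: the miss occurs at time 0 (line A), the line fill and the write buffer drain are each treated as a single fixed latency, the drain cannot begin until the higher-priority fill has finished, and the cycle counts are hypothetical.

```python
def ccu_wait_cycles(fill_cycles, drain_cycles, victim_dirty, buffer_occupied):
    """Cycles after the miss (line A) until the CCU can proceed.

    Line B is the end of the fill; line C is the end of the drain that
    the BIU starts only after the higher-priority fill has completed.
    """
    line_b = fill_cycles
    if victim_dirty and buffer_occupied:
        line_c = line_b + drain_cycles  # fill and drain are serialized
        return line_c
    return line_b


# Hypothetical latencies: 8 cycles to fill a line, 8 cycles to drain the buffer.
best_case = ccu_wait_cycles(8, 8, victim_dirty=True, buffer_occupied=False)
worst_case = ccu_wait_cycles(8, 8, victim_dirty=True, buffer_occupied=True)
```

With these assumed latencies the wait doubles in the worst case, because the write buffer no longer hides the write-back behind the fill and the two transfers become serial again.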