In conventional data processing systems, cache memories are often used to improve performance and reduce the use of limited system bus resources. A cache memory is typically a small high-speed memory that is dedicated to a device, such as a processor, and connected to the device by a special bus. During the normal course of operations, as each processor reads data from main memory locations, it stores the data value and its main memory address in the cache memory in a unit called a “cache line”. Since a processor frequently reads from the same memory locations as it executes program or operating system code, once the corresponding cache lines have been stored in the cache memory, the processor can read the data directly from the cache memory and does not need to read the main memory using the system bus.
A conventional data processing system 100 incorporating cache memories is shown as a block schematic diagram in FIG. 1. In a system of this type one or more processors 102, 104 are connected by a host bus 110 to a host bridge 112. Each of processors 102, 104 is provided with a cache memory 106, 108, respectively, located in the data path between the processor and the host bus 110. Cache memories 106 and 108 are typically provided with a cache memory controller (not shown in FIG. 1) that controls and coordinates the operation of the associated cache memory.
Each processor is also provided with an associated memory. For example, processor 102 is provided with memory 107 and processor 104 is provided with memory 109. Although memories 107 and 109 are illustrated as attached to the respective cache memories, each processor would typically have a memory controller (not shown) that manages its associated memory. The host bridge 112 manages the communications between the processors 102, 104 and video devices 114, such as monitors, as indicated schematically by line 116. The host bridge also manages communication between the processors 102 and 104 and peripherals 124, 126 via a peripheral bus 122.
Data can also be stored in a cache memory, for example, cache memory 106, by its associated processor 102 during a memory write operation. In this type of operation, data to be written into memory 107 and its associated memory address are instead written into the cache memory 106. The data values written to the cache memory 106 can be “flushed,” that is, written to memory 107 at the addresses stored in the cache memory with the data, either at the time that the information is written into the cache memory 106 or at a later time.
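The read and write behavior described above can be illustrated with a minimal sketch. It is a simplified model and not a description of any particular cache design: the class name WriteBackCache, its methods and the use of a dictionary to stand in for memory 107 are all assumptions made for illustration only.

```python
class WriteBackCache:
    """Toy cache mapping a main-memory address to a [value, dirty] cache line.

    All names here are hypothetical; a dict stands in for main memory.
    """

    def __init__(self, memory):
        self.memory = memory      # backing store, e.g. memory 107
        self.lines = {}           # address -> [value, dirty flag]
        self.memory_reads = 0     # reads that had to go to main memory

    def read(self, addr):
        if addr not in self.lines:            # miss: fetch from main memory
            self.memory_reads += 1
            self.lines[addr] = [self.memory[addr], False]
        return self.lines[addr][0]            # hit: served from the cache

    def write(self, addr, value, write_through=False):
        # The data and its address go into the cache line first.
        self.lines[addr] = [value, not write_through]
        if write_through:                     # flushed at the time of the write
            self.memory[addr] = value

    def flush(self):
        # Flush at a later time: write every dirty line back to memory.
        for addr, line in self.lines.items():
            if line[1]:
                self.memory[addr] = line[0]
                line[1] = False


memory_107 = {0x10: 1, 0x20: 2}
cache_106 = WriteBackCache(memory_107)
cache_106.read(0x10)
cache_106.read(0x10)            # second read does not touch main memory
cache_106.write(0x20, 99)       # memory 107 still holds the old value here
stale = memory_107[0x20]
cache_106.flush()               # memory 107 now holds the new value
```

In this sketch, the write_through flag selects between flushing at the time of the write and deferring the flush until flush() is called, which correspond to the two alternatives mentioned above.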
This caching technique can also be used to improve the performance of other system components, such as memory management units (MMUs). For example, in a memory mapped input/output (I/O) system, an I/O MMU 128 located in the host bridge 112 may be used to translate between logical I/O addresses and physical device addresses. The I/O addresses are typically divided into blocks and each block of addresses is mapped by the MMU to a corresponding block of actual physical I/O device addresses. The mapping between I/O addresses and physical addresses is specified by a page table directory 132 that is stored in one of the processor memories, for example memory 109. An I/O MMU 128 typically contains a small “translation lookaside buffer” (TLB) 130 to cache some I/O address space to physical address space translations in order to avoid repeatedly fetching translations from the page table directory 132 for every I/O address to physical address transaction. The operation of the TLB 130 is similar to the operation of a processor cache memory 106, 108 except that typically the TLB 130 can be read by the I/O MMU 128, but not written by the MMU 128.
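The role of the TLB 130 can be sketched as follows. The block size of 4096 addresses, the two-entry TLB capacity, the least-recently-used replacement policy and all class and variable names are assumptions chosen for illustration; the text above does not specify them.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # assumed size of one block of I/O addresses


class IoMmu:
    """Toy I/O MMU with a small TLB in front of a page table (hypothetical)."""

    def __init__(self, page_table, tlb_entries=2):
        self.page_table = page_table    # I/O page number -> physical page number
        self.tlb = OrderedDict()        # cached translations, oldest first
        self.tlb_entries = tlb_entries
        self.table_fetches = 0          # fetches from the page table in memory

    def translate(self, io_addr):
        page, offset = divmod(io_addr, PAGE_SIZE)
        if page in self.tlb:
            self.tlb.move_to_end(page)            # TLB hit: no memory access
        else:
            self.table_fetches += 1               # TLB miss: fetch translation
            self.tlb[page] = self.page_table[page]
            if len(self.tlb) > self.tlb_entries:
                self.tlb.popitem(last=False)      # evict least recently used
        return self.tlb[page] * PAGE_SIZE + offset


page_table_132 = {0: 7, 1: 3}
mmu_128 = IoMmu(page_table_132)
first = mmu_128.translate(0x0010)    # miss: translation fetched from the table
second = mmu_128.translate(0x0020)   # hit: same block, served from the TLB
```

Both addresses fall within the same block, so only the first translation requires a fetch from the page table; the second is resolved from the cached entry.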
Often, in a multiprocessor system such as system 100, the processors 102, 104 operate with a shared memory area that is duplicated in both of the memories 107 and 109. When all of the processors 102, 104 share a single logical image of the shared memory, the system is said to be “coherent.” In a coherent system, if all of the processors 102, 104 read data from the same memory address, they all receive the same data value. Achieving memory coherence is complicated by the use of cache memories, such as cache memories 106, 108 and TLB 130. In particular, because each processor 102, 104 or I/O MMU 128 in a multiprocessing system has its own private copy of small portions of the shared memory area in its dedicated cache memory 106, 108 or TLB 130, these copies can deviate from one another. For example, when a processor, such as processor 102, writes data into its cache memory 106 and changes the value that was located there, the value in that memory 106 may differ from a corresponding value in all other cache memories that contain a corresponding cache line and from the shared memory area in memories 107 and 109. Accordingly, a read to the same location in either its associated memory 109 or its cache memory 108 by another processor 104 can retrieve different data, and the system is no longer coherent. A similar problem can occur in MMUs when entries in the page table directory 132 in memory 109 are changed by other software, such as direct memory access (DMA) software. In this latter case, the MMU 128 sees an incoherent view of memory 109.
A common method of solving this memory coherence problem is called “bus snooping.” In this technique, when a request for a read or write transaction to a cache memory is made to a cache memory controller, that controller broadcasts the request to all other cache memory controllers either over the normal system bus or over a special snooping bus. This request may be the virtual address provided to the controller or a physical address which is generated by the controller from the virtual address using conventional address translation facilities. When such a request is detected, each cache controller “snoops” its associated cache memory to determine whether the corresponding cache line is stored in the cache memory and is marked as shared by the other processors. “Snoop results” are generated from this determination and used to address the memory incoherency in known manners. For example, the controller may invalidate the (now incorrect) cache line if the request was a write request. Alternatively, it may forward a changed data value from its cache to the controller that received an original read transaction to ensure that both caches contain the same values.
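A minimal sketch of the invalidate-on-write variant of bus snooping is given below. It makes several simplifying assumptions that are not taken from the description above: a single dictionary stands in for the shared memory area, writes are written through to that memory immediately, every cached line is treated as shared, and the forwarding alternative is omitted. The class names SnoopBus and CacheController are hypothetical.

```python
class SnoopBus:
    """Toy snooping bus that relays each request to the other controllers."""

    def __init__(self):
        self.controllers = []

    def attach(self, controller):
        self.controllers.append(controller)

    def broadcast(self, sender, addr, is_write):
        # Collect the snoop result (hit or no hit) from every other controller.
        return [c.snoop(addr, is_write)
                for c in self.controllers if c is not sender]


class CacheController:
    def __init__(self, bus, memory):
        self.bus = bus
        self.memory = memory      # assumed single shared memory image
        self.lines = {}           # address -> cached value
        bus.attach(self)

    def snoop(self, addr, is_write):
        hit = addr in self.lines
        if hit and is_write:
            del self.lines[addr]  # invalidate the now incorrect cache line
        return hit

    def read(self, addr):
        self.bus.broadcast(self, addr, is_write=False)
        if addr not in self.lines:
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def write(self, addr, value):
        self.bus.broadcast(self, addr, is_write=True)
        self.lines[addr] = value
        self.memory[addr] = value  # assumption: write-through for simplicity


shared_memory = {0x40: 5}
bus = SnoopBus()
controller_106 = CacheController(bus, shared_memory)
controller_108 = CacheController(bus, shared_memory)

before = (controller_106.read(0x40), controller_108.read(0x40))
controller_106.write(0x40, 6)        # invalidates the line held by 108
after = controller_108.read(0x40)    # 108 misses and refetches the new value
```

Without the broadcast in write(), the second controller would keep returning the value it cached earlier, which is the incoherent behavior that snooping is intended to prevent.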
Bus snooping can maintain memory coherency, but may adversely affect system performance. For example, a typical method for determining whether a particular cache line is present in a cache memory is to propagate the address in the request to the cache memory and use the normal cache memory “hit” mechanism to determine whether that line is in the cache memory (a “hit” occurs) and the line is marked as shared. However, because the cache memory controller must also examine the contents of the cache memory during the normal processing of each transaction, the snooping operations can slow down processing of the normal cache access requests received from processors or I/O MMUs.
Several techniques have been developed to prevent this dual access from affecting cache memory speed. For example, a special dual ported cache memory may be used. This memory has a “cache hit port” that can be used by the cache controller to access the cache memory to determine whether a snooping hit occurs at the same time that the controller is accessing the memory via the other memory port in order to service requests from a processor or an I/O MMU. Nevertheless, because the transaction address must still be propagated within the cache memory to determine whether a hit occurs, cache memory processing can still be slowed. In addition, a special dual-ported memory is required. Further, since the cache memory must still be examined in order to determine whether a hit occurs, it may take some time to generate the snoop results.
Another approach to improving snooping performance is to separate each cache line into its data portion and its address portion (called a “tag”). The data portions of the cache lines are then stored in a cache data store and the tag portions are stored in a tag store. In order to improve snooping performance, duplicate tag stores are used. One tag store is examined by the cache controller to detect a cache line hit during a snooping operation and the other tag store is used for normal processor or MMU cache line hit detection. The duplicate tag store technique avoids the requirement for a separate hit port and produces the snoop results quickly, but the duplicate tag structure is costly because it usually contains critical paths and requires large numbers of gates to implement.
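The duplicate tag store arrangement can be sketched in software as follows. This is only an illustrative model: the direct-mapped organization, the four-line capacity, the line-granular addresses and the name DuplicateTagCache are assumptions, and a software model cannot show the hardware timing benefit or the gate cost discussed above.

```python
NUM_LINES = 4  # assumed number of cache lines (direct-mapped for simplicity)


class DuplicateTagCache:
    """Toy cache with one data store and two identical tag stores."""

    def __init__(self):
        self.data = [None] * NUM_LINES        # cache data store
        self.cpu_tags = [None] * NUM_LINES    # tags for processor hit detection
        self.snoop_tags = [None] * NUM_LINES  # duplicate tags for snooping

    @staticmethod
    def _split(addr):
        # Low part of the address selects the line; the rest is the tag.
        return addr % NUM_LINES, addr // NUM_LINES

    def fill(self, addr, value):
        index, tag = self._split(addr)
        self.data[index] = value
        self.cpu_tags[index] = tag            # both tag stores are updated
        self.snoop_tags[index] = tag          # together so they stay identical

    def cpu_lookup(self, addr):
        index, tag = self._split(addr)
        return self.data[index] if self.cpu_tags[index] == tag else None

    def snoop_hit(self, addr):
        # Snoop hit detection consults only the duplicate tag store.
        index, tag = self._split(addr)
        return self.snoop_tags[index] == tag

    def invalidate(self, addr):
        index, tag = self._split(addr)
        if self.snoop_tags[index] == tag:
            self.cpu_tags[index] = None
            self.snoop_tags[index] = None


cache = DuplicateTagCache()
cache.fill(9, "a")                  # line index 1, tag 2
hit_same = cache.snoop_hit(9)       # same index and tag
hit_other = cache.snoop_hit(5)      # same index, different tag
cache.invalidate(9)
```

The sketch shows why the duplicate structure is costly: every fill and every invalidation must update both tag stores so that snoop hit detection and processor hit detection always agree.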
Therefore, there is a need for a simple mechanism to maintain cache coherency.