In a modern computer system architecture, multiple memory caches are maintained to reduce latency in accessing main memory. A cache typically has a plurality of entries. Each entry holds a certain number of bytes, known as a cache line.
When a processor makes a change to data in main memory, copies of the data maintained in one or more of the caches may become inconsistent (e.g., stale or out of date) with main memory or with other caches. Similarly, when a processor makes a change to data in one of the caches, copies of the data in main memory or in other caches may become inconsistent with the changed data. Cache coherency protocols are used to maintain the consistency of data between caches. There are two general classes of cache coherency protocols: snooping and directory-based.
In a snooping protocol, every cache that has a copy of the data from a block of physical memory also has a copy of information about the data block. Each cache is typically located on a shared memory bus, and all cache controllers monitor or snoop on the bus to determine whether or not they have a copy of the shared block. Snooping protocols have primarily been used in small multiprocessor or single-processor systems, while larger high-performance multiprocessor systems use directory-based coherency protocols.
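The snooping behavior described above can be sketched in a few lines. This is a minimal illustration, not a description of any particular hardware: it assumes a write-through, write-invalidate policy, and the names `Bus`, `SnoopingCache`, and `snoop_write` are hypothetical.

```python
class Bus:
    """Shared bus: every attached cache sees every broadcast write."""

    def __init__(self):
        self.memory = {}   # block address -> value held in main memory
        self.caches = []   # all cache controllers attached to the bus

    def attach(self, cache):
        self.caches.append(cache)

    def broadcast_write(self, source, addr, value):
        self.memory[addr] = value          # assumption: write-through
        for cache in self.caches:
            if cache is not source:
                cache.snoop_write(addr)    # every other controller snoops


class SnoopingCache:
    def __init__(self, bus):
        self.bus = bus
        self.lines = {}    # block address -> cached value
        bus.attach(self)

    def read(self, addr):
        if addr not in self.lines:         # miss: fetch from main memory
            self.lines[addr] = self.bus.memory.get(addr, 0)
        return self.lines[addr]

    def write(self, addr, value):
        self.lines[addr] = value
        self.bus.broadcast_write(self, addr, value)

    def snoop_write(self, addr):
        # If this cache holds the block, its copy is now stale: drop it.
        self.lines.pop(addr, None)


bus = Bus()
a = SnoopingCache(bus)
b = SnoopingCache(bus)
a.read(0x40)
b.read(0x40)           # both caches now hold the block
a.write(0x40, 7)       # b snoops the write and invalidates its copy
print(b.read(0x40))    # b misses and refetches the new value: prints 7
```

Because every controller must observe every write, the bus itself becomes the scaling limit, which is consistent with the use of snooping mainly in smaller systems.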
In a directory-based protocol, a directory is used to maintain memory cache coherency and state. A coherency unit is a block of physical memory corresponding to a cache line. Typically, the smallest unit of memory that can be transferred between the main memory and the cache is a coherency unit. Information about one coherency unit is kept in just one location; that is, each coherency unit has a directory. Information in the directory generally includes which cache or caches have a copy of the coherency unit, and whether that copy is marked exclusive for future modification. An access to a particular coherency unit first queries the directory. If a cache has an exclusive copy, the memory data in the coherency unit may be stale. The real data may then be a modified cache line residing in the exclusive cache. If it is possible that the data in the coherency unit is stale, then the cache containing the real data is forced to return its data to the corresponding coherency unit in main memory. The physical memory then forwards the data to the new requester, updating the directory with the new cache location of that coherency unit.
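The directory lookup sequence described above can be sketched as follows. This is an illustrative model under stated assumptions: the class name `Directory`, its fields, and the helper `_recall` are hypothetical, and invalidating other sharers on a write is an assumption added for completeness rather than something stated in the text.

```python
class Directory:
    """One entry per coherency unit: who holds it, and whether exclusively."""

    def __init__(self):
        self.memory = {}      # unit address -> data in main memory
        self.sharers = {}     # unit address -> set of cache ids with a copy
        self.exclusive = {}   # unit address -> cache id with exclusive copy
        self.caches = {}      # cache id -> {unit address: cached data}

    def add_cache(self, cache_id):
        self.caches[cache_id] = {}

    def _recall(self, addr):
        # If some cache holds the unit exclusively, main memory may be stale:
        # force that cache to return its line to main memory first.
        owner = self.exclusive.pop(addr, None)
        if owner is not None:
            self.memory[addr] = self.caches[owner][addr]

    def read(self, cache_id, addr):
        self._recall(addr)                       # query directory first
        data = self.memory.get(addr, 0)          # memory forwards the data
        self.caches[cache_id][addr] = data
        self.sharers.setdefault(addr, set()).add(cache_id)  # update directory
        return data

    def write(self, cache_id, addr, value):
        self._recall(addr)
        # Assumption: other copies are invalidated before exclusive access.
        for other in self.sharers.get(addr, set()) - {cache_id}:
            self.caches[other].pop(addr, None)
        self.sharers[addr] = {cache_id}
        self.exclusive[addr] = cache_id
        self.caches[cache_id][addr] = value      # modified only in the cache


d = Directory()
d.add_cache("A")
d.add_cache("B")
d.write("A", 0x80, 42)       # A holds the only up-to-date copy
stale = d.memory.get(0x80)   # None: main memory has not seen the write
value = d.read("B", 0x80)    # forces A's write-back, then forwards the data
print(stale, value)          # prints: None 42
```

Note that every read in this model also updates the directory entry; when the directory lives in DRAM, that update is itself a memory write.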
Often, dynamic random access memory (DRAM) is used to store the directory. However, a great deal of memory bandwidth and power may be consumed to access and update directories. The use of an on-chip directory cache may reduce the external memory bandwidth consumed, but is generally not desired because of increased application-specific integrated circuit (ASIC) costs due to an increase in silicon area and design complexity, and increased on-chip power requirements.
Unlike older dual inline memory modules (DIMMs), fully-buffered DIMM (FBD) modules use high-speed serial links. FBD modules generally are able to provide twice as much read bandwidth as write bandwidth, based upon a design assumption that applications will require more read bandwidth than write bandwidth. This assumption is problematic for existing directory-based systems, because directory updates may cause the write bandwidth to double or triple relative to read bandwidth, causing poor utilization of the available memory bandwidth. Especially in high-end server chipset architectures, a processor may issue two read transactions per write transaction. When directory information for cache line ownership is maintained in the DIMMs, however, a typical memory read transaction turns into a read-modify-write transaction from the perspective of the DIMMs. In an illustrative example, two processor reads and one processor write lead to additional directory updates, yielding two DRAM reads and three DRAM writes: five DRAM accesses instead of three, and thus roughly 67% more memory bandwidth and power than the cache line accesses alone.
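The arithmetic in the illustrative example can be checked with a short calculation. This sketch assumes that each processor read costs one DRAM read plus one directory-update write, and that each processor write costs one DRAM write; the function name `dram_traffic` is hypothetical. Counted purely in accesses, the result is five instead of three.

```python
def dram_traffic(cpu_reads, cpu_writes):
    """Return (dram_reads, dram_writes) when the directory lives in DRAM.

    Assumption: every processor read becomes a read-modify-write at the
    DIMM, adding one directory-update write per read.
    """
    dram_reads = cpu_reads
    dram_writes = cpu_writes + cpu_reads
    return dram_reads, dram_writes


reads, writes = dram_traffic(cpu_reads=2, cpu_writes=1)
baseline = 2 + 1                           # cache line accesses alone
overhead = (reads + writes) / baseline - 1
print(reads, writes, f"{overhead:.0%}")    # prints: 2 3 67%
```

The same numbers show the mismatch with the FBD design assumption: the processor's 2:1 read-to-write ratio becomes 2:3 at the DRAM, while the links provide twice as much read bandwidth as write bandwidth.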