This invention relates to caching in multicore and multiprocessor computers.
Cache coherence is a useful mechanism in multiple processor systems to provide a shared memory abstraction to the programmer. When multiple processors cache a given shared memory location, a cache coherence problem may arise because a copy of the same memory location exists in multiple caches. A cache coherence protocol guarantees that a given memory location has a consistent view across all processors. There are many models of what a consistent view is, and one example is sequential consistency. Another is weak ordering. In each of these models, the coherence protocol prescribes a legal order in which memory locations can be acquired by and updated by various processors.
Directory based protocols are one way of maintaining cache coherence. In many previous systems, a directory is maintained alongside main memory. For example, the directory state and directory controller are both implemented as part of main memory, which is off-chip, and there is directory state associated with each memory line (same size as a cache line, which corresponds to the unit in which memory is cached, also called a cache block). Thus in such examples, the directory state is proportional in size to main memory size. Typically, the directory controller is also associated with the memory controller, and often both are tightly coupled to each other. In the MIT Alewife machine, a multiprocessor computer which was operational in 1994 (e.g., described by Anant Agarwal, Ricardo Bianchini, David Chaiken, Fred Chong, Kirk Johnson, David Kranz, John Kubiatowicz, Beng-Hong Lim, Ken Mackenzie, and Donald Yeung, “The MIT Alewife Machine: Architecture and Performance,” in Proceedings of the IEEE, March 1999, incorporated herein by reference), main memory was partitioned and each partition was associated with a corresponding one of the processor cores. As illustrated in FIG. 1, each of the nodes 100 in the machine corresponded to a core (e.g., core 1-core 3) that included a processor 102, its cache 104 (or caches, if there were multiple levels), and a portion 106 of main memory 108. The main memory was on a separate chip (e.g., typically implemented as DRAM). Each cache line sized memory line had a directory entry associated with it. The collection of directory entries for all memory lines in the memory portion 106 for a node was stored as a directory portion 110 for that node. A directory with portions from all the nodes was implemented typically in the same technology as the main memory itself (e.g., in DRAM) and each directory portion 110 was stored alongside the corresponding memory portion 106 in main memory 108.
Thus, a given directory portion 110 and its associated processor 102 are on different chips.
Alewife distributed the directory along with the memory for all the nodes as depicted in FIG. 1. The directories stored the node numbers of the nodes on which a copy of a given memory line was stored as a cache line. This way, if some cache wanted to write a given line of data, then the directory would be queried and the cache lines storing that line of data could be invalidated in all the other caches.
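The invalidate-on-write behavior described above can be illustrated with a minimal Python sketch. It is not Alewife's implementation; the names DirectoryEntry, handle_write, and sharers are hypothetical, and the sketch assumes the directory records every sharer of a line.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DirectoryEntry:
    """Hypothetical directory entry for one memory line."""
    # Node numbers of the nodes whose caches hold a copy of the line.
    sharers: set[int] = field(default_factory=set)


def handle_write(directory: dict[int, DirectoryEntry], line: int, writer: int) -> list[int]:
    """Query the directory for a write and return the nodes to invalidate."""
    entry = directory.setdefault(line, DirectoryEntry())
    # Every cached copy other than the writer's own must be invalidated.
    to_invalidate = sorted(entry.sharers - {writer})
    # After the invalidations, the writer is the only node holding the line.
    entry.sharers = {writer}
    return to_invalidate


# Memory line 0x40 is cached by nodes 1, 2, and 3; node 2 wants to write it.
directory = {0x40: DirectoryEntry(sharers={1, 2, 3})}
invalidated = handle_write(directory, 0x40, writer=2)
print(invalidated)
```

In this sketch only the caches listed in the directory entry are contacted, which is the property that distinguishes a directory lookup from a broadcast to all nodes.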
Directory based cache coherence approaches are generally more scalable and hence may be preferable over other approaches, such as snoopy cache approaches, for large-scale architectures. Snoopy cache approaches generally need a bus to connect the various nodes in the system, and each of the nodes broadcasts its requests over this medium. Each of the nodes also snoops on the bus and listens in on all the requests. Because of the need to broadcast and snoop on all main memory requests, snoopy cache schemes may not scale to more than a few cores in a multicore system.
The Alewife directory kept track of all or some of the locations (nodes) in which a given memory line was cached. The directory also kept some state which indicated the current state of the cache line, such as whether it was dirty in a cache, and the number of outstanding acknowledgements for an issued transaction. The “directory state” includes the state information that tracks where cached copies are, and the status of various transactions. The “directory controller” includes the controller engines that implement the directory protocol. The term “directory” is generally used to refer to either or both the directory state and the directory controller depending on the context.
When a processor or node (for example, node 2) requested a cache line that was not present in its cache, node 2's cache would take a cache miss and the request would be sent directly to the main memory location corresponding to the cache line. A coherence controller or the directory controller attached to that memory would check the directory entry stored along with the memory location (in the directory state) and would determine the action to take based on the request and the current state of that memory location. In a common scenario, the controller would simply send the requested data from memory to the node requesting it. The controller would also update the corresponding directory state to reflect the fact that node 2 now had a readable copy of the line in its cache. FIG. 1 illustrates that this action took a local transaction (marked as 1) between a processor and cache, and two network transactions (marked as 2 and 3), one from the cache of node 2 to request the data from the memory portion 106 assigned to node 1, and one to receive the data from memory at the cache of node 2. These transactions are between a chip having the processor and cache of a node and a chip providing the main memory. The directory state stored in the directory portion 110 of node 1 includes pointers to nodes that store copies of the data from a given memory line.
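The common read-miss scenario above can be traced with a short Python sketch. It is illustrative only: the function names, the dictionary layout, and in particular the modulo mapping from a memory line to its home node are assumptions, not details given in this description.

```python
def home_node(line, num_nodes):
    """Assumed mapping from a memory line to the node whose memory portion holds it."""
    return 1 + line % num_nodes  # nodes are numbered from 1, as in FIG. 1


def read_miss(memory, directory, line, requester):
    """Serve a read miss at the home memory and record the new sharer.

    memory[n] maps lines to data for node n's memory portion; directory[n]
    maps lines to the set of nodes caching them. The sketch assumes the
    home node differs from the requester, so both messages cross the network.
    """
    home = home_node(line, len(memory))
    transactions = [
        ("local", "processor %d -> cache %d: miss" % (requester, requester)),
        ("network", "cache %d -> memory portion of node %d: request" % (requester, home)),
    ]
    data = memory[home][line]
    # The directory controller notes that the requester now has a readable copy.
    directory[home].setdefault(line, set()).add(requester)
    transactions.append(
        ("network", "memory portion of node %d -> cache %d: data" % (home, requester))
    )
    return data, transactions


memory = {1: {0: "payload"}, 2: {}, 3: {}}
directory = {1: {}, 2: {}, 3: {}}
data, transactions = read_miss(memory, directory, line=0, requester=2)
network_count = sum(1 for kind, _ in transactions if kind == "network")
print(data, network_count, directory[1][0])
```

The trace contains one local transaction and two network transactions, matching the three steps marked in FIG. 1 for a request from node 2 to the memory portion assigned to node 1.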
In multiprocessors such as Alewife, each of the nodes can be implemented on a single chip or on multiple respective chips. The directory state was also large: its size was proportional to the size of main memory because each memory line had a directory entry associated with it.
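The proportionality between directory size and main memory size can be made concrete with a small Python calculation. All figures below (4 GiB of memory, 64-byte lines, 64 nodes, 5 sharer pointers and 2 state bits per entry) are assumed for illustration and are not taken from this description or from the Alewife design.

```python
def directory_bytes(memory_bytes, line_bytes, num_nodes, pointers, state_bits):
    """Estimate directory storage when every memory line has one entry.

    Each entry is assumed to hold `pointers` node numbers plus `state_bits`
    bits of protocol state; all parameters are hypothetical.
    """
    lines = memory_bytes // line_bytes
    pointer_bits = (num_nodes - 1).bit_length()  # bits needed to name one node
    entry_bits = pointers * pointer_bits + state_bits
    return lines * entry_bits // 8


GIB = 2 ** 30
size = directory_bytes(4 * GIB, 64, num_nodes=64, pointers=5, state_bits=2)
doubled = directory_bytes(8 * GIB, 64, num_nodes=64, pointers=5, state_bits=2)
print(size // 2 ** 20, "MiB;", doubled // 2 ** 20, "MiB")
```

Under these assumptions each entry is 32 bits, so the directory occupies 4 bytes for every 64-byte memory line, and doubling main memory doubles the directory storage.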