1. Field of the Invention
This invention relates generally to distributed shared memory multiprocessor computer systems and, more particularly, to cache coherency mechanisms for such systems.
2. Background Information
Distributed shared memory computer systems, such as symmetric multiprocessor (SMP) systems, support high-performance application processing. Conventional SMP systems include a plurality of processors coupled together by a bus. Recently, SMP systems have also coupled the processors together over a network. One characteristic of SMP systems is that memory space is shared among all of the processors, that is, each processor may access programs and data that are stored in the shared memory.
One or more operating systems are typically stored in the shared memory. The operating systems control the distribution of processes or threads among the various processors. The operating system kernels may execute on any processor, and may even execute in parallel. Accordingly, many different processors may execute various processes or threads simultaneously, and the execution speed of a given application may be greatly increased.
A processor in an SMP system also typically controls at least one level of associated cache memory. When the processor utilizes data from the shared memory, the processor typically holds an image, or copy, of the data in the associated cache. The processor thus avoids the delays associated with having to go to the shared memory each time the processor requires access to the data. The cache memories of two or more processors may contain overlapping or identical copies of data. If one of the processors alters its copy of the data, the copies of the data in the caches of other processors become invalid. To prevent the processors from acting on invalid, i.e., inconsistent, data, the SMP systems utilize some type of cache coherency protocol.
The cache coherency protocol provides a mechanism to keep track of which processors have copies of particular data, and also to notify the processors that are holding the copies that a given processor is going to update, or modify, the data. When the affected processors receive notice of the impending update operation, the processors invalidate their copies of the data. Thereafter, when one of the affected processors requires the data for further processing, the processor must first obtain a copy of the valid data from the updating processor.
In a directory-based system, a “home node” maintains for an associated region of memory a cache coherency directory that indicates which processors have copies of the data in their caches. One processor will be listed in the directory as the “owner” of the data. As owner, the processor holds a valid copy of the data and has control of the data. In order for another processor to update the data, the processor must first become the owner of the data. Accordingly, the non-owner processor contacts the home node as part of an update operation and the home node grants a change in the ownership of the data. The new owner then updates the data after obtaining, as necessary, a valid copy of the data from the previous owner. The home node also notifies the other processors that have copies of the data that their copies are now invalid.
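By way of illustration only, the ownership transfer described above may be sketched as follows. The class names, the message tuple format, and the choice to send an invalidate message to the previous owner as well as to the other holders are assumptions made for the sketch and are not taken from any particular system.

```python
from dataclasses import dataclass, field


@dataclass
class DirectoryEntry:
    """Hypothetical directory entry for one block of shared memory."""
    owner: int                                  # processor listed as owner
    sharers: set = field(default_factory=set)   # other processors holding copies


class HomeNode:
    """Hypothetical home node holding the directory for a region of memory."""

    def __init__(self):
        self.directory = {}   # block address -> DirectoryEntry
        self.sent = []        # log of (message, destination, address) tuples

    def record_read(self, addr, proc):
        # The first processor to touch a block is listed as its owner;
        # later readers are recorded as holders of copies.
        entry = self.directory.setdefault(addr, DirectoryEntry(owner=proc))
        if proc != entry.owner:
            entry.sharers.add(proc)

    def request_ownership(self, addr, proc):
        """Grant ownership of addr to proc and invalidate the other copies."""
        entry = self.directory[addr]
        previous_owner = entry.owner
        # Assumption: the previous owner's copy is invalidated along with
        # the copies held by the other processors.
        to_invalidate = (entry.sharers | {previous_owner}) - {proc}
        for dest in sorted(to_invalidate):
            self.sent.append(("INVALIDATE", dest, addr))
        entry.owner = proc
        entry.sharers = set()
        # The new owner may need to fetch valid data from the previous owner.
        return previous_owner
```

In this sketch the home node returns the identity of the previous owner so that the new owner can, as necessary, obtain a valid copy of the data before performing the update.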
The operations of the system must be coordinated with the time it takes to notify the processors about impending update operations, that is, with the time it takes to send an invalidate message to the affected processors. Otherwise, one or more of the processors may end up using invalid copies of the data in their processing operations. As the number of processors included in the system increases, it becomes increasingly difficult to deliver the invalidate messages to the affected processors in a timely manner. Accordingly, system operations may be adversely affected.
One type of known cache coherency mechanism uses “presence bits” to indicate which processors have copies of the data. The mechanism includes in the cache coherency directory for each data entry a number of bits that correspond to the respective processors in the system. For a given entry, the system sets the bits that correspond to the processors that have copies of the data.
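By way of illustration only, a presence-bit directory entry may be modeled as a bit vector with one bit per processor. The class and method names below are hypothetical, and the mapping of processor number p to bit position p is an assumption.

```python
class PresenceBits:
    """Hypothetical presence-bit vector for one directory entry."""

    def __init__(self, num_processors):
        self.num_processors = num_processors
        self.bits = 0   # bit p is set when processor p holds a copy

    def set(self, proc):
        if not 0 <= proc < self.num_processors:
            raise ValueError("processor number out of range")
        self.bits |= 1 << proc

    def clear_all(self):
        # After the invalidate messages are sent, no copies remain recorded.
        self.bits = 0

    def holders(self):
        """Return the processors whose presence bits are set."""
        return [p for p in range(self.num_processors)
                if (self.bits >> p) & 1]
```

Because the entry carries one bit per processor, its width grows linearly with the number of processors in the system.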
When the data are to be updated, the system uses the associated presence bits to index into one or more network routing tables in order to multicast an invalidate message from the home node to each of the indicated processors. When an intermediate switch receives the message, the switch consults stored routing tables and forwards the message along designated routes leading to the affected processors that are accessible from the switch. The message is thus delayed at every switch, and the associated delays become longer as the number of processors and/or switches increases.
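By way of illustration only, the switch-by-switch forwarding described above may be sketched as follows. The topology representation (a dictionary mapping each switch name to a routing table that gives the next hop for each processor) and the function name are assumptions made for the sketch. The function counts one table lookup per switch traversed as a stand-in for the per-switch delay.

```python
def multicast_invalidate(switches, start, destinations):
    """Forward an invalidate message from switch `start` to `destinations`.

    `switches` maps a switch name (str) to its routing table, which maps a
    processor number (int) to the next hop: either another switch name or
    the processor number itself when the processor is directly attached.
    Returns {processor: number of switches the message traversed}.
    """
    lookups = {}
    pending = [(start, set(destinations), 1)]
    while pending:
        switch, dests, depth = pending.pop()
        # Consult the routing table and group destinations by next hop,
        # so that one copy of the message is sent per outgoing route.
        by_next_hop = {}
        for dest in dests:
            next_hop = switches[switch][dest]
            by_next_hop.setdefault(next_hop, set()).add(dest)
        for next_hop, group in by_next_hop.items():
            if next_hop in switches:
                pending.append((next_hop, group, depth + 1))
            else:
                for dest in group:
                    lookups[dest] = depth
    return lookups


# Hypothetical chain of three switches serving four processors.
example_switches = {
    "S0": {0: 0, 1: 1, 2: "S1", 3: "S1"},
    "S1": {2: 2, 3: "S2"},
    "S2": {3: 3},
}
example_result = multicast_invalidate(example_switches, "S0", [1, 2, 3])
```

In the hypothetical chain above, a processor attached to a more distant switch is reached only after a table lookup at each intervening switch, which is the source of the accumulating delay.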
Larger systems may be organized in multiple-processor groups. In one such known prior system, the processors in a given group communicate over the network through a group switch. In this system, the cache coherency mechanism uses “sectored presence bits” that correspond to the respective groups. The system sets a bit in a given entry to indicate that one or more processors in the corresponding group have copies of the data.
As part of an update operation, the home node uses the sectored presence bits to index into the associated routing tables and multicasts the invalidate messages along routes to the corresponding group switches. Intermediate switches similarly consult routing tables to forward the message along the routes. A node in each group may maintain a group cache coherency directory and use the information therein to direct the invalidate message to the individual affected processors. Alternatively, the group switch may locally broadcast the message to all of the processors in the group.
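By way of illustration only, sectored presence bits and the local broadcast alternative may be sketched as follows. The function names are hypothetical, and the sketch assumes equal-sized groups with contiguously numbered processors, so that processor p belongs to group p // group_size.

```python
def sectored_bits(holders, group_size):
    """Return a bit vector with one bit per group that holds any copy."""
    bits = 0
    for proc in holders:
        bits |= 1 << (proc // group_size)
    return bits


def groups_to_invalidate(bits, num_groups):
    """Return the groups whose sectored presence bits are set."""
    return [g for g in range(num_groups) if (bits >> g) & 1]


def local_broadcast(group, group_size):
    """Return every processor in the group, as a group switch that
    broadcasts locally would reach all of them."""
    first = group * group_size
    return list(range(first, first + group_size))
```

With a hypothetical group size of four, copies held by processors 1, 5 and 6 set the bits for groups 0 and 1 only. A local broadcast in group 1 then reaches processors 4 through 7, including processors that hold no copy, which is the loss of precision traded for the narrower directory entry.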
The sectored presence bits work well for systems with relatively small numbers of groups, with the various switches routing messages to the relatively few group switches. However, as the number of groups increases, the mechanism suffers from the same problems discussed above with reference to the use of the presence bits, namely, the routing table lookups at each switch and the resulting delays in delivering the invalidate messages.