The present invention relates generally to memories in computer systems. More particularly, the invention relates to a snoop mechanism for use in a bus-based cache coherent multiprocessor system.
Modern computer systems utilize various technologies and architectural features to achieve high performance operation. One such technology employs several central processing units (CPUs) or processors arranged in a multiprocessor configuration. In addition to the processor units, such a multiprocessor system can include several I/O modules and a memory module, all coupled to one another by a system bus. The capabilities of the overall system can be enhanced by providing a cache memory at each of the processor units in the multiprocessor system.
Cache memories are used to improve system performance. A cache memory is a relatively small, fast memory which resides between a central processor and main system memory. Whenever the data requested by the processor is found in the cache memory, the longer time required to access the main system memory is avoided.
In a multiprocessor system, the capability of the system can be enhanced by sharing memory among the various processors in the system and by operating the system bus in accordance with a snooping bus protocol. The snooping bus protocol is used to maintain cache coherency. In shared memory multiprocessor systems, it is necessary that the system store a single correct copy of data being processed by the various processors of the system. Thus, when a processor writes to a particular data item stored in its cache, that copy of the data item becomes the latest correct value for the data item. The corresponding data item stored in main memory, as well as copies of the data item stored in other caches in the system, becomes outdated or invalid.
The snooping protocol provides the necessary cache coherency between the various cache memories and main memory. In accordance with the snooping protocol, each processor monitors or snoops the bus for bus activity involving addresses of data items that are currently stored in the processor's cache memory. Status bits are maintained in a tag memory associated with each cache to indicate the status of each data item currently stored in the cache. A processor looks up each snooped address in its cache memory, determines whether it is present, and determines the appropriate action to take in response to the snooped command and address. The cache coherency protocol dictates the manner in which the state of each cache line is maintained in each cache memory and which processor provides the requested data.
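The tag memory lookup described above can be sketched as follows. This is a minimal illustration assuming a direct-mapped cache with 64-byte lines and 256 sets; the geometry and the names `TagMemory`, `fill`, and `lookup` are hypothetical and not taken from any particular design.

```python
from dataclasses import dataclass, field

# Hypothetical geometry (assumption): direct-mapped, 64-byte lines, 256 sets.
LINE_BYTES = 64
NUM_SETS = 256


@dataclass
class TagEntry:
    tag: int
    state: str  # status bits: one of "M", "O", "E", "S", "I"


@dataclass
class TagMemory:
    """Hypothetical tag memory: one entry per set index."""
    entries: dict = field(default_factory=dict)  # set index -> TagEntry

    @staticmethod
    def split(addr: int) -> tuple:
        """Split a physical address into (set index, tag)."""
        line = addr // LINE_BYTES
        return line % NUM_SETS, line // NUM_SETS

    def fill(self, addr: int, state: str) -> None:
        """Record that the line holding addr is cached in the given state."""
        index, tag = self.split(addr)
        self.entries[index] = TagEntry(tag, state)

    def lookup(self, addr: int) -> str:
        """Return the status of the line holding addr, or "I" on a miss."""
        index, tag = self.split(addr)
        entry = self.entries.get(index)
        if entry is None or entry.tag != tag:
            return "I"
        return entry.state
```

For example, after `tags.fill(0x12340, "E")`, a snooped address within the same 64-byte line returns `"E"`, while an address that maps to the same set with a different tag returns `"I"`.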
One such cache coherency protocol is the MOSEI protocol which is associated with the following five cache tag states:
Exclusive Modified (M): the data block stored in the cache line corresponding to this tag has been modified by the data processor coupled to the cache and is not stored in any other cache memories.
Shared Modified (O): the data block stored in the cache line corresponding to this tag has been modified by the data processor coupled to this cache and may be stored in one or more other cache memories.
Exclusive Clean (E): the data block stored in the cache line corresponding to this tag has not been modified by the data processor coupled to this cache and is not stored in any other cache memories.
Shared Clean (S): the data block stored in the cache line corresponding to this tag has not been modified by the data processor coupled to this cache and may be stored in one or more other cache memories.
Invalid (I): the cache index and cache line contents are invalid.
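The five states differ along two axes: whether the line has been modified by the local processor, and whether the line is guaranteed not to be held by any other cache. The sketch below encodes that reading of the definitions; the flag names `valid`, `modified`, and `exclusive` are illustrative and not part of the protocol definition.

```python
from enum import Enum


class MOSEI(Enum):
    """Cache tag states encoded as (modified, exclusive) flag pairs.

    'exclusive' means no other cache memory holds the line (illustrative naming).
    """
    M = (True, True)    # Exclusive Modified
    O = (True, False)   # Shared Modified
    E = (False, True)   # Exclusive Clean
    S = (False, False)  # Shared Clean
    I = None            # Invalid: index and line contents are not usable

    @property
    def valid(self) -> bool:
        return self.value is not None

    @property
    def modified(self) -> bool:
        return self.valid and self.value[0]

    @property
    def exclusive(self) -> bool:
        return self.valid and self.value[1]
```

Under this encoding, the states whose holder may own newer data than main memory are exactly those with `modified` set, namely 'M' and 'O'.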
FIG. 1 illustrates a prior art shared memory cache coherent multiprocessor system 100 using snoop-in and snoop-out logic to service snooped requests in accordance with a MOSEI cache coherency protocol. There is shown one or more processor modules or nodes 102A-102N connected to a bus interconnect structure 104 operated in accordance with a snoop protocol. Each node 102 includes a processor 106, a cache memory 108, a main memory unit 110, a bus watcher 112 as well as other components not shown. The main memory unit 110 stores data that is local to the node 102 and data that is shared in one or more of the nodes 102. This data can also be resident in the cache memory 108. Each node 102 is associated with a specific address range of the shared memory.
The cache memory 108 is coupled to the main memory unit 110 and the bus watcher 112. The cache memory 108 attempts to service the memory requests received from the processor 106 from either the cache memory 108 or the main memory unit 110. In the event the requested data cannot be obtained from the memories associated with the processor, the request is broadcast on the snoop bus 104.
The bus watcher 112 is used to monitor the memory requests broadcast on the snoop bus 104 which pertain to the data resident in the node's cache memory 108 or associated with the node 102. When a read miss transaction is snooped from the snoop bus 104, the bus watcher 112 transmits the request to the cache memory 108. If the requested data item is stored in the cache memory 108, the state associated with the cache line is returned as will be described below. If the requested data item is associated with the node's shared memory address range, the data item is fetched from the main memory unit 110 and transmitted to the requesting node 102.
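The two checks a bus watcher performs on a snooped read miss (is the line in this node's cache, and does the address fall in this node's shared memory range) can be sketched as below. This is a hypothetical model: each node's range is assumed to be a single contiguous `[base, limit)` interval, the cache and memory are plain dictionaries keyed by address, and the names `Node` and `watch_read_miss` are illustrative only.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical node model; ranges and storage are simplified assumptions."""
    node_id: int
    base: int    # start of this node's shared-memory address range
    limit: int   # one past the end of the range
    cache: dict = field(default_factory=dict)   # address -> (state, data)
    memory: dict = field(default_factory=dict)  # address -> data

    def is_home(self, addr: int) -> bool:
        """True if addr lies in the address range associated with this node."""
        return self.base <= addr < self.limit

    def watch_read_miss(self, addr: int) -> dict:
        """Report what this node's bus watcher finds for a snooped read miss."""
        report = {}
        if addr in self.cache:
            # The line is cached here: return its coherency state.
            report["cache_state"] = self.cache[addr][0]
        if self.is_home(addr):
            # This node owns the address range: fetch from main memory.
            report["memory_data"] = self.memory.get(addr, 0)
        return report
```

With two nodes covering `0x0000-0x0FFF` and `0x1000-0x1FFF`, a read miss to `0x1040` yields a cache state from any node caching the line and a memory fetch only from the second node.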
This particular multiprocessor system 100 uses snoop-in and snoop-out logic to implement the snoop protocol. The snoop-in logic includes a set of shared-in and owned-in signals and associated logic components, and the snoop-out logic includes a set of shared-out and owned-out signals and associated logic components. Each processor's cache memory 108 receives a shared_in and an owned_in signal and transmits a shared_out and an owned_out signal. When the snooped command is a read miss, the shared_out signal is used to indicate whether the data associated with the snooped address is stored by the processor in a clean state (i.e., stored in the 'E' or 'S' state in accordance with the MOSEI protocol). The owned_out signal is used by each processor to indicate whether the data associated with a snooped address is owned by the processor and may have been modified by the processor (i.e., stored in the 'M' or 'O' state in accordance with the MOSEI protocol).
The shared_in signal is used to indicate whether the snooped cache line is shared. The processor initiating the snooped request uses this signal to store the requested cache line in either the 'E' or 'S' state. The cache line is stored in the 'E' state when the signal indicates that the cache line is not shared by another processor, and the cache line is stored in the 'S' state when the signal indicates that the cache line is shared by one or more processors. The owned_in signal is used to indicate that the processor is to provide the requested cache line.
At each snoop cycle, the bus watcher 112 snoops a requested address and command from the bus 104. The shared_out signals from the processors are set accordingly and transmitted to a first OR logic unit 116, which generates the corresponding shared_in signals, as shown in FIG. 2A. The owned_out signals from the processors are also set and transmitted to a second OR logic unit 118, which generates the corresponding owned_in signals, as shown in FIG. 2B. A processor asserting its owned_out signal provides the requested data and alters the state of its cache line in accordance with the MOSEI protocol. The initiating processor stores the requested data in the MOSEI state indicated by its shared_in signal.
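The wired-OR combination of snoop-out signals described above can be modeled in software as follows. This is a sketch under stated assumptions: each non-initiating cache is represented only by its MOSEI state letter, `any()` stands in for OR logic units 116 and 118, the snooper-side transitions (M to O, E to S) are the usual MOESI conventions rather than something this text specifies, and the initiator treats the line as shared when either shared_in or owned_in is asserted because an owning cache keeps its copy. All function names are hypothetical.

```python
def snoop_out(state: str) -> tuple:
    """Return (shared_out, owned_out) as driven by one snooping cache."""
    shared_out = state in ("E", "S")   # clean copy held here
    owned_out = state in ("M", "O")    # owned, possibly modified, copy held here
    return shared_out, owned_out


# Assumed snooper-side MOESI transitions after another processor reads the line.
AFTER_SNOOPED_READ = {"M": "O", "O": "O", "E": "S", "S": "S", "I": "I"}


def snoop_cycle(states: list) -> tuple:
    """Resolve one read-miss snoop across all non-initiating caches.

    Returns (shared_in, owned_in, index of the supplying owner or None,
    new snooper states, state installed by the initiator).
    """
    outs = [snoop_out(s) for s in states]
    shared_in = any(sh for sh, _ in outs)   # role of OR logic unit 116
    owned_in = any(ow for _, ow in outs)    # role of OR logic unit 118
    owner = next((i for i, (_, ow) in enumerate(outs) if ow), None)
    new_states = [AFTER_SNOOPED_READ[s] for s in states]
    # Assumption: an owner retains its copy, so owned_in also implies sharing.
    installed = "S" if (shared_in or owned_in) else "E"
    return shared_in, owned_in, owner, new_states, installed
```

For instance, `snoop_cycle(["M", "I"])` reports that cache 0 supplies the line, downgrades it to 'O', and has the initiator install it as 'S'.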
At the same time that each processor 106 snoops a requested address and command from the bus 104, the main memory unit 110 in the node 102 associated with the address is accessed. In the case of a read miss, the requested data is fetched from that main memory unit 110 and transmitted to the initiating processor 106. The main memory access is performed in parallel with the cache lookup by each processor 106 in order to reduce the memory latency when the requested data is not found in any processor's cache. In the event the fetched memory data is not needed, the initiating processor 106 disregards the data.
The use of snoop-in and snoop-out logic to implement the snoop protocol has several drawbacks. First, such logic requires that each processor generate its snoop result (i.e., set its snoop-out signals) during the same cycle. This timing constraint is complicated by the fixed latency between receiving the bus request and generating the snoop result. In addition, the snoop-in and snoop-out logic serializes the snoop results, which limits system throughput. Furthermore, transmitting the requested data from main memory even when a cache supplies it needlessly consumes data bus bandwidth.
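The cost of always returning the main memory copy in parallel with the snoop can be illustrated by counting data transfers for one read miss. This is a simplified, hypothetical model: the caches are described only by their MOSEI state letters, a single block value stands in for the owner's copy, and `prior_art_read_miss` is an illustrative name.

```python
def prior_art_read_miss(snooper_states, cache_data, memory_data):
    """Model a prior-art read miss in which main memory always responds.

    snooper_states: MOSEI state letter of each non-initiating cache.
    cache_data: the block held by an owning cache (used only if one exists).
    memory_data: the block fetched from the home node's main memory unit.
    Returns (data used by the initiator, number of data-bus transfers).
    """
    transfers = 1                           # memory always sends its copy
    owned_in = any(s in ("M", "O") for s in snooper_states)
    if owned_in:
        transfers += 1                      # the owning cache also sends the block
        return cache_data, transfers        # initiator disregards the memory copy
    return memory_data, transfers
```

Whenever some cache holds the line in 'M' or 'O', two blocks cross the data bus but only one is used.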
Accordingly, there exists a need for a snoop mechanism that can overcome these shortcomings.
The present invention is a method and apparatus that implements a snoop protocol in a multiprocessor system without the use of snoop-in and snoop-out logic. The multiprocessor system includes a number of nodes connected by a bus operated in accordance with a snoop protocol. Each node includes a cache memory and a main memory unit including a shared memory region that is distributed in one or more of the cache memories of the nodes in the system. Each node is associated with a specified portion of the shared memory. The snoop protocol maintains the shared memory in a consistent state. In a preferred embodiment, the system utilizes the MOSI cache coherency protocol.
The snoop bus is used by an initiator node to broadcast a read miss transaction. In response, each node searches its cache memory for the requested address. If the address is found in the cache memory and is associated with an 'M' or 'O' state, the data block in the cache memory is returned to the initiator node. In addition, the node storing the data block in its main memory unit fetches the data from main memory. However, the fetched data block is not returned to the initiator node if the requested data has been modified by another node.
Each node includes a memory access unit having an export cache that stores identifiers associated with data blocks that have been modified in another processor's cache. Each data block in the main memory unit is associated with a state bit that indicates whether the data block is valid or invalid. A data block is valid if it has not been modified by another node, and a data block is invalid if it has been modified by another node. The export cache and the state of each memory data block are used to determine whether a node should transmit a fetched data block to an initiator node in response to a read miss transaction. In this manner, bus traffic is reduced, since only the valid copy of the requested data item is transmitted to the initiator node.
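The cache-side response just described can be sketched as a single function per snooping node. This is a minimal illustration, not the patented logic: the MOSI state is a letter, the block is any value, and the choice that an 'M' owner moves to 'O' after supplying the line is an assumption borrowed from common MOSI practice. `respond_to_read_miss` is a hypothetical name.

```python
def respond_to_read_miss(state: str, block):
    """One snooping node's cache response to a read miss in a MOSI system.

    Returns (block sent to the initiator or None, new cache state).
    """
    if state in ("M", "O"):
        # The owner returns its copy; assumed to keep ownership in 'O'.
        return block, "O"
    # 'S' and 'I' holders send nothing; no shared-in/shared-out signals are used.
    return None, state
```

Note that no per-cycle signal combining is needed here: each node decides locally whether to return data.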
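The home node's decision can be sketched as follows. The text states only that the export cache and the per-block state bit together determine whether a fetched block is transmitted; this sketch assumes one plausible combination, namely that the block is sent only when its state bit is valid and no export cache entry exists for it. Blocks are assumed valid until a remote modification is recorded, and the names `HomeMemory`, `note_remote_modify`, and `on_read_miss` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class HomeMemory:
    """Hypothetical model of one node's memory access unit."""
    data: dict = field(default_factory=dict)        # address -> block
    valid: dict = field(default_factory=dict)       # address -> state bit
    export_cache: set = field(default_factory=set)  # blocks modified elsewhere

    def note_remote_modify(self, addr: int) -> None:
        """Record that another node has modified this block in its cache."""
        self.valid[addr] = False
        self.export_cache.add(addr)

    def on_read_miss(self, addr: int):
        """Fetch the block; return it only if memory holds the valid copy."""
        block = self.data.get(addr)
        # Assumption: transmit only if the state bit is valid (default True)
        # and the export cache has no entry for this block.
        if addr in self.export_cache or not self.valid.get(addr, True):
            return None  # a remote cache in 'M' or 'O' supplies the block instead
        return block
```

In this model, after `note_remote_modify(0x40)` a read miss to `0x40` returns `None`, so only the owning cache's copy crosses the bus.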