The present invention relates generally to computer processor technology. In particular, the present invention relates to cache coherency for a shared memory multiprocessor system.
A state-of-the-art microprocessor architecture may have one or more caches for storing data and instructions local to the microprocessor. A cache may be disposed on the processor chip itself or may reside external to the processor chip and be connected to the microprocessor by a local bus permitting exchange of address, control, and data information. By storing frequently accessed instructions and data in a cache, a microprocessor has faster access to these instructions and data, resulting in faster throughput.
Conventional microprocessor-cache architectures were developed for use in computer systems having a single computer processor. Consequently, conventional microprocessor-cache architectures are inflexible in multiprocessor systems in that they do not contain circuitry or system interfaces which would enable easy integration into a multiprocessor system while ensuring cache coherency.
A popular multiprocessor computer architecture consists of a plurality of processors sharing a common memory, with each processor having its own local cache. In such a multiprocessor system, a cache coherency protocol is required to assure the accuracy of data among the local caches of the respective processors and main memory. For example, if two processors are currently storing the same data block in their respective caches, then a write to that data block by one processor may affect the validity of the copy stored in the cache of the other processor, as well as the copy stored in main memory. One possible protocol for solving this problem would be for the system to immediately update all copies of that block in the caches, as well as in main memory, upon a write to any one copy. Another possible protocol would be to detect where all the other cached copies of a block are stored and mark them invalid upon a write to the copy stored in the cache of a particular processor. Which protocol a designer actually uses has implications for the efficiency of the multiprocessor system as well as the complexity of the logic needed to implement it. The first protocol requires significant bus bandwidth to update the data in all the caches, but the memory would always be current. The second protocol would require less bus bandwidth, since only a single bit is required to invalidate the appropriate data blocks. A cache coherency protocol can range from simple (e.g., a write-through protocol) to complex (e.g., a directory cache protocol). In choosing a cache coherence protocol for a multiprocessor computer system, the system designer must perform the difficult exercise of trading off many factors which affect efficiency, simplicity, and speed.
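The two candidate protocols described above can be contrasted with a minimal sketch. It is a toy model, not any particular bus protocol: each cache is assumed to be a plain dictionary from block address to value, and the function names `write_update` and `write_invalidate` are hypothetical.

```python
# Toy model: each cache is a dict mapping block address -> value.
# All names are hypothetical and exist only for illustration.

def write_update(caches, memory, writer, addr, value):
    """First protocol: push the new value to every cached copy and to memory."""
    caches[writer][addr] = value
    updated = 0
    for i, cache in enumerate(caches):
        if i != writer and addr in cache:
            cache[addr] = value      # a full data block travels over the bus
            updated += 1
    memory[addr] = value             # main memory is always current
    return updated                   # number of other caches updated


def write_invalidate(caches, memory, writer, addr, value):
    """Second protocol: mark other cached copies invalid; memory is not updated."""
    caches[writer][addr] = value
    invalidated = 0
    for i, cache in enumerate(caches):
        if i != writer and addr in cache:
            del cache[addr]          # only a short invalidate signal is needed
            invalidated += 1
    return invalidated               # number of other caches invalidated


# Two processors both cache block 0x40 before processor 0 writes to it.
memory_a = {0x40: 1}
caches_a = [{0x40: 1}, {0x40: 1}]
write_update(caches_a, memory_a, writer=0, addr=0x40, value=2)

memory_b = {0x40: 1}
caches_b = [{0x40: 1}, {0x40: 1}]
write_invalidate(caches_b, memory_b, writer=0, addr=0x40, value=2)
```

After the update variant, both caches and main memory hold the new value. After the invalidate variant, only the writer holds it, and main memory is stale until the block is written back.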
Hence, it would be desirable to provide a system designer with a microprocessor-cache architecture having uniquely flexible tools facilitating development of cache coherence protocols in multiprocessor computer systems.
A present day designer who wishes to construct a multiprocessor system using a conventional microprocessor as a component must deal with the inflexibility of current microprocessor technology. Present day microprocessors were built with specific cache protocols in mind and provide minimal flexibility to the external system designer. For example, one common problem is that a cache of a microprocessor is designed so that a movement of a data block out of a cache automatically sets the cache state for the block to a predetermined state. This does not give a designer of a multiprocessor system the flexibility to set the cache to any state in order to implement a desired cache protocol. Because of this, significant complexity is necessarily added to the design of a cache protocol.
In accordance with the present invention, a memory management system couples a plurality of processors to each other and to a main memory. Each processor may have one or more associated caches local to that processor. A system port of the memory management system receives a request from a first processor of the processors to access a block of data from the main memory. A memory manager of the memory management system then converts the request into a probe command having a data movement part identifying a condition for movement of the block out of a cache of a second processor and a next coherence state part indicating a next state of the block in the cache of the second processor.
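As an illustration of the two-part probe command, the following sketch models the command as a record with an independent data movement part and next coherence state part. The field names, the enumeration members, and the `convert_request` mapping are all assumptions made for this example; the disclosure leaves the choice of mapping to the system designer.

```python
from dataclasses import dataclass
from enum import Enum


class DataMovement(Enum):
    # Hypothetical names for conditions under which the block leaves the cache.
    READ_IF_HIT = 1
    READ_IF_DIRTY = 2


class NextState(Enum):
    # Hypothetical names for the next coherence state part.
    CLEAN = 1
    CLEAN_SHARED = 2
    INVALID = 3


@dataclass(frozen=True)
class ProbeCommand:
    address: int
    data_movement: DataMovement   # when the block moves out of the probed cache
    next_state: NextState         # state the probed cache leaves the block in


def convert_request(address: int, wants_exclusive: bool) -> ProbeCommand:
    """Convert a first-processor request into a probe for a second processor.

    The mapping below is one possible policy (an assumption): an exclusive
    request invalidates the other copy, a shared request leaves it clean/shared.
    """
    if wants_exclusive:
        return ProbeCommand(address, DataMovement.READ_IF_DIRTY, NextState.INVALID)
    return ProbeCommand(address, DataMovement.READ_IF_DIRTY, NextState.CLEAN_SHARED)


shared_probe = convert_request(0x80, wants_exclusive=False)
exclusive_probe = convert_request(0x80, wants_exclusive=True)
```

Because the two parts are separate fields, a designer can pair any movement condition with any next state, which is the flexibility the conventional fixed-state cache lacks.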
In another aspect of the present invention, the memory manager is further configured to present the probe command over the system port to the second processor so that the second processor changes a state of the block in the cache of the second processor in accordance with the next coherence state part of the probe command.
In yet another aspect of the present invention, the state of the block is changed in accordance with the next coherence state part of the probe command by setting the state of the data in the cache to a clean/shared state indicating there is at least one more copy of the data in a cache of another processor and the data in the cache is clean.
In yet another aspect of the present invention, the state of the data is changed in accordance with the next coherence state part of the probe command by setting the state of the data in the cache to invalid.
In yet another aspect of the present invention, the state of the data is changed in accordance with the next coherence state part of the probe command by setting the state of the data in the cache so as to transition to a next state conditioned on the current state of the data.
In yet another aspect of the present invention, the state of the data is changed in accordance with the next coherence state part of the probe command by setting the state of the data in the cache so that if the current state of the data is clean then the next state of the data is clean/shared, if the current state of the data is dirty then the next state of the data is invalid, and if the current state of the data is dirty/shared then the next state of the data is clean/shared.
In yet another aspect of the present invention, the state of the data is changed in accordance with the next coherence state part of the probe command by setting the state of the data in the cache so that if the state of the data is clean then the state of the data changes to clean/shared, and if the state of the data is dirty then the state of the data transitions to dirty/shared.
In yet another aspect of the present invention, the state of the data is changed in accordance with the next coherence state part of the probe command by setting the state of the block of data in the cache to a clean state to indicate that the cache has an exclusive copy of the data outside of the main memory.
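The next-state alternatives listed in the preceding aspects can be collected into one lookup, sketched below. The option names (`TRANSITION_1`, `TRANSITION_2`, and so on) are hypothetical. Two behaviors are assumptions not stated in the text: a block that is already invalid stays invalid, and a current state not listed for a conditional transition is left unchanged.

```python
from enum import Enum


class State(Enum):
    INVALID = "invalid"
    CLEAN = "clean"
    CLEAN_SHARED = "clean/shared"
    DIRTY = "dirty"
    DIRTY_SHARED = "dirty/shared"


class NextState(Enum):
    # Hypothetical option names, one per alternative described above.
    CLEAN = 1          # exclusive clean copy outside main memory
    CLEAN_SHARED = 2   # at least one other cache also holds a clean copy
    INVALID = 3
    TRANSITION_1 = 4   # clean -> clean/shared, dirty -> invalid,
                       # dirty/shared -> clean/shared
    TRANSITION_2 = 5   # clean -> clean/shared, dirty -> dirty/shared


_UNCONDITIONAL = {
    NextState.CLEAN: State.CLEAN,
    NextState.CLEAN_SHARED: State.CLEAN_SHARED,
    NextState.INVALID: State.INVALID,
}

_CONDITIONAL = {
    NextState.TRANSITION_1: {
        State.CLEAN: State.CLEAN_SHARED,
        State.DIRTY: State.INVALID,
        State.DIRTY_SHARED: State.CLEAN_SHARED,
    },
    NextState.TRANSITION_2: {
        State.CLEAN: State.CLEAN_SHARED,
        State.DIRTY: State.DIRTY_SHARED,
    },
}


def apply_next_state(current: State, option: NextState) -> State:
    """Return the state the probed cache leaves the block in."""
    if current is State.INVALID:
        return State.INVALID          # assumption: nothing to change on a miss
    if option in _UNCONDITIONAL:
        return _UNCONDITIONAL[option]
    # Assumption: current states not listed for the transition stay as they are.
    return _CONDITIONAL[option].get(current, current)
```

For example, `apply_next_state(State.DIRTY, NextState.TRANSITION_1)` yields `State.INVALID`, while the same current state under `NextState.TRANSITION_2` yields `State.DIRTY_SHARED`.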
In another aspect of the present invention, the memory manager is further configured to receive the block from the cache of the second processor in accordance with the data movement part of the probe command.
In yet another aspect of the present invention, the memory manager receives the block of data in accordance with the data movement part of the probe command only if the data is located in the cache and the state of the data is valid.
In yet another aspect of the present invention, the memory manager receives the block of data in accordance with the data movement part of the probe command only if the state of the block of data is dirty.
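The two data movement conditions just described can be expressed as a small predicate. This is a sketch under stated assumptions: the names `READ_IF_HIT` and `READ_IF_DIRTY` are hypothetical, a block absent from the cache is modeled as being in the invalid state, and both the dirty and dirty/shared states are treated as "dirty" for the second condition.

```python
from enum import Enum


class State(Enum):
    INVALID = "invalid"
    CLEAN = "clean"
    CLEAN_SHARED = "clean/shared"
    DIRTY = "dirty"
    DIRTY_SHARED = "dirty/shared"


class DataMovement(Enum):
    # Hypothetical names for the two conditions described above.
    READ_IF_HIT = 1     # return the block if it is present and valid
    READ_IF_DIRTY = 2   # return the block only if it is dirty


def should_return_block(movement: DataMovement, state: State) -> bool:
    """Decide whether the probed cache sends the block to the memory manager."""
    if movement is DataMovement.READ_IF_HIT:
        # Assumption: a block that is not in the cache is modeled as INVALID.
        return state is not State.INVALID
    if movement is DataMovement.READ_IF_DIRTY:
        # Assumption: dirty/shared counts as dirty for this condition.
        return state in (State.DIRTY, State.DIRTY_SHARED)
    raise ValueError(f"unknown data movement condition: {movement!r}")
```

Keeping this predicate separate from the next-state logic mirrors the probe command itself, in which the movement condition and the resulting state are chosen independently.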
According to another aspect of the present invention, the memory manager is further configured to send a system data control command over the system port to the first processor.
According to another aspect of the present invention, the system data control command includes a system data control part indicating that the data is for filling the cache of the first processor. The first processor is configured to fill the cache of the first processor with the data at the address according to the system data control part of the system data control command.
According to another aspect of the present invention, the first processor is further configured to change the state of the data in accordance with a next coherence state part of the system data control command by setting the state of the data to the clean state.
According to another aspect of the present invention, the first processor is further configured to change the state of the data in accordance with a next coherence state part of the system data control command by setting the state of the data to the clean/shared state.
According to another aspect of the present invention, the first processor is further configured to change the state of the data in accordance with a next coherence state part of the system data control command by setting the state of the data to the dirty state.
According to another aspect of the present invention, the first processor is further configured to change the state of the data in accordance with a next coherence state part of the system data control command by setting the state of the data to the dirty/shared state.
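The fill behavior on the requesting side can be sketched in the same style. The cache is assumed to be a dictionary from address to a (data, state) pair, and `fill_cache` is a hypothetical helper; the four permitted fill states are the ones enumerated in the aspects above.

```python
from enum import Enum


class State(Enum):
    INVALID = "invalid"
    CLEAN = "clean"
    CLEAN_SHARED = "clean/shared"
    DIRTY = "dirty"
    DIRTY_SHARED = "dirty/shared"


# States a fill may leave the block in, per the aspects described above.
FILL_STATES = frozenset(
    {State.CLEAN, State.CLEAN_SHARED, State.DIRTY, State.DIRTY_SHARED}
)


def fill_cache(cache: dict, address: int, data: bytes, next_state: State) -> None:
    """Apply a system data control command to the first processor's cache.

    `cache` maps address -> (data, state); this layout is an assumption.
    """
    if next_state not in FILL_STATES:
        raise ValueError(f"a fill cannot leave the block in state {next_state}")
    cache[address] = (data, next_state)


requester_cache = {}
fill_cache(requester_cache, 0x200, b"\x01\x02", State.CLEAN_SHARED)
```

Here the system, not the processor, decides the state of the newly filled block, so the same fill path serves whichever coherence protocol the designer has chosen.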
In accordance with the present invention, in a process for managing cache coherency, a request is received from a first processor of the plurality of processors to access a block of data from the main memory. The request is converted into a probe command having a data movement part identifying a condition for movement of the block out of a cache of a second processor of the plurality of processors and a next coherence state part indicating a next state of the block in the cache.
In a further aspect, a system data control response command is generated. The system data control response command is presented to the first processor along with the block of data from the second processor to fill a cache associated with the first processor with the block of data and to change the state of the cache block in the cache of the first processor according to a next coherence state part of the system data control response command.
In a further aspect, the memory manager is further configured to present the probe command over the system port to the second processor so that the second processor changes a state of the block in the cache of the second processor in accordance with the next coherence state part of the probe command.
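The process as a whole, from request to probe to fill, might be sketched as follows for a shared read of one block. Everything here is illustrative: caches are dictionaries from address to a (data, state) pair, states are plain strings, the probe uses a "read if dirty" movement condition with a clean/shared next state, and the write-back of dirty data to main memory is an assumption added so that the resulting clean states are consistent.

```python
def handle_read_request(address, requester_cache, owner_cache, memory):
    """Hypothetical memory-manager flow for one shared read request."""
    # Step 1: convert the request into a two-part probe command.
    probe = {
        "address": address,
        "movement": "read_if_dirty",
        "next_state": "clean/shared",
    }

    # Step 2: the second processor applies the probe to its cache.
    data = None
    entry = owner_cache.get(address)
    if entry is not None:
        value, state = entry
        if state in ("dirty", "dirty/shared"):
            data = value                 # block moves out of the probed cache
            memory[address] = value      # assumption: dirty data is written back
        owner_cache[address] = (value, probe["next_state"])

    # Step 3: if no cache supplied the block, read it from main memory.
    if data is None:
        data = memory[address]

    # Step 4: the system data control response fills the requester's cache
    # and sets its state: shared if another copy exists, else exclusive clean.
    fill_state = "clean/shared" if entry is not None else "clean"
    requester_cache[address] = (data, fill_state)
    return data


memory = {0x100: 1}
owner = {0x100: (7, "dirty")}
requester = {}
result = handle_read_request(0x100, requester, owner, memory)
```

In this run the second processor holds the only up-to-date copy, so the block is supplied from its cache rather than from main memory, and both caches end in the clean/shared state.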
Objects, advantages, and novel features of the present invention will become apparent to those skilled in the art from this disclosure, including the following detailed description, as well as by practice of the invention. While the invention is described below with reference to one or more preferred embodiments, it should be understood that the invention is not limited thereto. Those of ordinary skill in the art having access to the teachings herein will recognize additional implementations, modifications, and embodiments, as well as other fields of use, which are within the scope of the invention as disclosed and claimed herein and with respect to which the invention could be of significant utility.