The present invention relates generally to computer processor technology. In particular, the present invention relates to cache coherency for a shared memory multiprocessor system.
A state-of-the-art microprocessor architecture may have one or more caches for storing data and instructions local to the microprocessor. A cache may be disposed on the processor chip itself or may reside external to the processor chip and be connected to the microprocessor by a local bus permitting the exchange of address, control, and data information. By storing frequently accessed instructions and data in a cache, a microprocessor has faster access to these instructions and data, resulting in higher throughput.
Conventional microprocessor-cache architectures were developed for use in computer systems having a single computer processor. Consequently, conventional microprocessor-cache architectures are inflexible in multiprocessor systems in that they do not contain circuitry or system interfaces which would enable easy integration into a multiprocessor system while ensuring cache coherency.
A popular multiprocessor computer architecture consists of a plurality of processors sharing a common memory, with each processor having its own local cache. In such a multiprocessor system, a cache coherency protocol is required to ensure the accuracy of data among the local caches of the respective processors and main memory. For example, if two processors currently store the same data block in their respective caches, then a write to that data block by one processor may affect the validity of the copy stored in the cache of the other processor, as well as the copy stored in main memory. One possible protocol for solving this problem would be for the system to immediately update all cached copies of the block, as well as main memory, whenever one copy is written. Another possible protocol would be to determine where all the other cached copies of a block are stored and to mark them invalid whenever the copy in a particular processor's cache is written. Which protocol a designer actually uses has implications for the efficiency of the multiprocessor system as well as for the complexity of the logic needed to implement it. The first protocol requires significant bus bandwidth to update the data in all the caches, but main memory is always current. The second protocol requires less bus bandwidth, since only a single bit is required to invalidate each affected data block. Cache coherency protocols range from simple (e.g., a write-through protocol) to complex (e.g., a directory cache protocol). In choosing a cache coherency protocol for a multiprocessor computer system, the system designer must perform the difficult exercise of trading off many factors that affect efficiency, simplicity, and speed.
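The bandwidth trade-off between the two protocols can be made concrete with a small model. The Python sketch below is a hypothetical illustration, not part of the disclosed apparatus: it assumes a 512-bit (64-byte) block, counts one invalidate bit per remote copy as described above, and ignores the cost of sending the address.

```python
# Hypothetical model (not from the source): each cache is a dict mapping a
# block address to its value; bus cost is counted in bits.
BLOCK_BITS = 512  # assumed 64-byte block


def make_system(n_caches, addr, value):
    """Build n caches that all hold a copy of one block, plus main memory."""
    caches = [{addr: value} for _ in range(n_caches)]
    memory = {addr: value}
    return caches, memory


def write_update(caches, memory, writer, addr, value):
    """Update protocol: push the new value to every other copy and to memory."""
    bits = 0
    caches[writer][addr] = value
    for i, cache in enumerate(caches):
        if i != writer and addr in cache:
            cache[addr] = value      # the whole block crosses the bus
            bits += BLOCK_BITS
    memory[addr] = value             # memory stays current
    bits += BLOCK_BITS
    return bits


def write_invalidate(caches, memory, writer, addr, value):
    """Invalidate protocol: mark every other copy invalid; memory is not written."""
    bits = 0
    caches[writer][addr] = value
    for i, cache in enumerate(caches):
        if i != writer and addr in cache:
            del cache[addr]          # invalid copy modeled as removal
            bits += 1                # one invalidate bit per copy (assumed)
    return bits


caches, memory = make_system(4, 0x40, 1)
print(write_update(caches, memory, 0, 0x40, 2))      # 2048
caches, memory = make_system(4, 0x40, 1)
print(write_invalidate(caches, memory, 0, 0x40, 2))  # 3
```

With four caches holding the block, the update protocol moves four full blocks while the invalidate protocol moves three bits, but only the update protocol leaves main memory current.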
Hence, it would be desirable to provide a system designer with a microprocessor-cache architecture having uniquely flexible tools facilitating development of cache coherence protocols in multiprocessor computer systems.
A present-day designer who wishes to construct a multiprocessor system using a conventional microprocessor as a component must deal with the inflexibility of current microprocessor technology. Present-day microprocessors were built with specific cache protocols in mind and provide minimal flexibility to the external system designer. For example, one common problem is that a microprocessor's cache is designed so that moving a data block out of the cache automatically sets the cache state for that block to a predetermined state. This does not give the designer of a multiprocessor system the flexibility to set the block to whatever state a desired cache protocol requires. As a result, significant complexity is necessarily added to the design of the cache protocol.
In accordance with the present invention, a computing apparatus includes a cache, an evictor unit, and a signaling unit. The cache includes a plurality of blocks. The evictor unit selects a first block of data from the plurality of blocks to be removed from the cache. The first block of data has an unmodified coherence state. The signaling unit transmits a notify signal indicating the removal of the first block from the cache. Typically, the unmodified coherence state of the first block in the cache is a clean coherence state, and the notify signal is a clean victim signal. The coherence state may include additional attributes; for example, a clean state may be clean/shared, and a dirty state may be dirty/shared.
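One way to see why a clean victim signal is useful is to consider an external system that tracks which processors hold each block. The Python sketch below assumes such a directory; the class and method names are hypothetical. Without a notify signal on clean evictions, the directory keeps stale sharer entries and later sends invalidations to processors that no longer hold the block.

```python
# Hypothetical external directory (an assumption for illustration): it records,
# per block address, which processors it believes hold a copy.
class Directory:
    def __init__(self):
        self.sharers = {}             # address -> set of processor ids
        self.invalidations_sent = 0

    def record_fill(self, addr, cpu):
        """A processor has loaded a copy of the block."""
        self.sharers.setdefault(addr, set()).add(cpu)

    def clean_victim(self, addr, cpu):
        """Clean victim notify: cpu evicted an unmodified copy."""
        self.sharers.get(addr, set()).discard(cpu)

    def grant_write(self, addr, writer):
        """Invalidate every other believed holder before the writer may modify."""
        others = self.sharers.get(addr, set()) - {writer}
        self.invalidations_sent += len(others)
        self.sharers[addr] = {writer}


def invalidations_needed(notify_clean_victims):
    d = Directory()
    for cpu in (0, 1, 2):
        d.record_fill(0x80, cpu)
    for cpu in (1, 2):                # CPUs 1 and 2 evict their clean copies
        if notify_clean_victims:
            d.clean_victim(0x80, cpu)
    d.grant_write(0x80, 0)
    return d.invalidations_sent


print(invalidations_needed(True))     # 0
print(invalidations_needed(False))    # 2: stale entries cost two invalidations
```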
Typically, the evictor unit and the signaling unit are included in an external unit of a processor, the external unit providing an interface for communication of information to an external system. An external system may include a memory management system or a memory controller.
In a further aspect of the present invention, the processor may have one or more caches, and the cache may be an L1 cache or an L2 cache.
In yet a further aspect of the present invention, the evictor unit is further configured to select a second block from the plurality of blocks, the second block having a modified coherence state. The signaling unit is further configured to transmit another notify signal indicating removal of the second block from the cache. Typically, the modified coherence state of the second block in the cache is a dirty coherence state, and this second notify signal is a write victim signal.
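The evictor and signaling units can be sketched as follows, assuming a least-recently-used replacement policy (the source does not specify how the evictor selects a block) and a hypothetical `last_use` counter per block. The signal choice follows the text: clean and clean/shared blocks produce a clean victim signal, while dirty and dirty/shared blocks produce a write victim signal.

```python
from dataclasses import dataclass
from enum import Enum


class CoherenceState(Enum):
    CLEAN = "clean"
    CLEAN_SHARED = "clean/shared"
    DIRTY = "dirty"
    DIRTY_SHARED = "dirty/shared"


class VictimSignal(Enum):
    CLEAN_VICTIM = "clean victim"   # unmodified block removed
    WRITE_VICTIM = "write victim"   # modified block removed


MODIFIED = {CoherenceState.DIRTY, CoherenceState.DIRTY_SHARED}


@dataclass
class Block:
    addr: int
    state: CoherenceState
    last_use: int  # hypothetical age counter, used only for LRU selection


def evict(blocks):
    """Evictor: remove the least recently used block (assumed policy).
    Signaling unit: choose the notify signal from the block's coherence state."""
    victim = min(blocks, key=lambda b: b.last_use)
    blocks.remove(victim)
    signal = (VictimSignal.WRITE_VICTIM if victim.state in MODIFIED
              else VictimSignal.CLEAN_VICTIM)
    return victim.addr, signal


blocks = [Block(0x100, CoherenceState.DIRTY, 5),
          Block(0x140, CoherenceState.CLEAN_SHARED, 2),
          Block(0x180, CoherenceState.DIRTY_SHARED, 9)]
addr, sig = evict(blocks)
print(hex(addr), sig.value)  # 0x140 clean victim
addr, sig = evict(blocks)
print(hex(addr), sig.value)  # 0x100 write victim
```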
In yet another aspect, the computing apparatus includes an address bus and the evictor unit is further configured to transmit an address associated with the first block and an address associated with the second block onto the address bus.
In a further aspect, the computing apparatus includes a data bus and the evictor unit is further configured to transmit data associated with the first block and data associated with the second block onto the data bus.
In yet a further aspect, the transmitted address, the transmitted data, and the transmitted notify signals are received by an external system supporting a cache coherence protocol for a plurality of processors.
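A minimal sketch of the receiving side, assuming a hypothetical memory controller as the external system. Each eviction is modeled as one transaction carrying the address-bus value, the data-bus value, and the notify signal. Treating clean-victim data as identical to memory, and therefore not writing it, is an assumption based on the usual meaning of a clean state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VictimTransaction:
    address: int        # value driven on the address bus
    data: bytes         # value driven on the data bus
    write_victim: bool  # True: write victim signal; False: clean victim signal


class MemoryController:
    """Hypothetical external system that keeps main memory up to date."""

    def __init__(self):
        self.memory = {}       # address -> block data
        self.cached = set()    # addresses believed to be held in the cache
        self.writebacks = 0

    def receive(self, txn):
        self.cached.discard(txn.address)   # either signal: the block left the cache
        if txn.write_victim:
            # Dirty data is the only current copy, so memory must be updated.
            self.memory[txn.address] = txn.data
            self.writebacks += 1
        # Clean victim: assumed to match memory, so its data is not written.


mc = MemoryController()
mc.memory = {0x40: b"old", 0x80: b"same"}
mc.cached = {0x40, 0x80}
mc.receive(VictimTransaction(0x40, b"new", write_victim=True))
mc.receive(VictimTransaction(0x80, b"same", write_victim=False))
print(mc.memory[0x40], mc.writebacks, mc.cached)  # b'new' 1 set()
```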
In accordance with still another aspect of the present invention, the computing apparatus includes a buffer, typically a victim buffer, storing the data associated with the second block. The evictor unit is further configured, after selecting the second block, to transmit the address associated with the second block onto the address bus to an external system and to transmit a location of the buffer to the external system. The buffer is programmably configurable so that the external system can independently control at least one of pulling of the data associated with the second block from the buffer for transmission over the data bus to the external system and releasing of the data associated with the second block from the buffer.
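The independent pull and release controls of the victim buffer might look like the sketch below. The entry index stands in for the buffer location sent to the external system; the class, its method names, and the full-buffer error are hypothetical. The point illustrated is that pulling the data does not free the entry, and releasing it is a separate command the external system issues whenever it chooses.

```python
class VictimBuffer:
    """Sketch of a victim buffer whose pull and release are separate commands."""

    def __init__(self, n_entries):
        self.entries = [None] * n_entries   # each entry: (address, data) or None

    def allocate(self, address, data):
        """Evictor side: park a dirty victim and return its buffer location."""
        for loc, entry in enumerate(self.entries):
            if entry is None:
                self.entries[loc] = (address, data)
                return loc
        raise RuntimeError("victim buffer full")

    def pull(self, loc):
        """External system: copy the data out over the data bus; entry is kept."""
        if self.entries[loc] is None:
            raise KeyError(loc)
        return self.entries[loc][1]

    def release(self, loc):
        """External system: free the entry, independently of any pull."""
        if self.entries[loc] is None:
            raise KeyError(loc)
        self.entries[loc] = None


vb = VictimBuffer(2)
loc = vb.allocate(0x200, b"dirty line")   # location reported with the address
assert vb.pull(loc) == vb.pull(loc)       # pulling twice leaves the entry intact
vb.release(loc)                           # only now may the entry be reused
print(vb.allocate(0x240, b"next"))        # 0
```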
In further accordance with the present invention, a method supports cache coherence protocols by presenting a clean victim signal to an external system. In this method, a block is selected for eviction from one of a plurality of caches, and the evicted block, which has either a modified cache state or an unmodified cache state, is removed from that cache. The address of the evicted block is transmitted to an external system that maintains cache coherency for the plurality of caches according to a cache protocol. The external system is then informed whether the evicted block is modified (if a modified block was evicted) or unmodified (if an unmodified block was evicted).
Preferably, a modified cache state is either dirty or dirty/shared, and an unmodified cache state is either clean or clean/shared.
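The method steps can be traced with a short sketch over two caches, named `L1` and `L2` here for illustration. The choice of which block to evict is left to the caller, and `bus_log` is a hypothetical stand-in for the external system that records, in order, the address and the modified/unmodified indication it receives.

```python
MODIFIED_STATES = {"dirty", "dirty/shared"}
UNMODIFIED_STATES = {"clean", "clean/shared"}


def evict_block(caches, cache_name, address, bus_log):
    """Evict one block from one of several caches and report it externally.

    bus_log stands in for the external system: it records, in order, what the
    processor presents for this eviction.
    """
    state = caches[cache_name].pop(address)      # remove the selected block
    bus_log.append(("address", address))         # transmit the victim's address
    if state in MODIFIED_STATES:
        bus_log.append(("modified", True))       # a modified block was evicted
    elif state in UNMODIFIED_STATES:
        bus_log.append(("modified", False))      # an unmodified block was evicted
    else:
        raise ValueError(f"unexpected cache state {state!r}")
    return state


caches = {"L1": {0x10: "clean/shared"}, "L2": {0x20: "dirty"}}
log = []
evict_block(caches, "L2", 0x20, log)
evict_block(caches, "L1", 0x10, log)
print(log)  # [('address', 32), ('modified', True), ('address', 16), ('modified', False)]
```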
In a further aspect of the present invention, the evicted block has a modified cache state and the data of the evicted block is independently pullable and releasable by the external system.
Objects, advantages, and novel features of the present invention will become apparent to those skilled in the art from this disclosure, including the following detailed description, as well as by practice of the invention. While the invention is described below with reference to one or more preferred embodiments, it should be understood that the invention is not limited thereto. Those of ordinary skill in the art having access to the teachings herein will recognize additional implementations, modifications, and embodiments, as well as other fields of use, which are within the scope of the invention as disclosed and claimed herein and with respect to which the invention could be of significant utility.