1. Field of the Invention
The present invention relates to multiprocessor devices which process information while communicating information between processors via a main memory, and particularly to a multiprocessor device as well as a cache device, a consistency control device and a protocol conversion device used in the multiprocessor device adopting a weak memory consistency model.
2. Description of the Background Art
Demand for higher processor performance has recently been increasing in various fields such as multimedia processing and high-resolution image processing. The currently available LSI (Large Scale Integration) manufacturing technique, however, limits how far the device speed can be enhanced. A multiprocessor device based on a distributed processing system is therefore attracting attention and being actively studied and developed.
A processor device having a single processor is often provided with a cache for storing data which is likely to be referenced by the processor, in order to respond speedily to memory accesses by the processor. For example, a microprocessor employs a merged cache having a capacity of 8 K bytes to improve the system performance. In such a microprocessor, the memory address space is divided into sections each having 16 bytes, and each 16-byte section is associated with a cache entry and individually managed by the cache. Each divided memory section of a prescribed size, for example 16 bytes, is hereinafter referred to as a memory block. In a processor device employing a write-back cache, an update process by a store instruction of the processor is completed only by updating a copy of a memory block in the cache. The updated copy of the memory block within the cache is written back to a main memory by an instruction from the processor or by a replacement process caused by the limited capacity of the cache. Compared with a processor device employing a write-through cache, which directly updates a main memory each time a store instruction from the processor is executed, the performance is generally improved. However, the content of a memory block in the main memory may then differ from the updated copy of the corresponding memory block in the cache.
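The division into memory blocks and the difference between write-back and write-through operation can be illustrated by the following minimal sketch in Python. It is not the implementation of any particular microprocessor. The 16-byte block size is taken from the example above, whereas the names `BLOCK_SIZE`, `block_of`, `WriteThroughCache` and `WriteBackCache` are hypothetical, and cache capacity and replacement are deliberately ignored.

```python
BLOCK_SIZE = 16  # bytes per memory block, as in the example above


def block_of(address):
    """Return (block number, offset within the block) for a byte address."""
    return address // BLOCK_SIZE, address % BLOCK_SIZE


class WriteThroughCache:
    """Hypothetical model: every store updates main memory immediately."""

    def __init__(self, memory):
        self.memory = memory          # dict: byte address -> value
        self.memory_writes = 0        # number of accesses to main memory

    def store(self, address, value):
        self.memory[address] = value
        self.memory_writes += 1


class WriteBackCache:
    """Hypothetical model: a store updates only the cached copy."""

    def __init__(self, memory):
        self.memory = memory
        self.dirty = {}               # block number -> {offset: value}
        self.memory_writes = 0

    def store(self, address, value):
        block, offset = block_of(address)
        self.dirty.setdefault(block, {})[offset] = value

    def write_back(self, block):
        """Write one updated block back to main memory as a single transfer."""
        updates = self.dirty.pop(block, None)
        if updates is None:
            return                    # nothing to write back
        for offset, value in updates.items():
            self.memory[block * BLOCK_SIZE + offset] = value
        self.memory_writes += 1
```

In this sketch, three stores to the same block cost three main memory accesses with the write-through model and one with the write-back model, and until `write_back` is called the main memory holds stale content.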
A multiprocessor device employing a plurality of processors also employs the cache. In such a multiprocessor device, two types of caches, that is, a cache belonging to each processor and a cache shared by the plurality of processors are employed. The cache specific to each processor and the cache shared by the plurality of processors are respectively referred to simply as a cache and as an auxiliary cache in the following description. The cache in the multiprocessor device makes a high speed response to the memory access, and further functions to reduce the traffic on an interconnection network that interconnects the processors and a main memory.
When the cache is employed, copies of the same memory block are present in a plurality of caches. Update of data in the cache by the processor causes inconsistency between data in the main memory and the copy in the cache, leading to the so-called cache consistency problem. For correct operation of the multiprocessor device, update of data in a cache by one processor should be correctly reflected in reference to the data by another processor. The state in which update of data by one processor is accurately reflected in reference by another processor is herein considered as a state in which the memory consistency is maintained. In addition, a model which defines a result obtained by a series of memory accesses by a plurality of processors, and on the basis of which a program is written to maintain the memory consistency, is herein referred to as a memory consistency model.
There are a number of conventional methods for guaranteeing the memory consistency. According to one classification, these methods are divided into a method by invalidation and a method by update. According to the method by invalidation, when a copy of a memory block in a cache is updated by any processor, the copies in other caches are discarded. After that, if a processor attempts to refer to an invalidated memory block in its cache, the cache having the updated copy provides the updated copy to the processor directly or via the main memory or the auxiliary cache. According to the method by update, if a copy of a memory block in any cache is updated, the copies of that memory block in other caches are also updated. Both methods allow the processor to refer to the latest content of a memory block by reading the memory block from the cache.
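The two methods can be contrasted by the following minimal sketch in Python. Each cache is modeled, as an assumption made only for illustration, as a dictionary from block number to the cached value, and the function names `write_invalidate`, `write_update` and `read` are hypothetical. The main memory and the auxiliary cache are omitted, so a missing copy is supplied directly by another cache.

```python
def write_invalidate(caches, writer, block, value):
    """Method by invalidation: update the writer's copy, discard the others.

    Returns the number of copies that were invalidated.
    """
    caches[writer][block] = value
    invalidated = 0
    for index, cache in enumerate(caches):
        if index != writer and block in cache:
            del cache[block]
            invalidated += 1
    return invalidated


def write_update(caches, writer, block, value):
    """Method by update: update the writer's copy and every other copy.

    Returns the number of remote copies that were updated.
    """
    caches[writer][block] = value
    updated = 0
    for index, cache in enumerate(caches):
        if index != writer and block in cache:
            cache[block] = value
            updated += 1
    return updated


def read(caches, reader, block):
    """Read a block; on a miss, fetch the copy from a cache that holds it."""
    cache = caches[reader]
    if block not in cache:
        for other in caches:
            if block in other:
                cache[block] = other[block]
                break
    return cache.get(block)
```

In both cases a subsequent `read` by another processor returns the latest value. The methods differ in whether the remote copy is refreshed at the time of the write or fetched again at the time of the next reference.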
According to another classification, one method is based on a snooping mechanism and the other method is based on a directory mechanism. The snooping mechanism is widely used in a bus-connected multiprocessor device. In the snooping mechanism, when a cache makes a request for updating a memory block or for reading a memory block from the main memory, the request is broadcast via a bus. Other caches monitor the request and perform write back, invalidation or update of a copy of the memory block as necessary. In the directory mechanism, information about which cache has a copy of each memory block is managed, and the copy is written back, invalidated, or updated as necessary.
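The difference between the two mechanisms lies in which caches are notified of a request. The following minimal sketch in Python illustrates this under simplifying assumptions: the class `Directory` and the function `snoop_targets` are hypothetical names, caches are identified by integers, and only invalidation on a write is modeled.

```python
class Directory:
    """Hypothetical directory: records which caches hold a copy of each block."""

    def __init__(self):
        self.sharers = {}  # block number -> set of cache ids holding a copy

    def record_read(self, cache_id, block):
        """Note that a cache has obtained a copy of the block."""
        self.sharers.setdefault(block, set()).add(cache_id)

    def record_write(self, cache_id, block):
        """Return the caches whose copy must be invalidated by this write."""
        targets = self.sharers.get(block, set()) - {cache_id}
        self.sharers[block] = {cache_id}
        return targets


def snoop_targets(num_caches, cache_id):
    """On a bus, every other cache observes the broadcast request."""
    return set(range(num_caches)) - {cache_id}
```

With four caches of which only caches 0 and 2 hold the block, a write by cache 0 is observed by caches 1, 2 and 3 under snooping, whereas the directory identifies cache 2 as the only target.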
FIG. 1 illustrates a structure of an entry provided to a cache in a multiprocessor device disclosed in Japanese Patent Laying-Open No. 5-61770. A write privilege flag in this entry is used for management of an exclusive write privilege (right to write), and control is performed for each memory block such that no more than one cache entry in which the write privilege flag is set is present in the multiprocessor device. The processor cannot update a copy of a memory block unless the write privilege flag is set in the corresponding cache entry within the cache. In this multiprocessor device, before the processor updates a copy of a memory block in the cache, an exclusive write privilege to the memory block is obtained, and a copy of the corresponding memory block stored in other caches having no exclusive write privilege to that memory block is invalidated. As a result, it is guaranteed that the updated copy of the memory block is present in only one cache, namely the cache having the exclusive write privilege, and the memory consistency is ensured.
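The control by the write privilege flag can be illustrated by the following minimal sketch in Python. It does not reproduce the entry structure of FIG. 1. Only the write privilege flag is taken from the description above, while the fields `data` and `valid`, the names `CacheEntry`, `acquire_write_privilege` and `store`, and the assumption that the requesting cache already holds a copy are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CacheEntry:
    data: bytes                     # copy of the memory block (assumed field)
    valid: bool = True              # assumed field
    write_privilege: bool = False   # exclusive right to write


def acquire_write_privilege(caches, owner, block):
    """Give `owner` the exclusive write privilege and invalidate other copies."""
    if block not in caches[owner]:
        raise KeyError("owner holds no copy of the block")
    for index, cache in enumerate(caches):
        entry = cache.get(block)
        if entry is None:
            continue
        if index == owner:
            entry.write_privilege = True
        else:
            entry.write_privilege = False
            entry.valid = False


def store(caches, owner, block, data):
    """Update a copy only if the write privilege flag is set in the entry."""
    entry = caches[owner].get(block)
    if entry is None or not entry.valid or not entry.write_privilege:
        raise PermissionError("exclusive write privilege not held")
    entry.data = data
```

After `acquire_write_privilege` is called, at most one entry for the block has its flag set, so a subsequent `store` leaves exactly one valid, updated copy.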
The size of the data to which the processor makes access is smaller than the size of the memory block in most cases. Therefore, the state of false sharing in which different processors make access to different data in the same memory block occurs. In the case of the multiprocessor device described above, if the false sharing occurs, a process for guaranteeing the consistency is performed for each memory block even if different data are accessed.
"Memory Consistency and Event Ordering in Scalable Shared-Memory Multiprocessors" (pp. 15-26, 17th Annual International Symposium on Computer Architecture) pays attention to the flow of a program to show that a strong memory consistency model is not necessarily required for guaranteeing the memory consistency for each memory access by a processor, and proposes a multiprocessor device employing a weak memory consistency model. However, the conventional multiprocessor device described above employs a strong memory consistency model for guaranteeing the memory consistency for each memory access.
If the false sharing occurs in the conventional multiprocessor device above, the exclusive write privilege to the memory block containing the data to be updated is migrated, and this migration invalidates data in the same memory block that is irrelevant to the data to be updated. The unnecessary invalidation of data increases the average memory access time. The conventional multiprocessor device thus has a problem of decrease in process performance due to the false sharing.
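The cost of the false sharing can be estimated by counting how often the exclusive write privilege changes hands. The following minimal sketch in Python does so under simplifying assumptions: the block size of 16 bytes follows the earlier example, only store operations are considered, and the name `count_privilege_migrations` and the addresses used are hypothetical.

```python
BLOCK_SIZE = 16  # bytes per memory block, as in the earlier example


def count_privilege_migrations(accesses):
    """Count migrations of the exclusive write privilege.

    `accesses` is a sequence of (processor id, byte address) store operations.
    Each migration implies that the previous holder's copy is invalidated.
    """
    owner = {}        # block number -> processor holding the write privilege
    migrations = 0
    for processor, address in accesses:
        block = address // BLOCK_SIZE
        previous = owner.get(block)
        if previous is not None and previous != processor:
            migrations += 1
        owner[block] = processor
    return migrations


# Two processors alternately update different data in the same memory block.
falsely_shared = [(0, 0x100), (1, 0x108)] * 4
# The same pattern, with the data placed in different memory blocks.
separated = [(0, 0x100), (1, 0x110)] * 4
```

In this sketch, `count_privilege_migrations(falsely_shared)` is 7, whereas `count_privilege_migrations(separated)` is 0, although neither processor ever refers to the data of the other.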
The multiprocessor device maintaining the consistency by updating a copy of a memory block which is stored in other caches also has a problem of decrease in process performance due to a number of messages for updating a memory block in each cache upon occurrence of the false sharing. These problems arise in both of the multiprocessor devices respectively employing the snooping mechanism and the directory mechanism when the exclusive write privilege to the memory block is utilized for guaranteeing the memory consistency.
In addition, even when a program is written on the basis of the weak memory consistency model, the multiprocessor device described above executes a process for guaranteeing the consistency for each memory access, even if absence of the consistency causes no problem. As a result, excessive messages are generated and the process performance decreases, similarly to the case of the false sharing.