In a system composed of a plurality of data processing devices and a common memory, when the data processing devices include cache memories, it is necessary to maintain coherency among the information held in the memories. In other words, when information is rewritten by one processing device, the other processing devices must be able to use the new information.
A general method for realizing coherency is disclosed in Japanese Patent Publication No. 49-12020 (1974), hereinafter referred to as Reference 1. In Reference 1, writing is always carried out on the common memory; simultaneously, the rewrite is reported to the cache memories of the other processing devices, and if corresponding addresses exist in those cache memories, the entries are invalidated. This method is called the write-through method.
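The write-through scheme described above can be sketched in a few lines. This is an illustrative model only; the class and attribute names are not taken from Reference 1.

```python
class WriteThroughCache:
    """Minimal write-through cache model: every write updates the common
    memory and invalidates matching entries in the other caches."""

    def __init__(self, memory, peers):
        self.memory = memory      # shared dict: address -> value
        self.peers = peers        # caches of the other processing devices
        self.lines = {}           # this cache's address -> value

    def read(self, addr):
        if addr not in self.lines:        # miss: fetch from common memory
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def write(self, addr, value):
        self.lines[addr] = value
        self.memory[addr] = value         # writing is always carried out
        for peer in self.peers:           # inform the other caches
            peer.lines.pop(addr, None)    # invalidate matching entries
```

For example, after processor A writes an address that processor B has cached, B's stale copy is invalidated, so B's next read fetches the new value from the common memory.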
In the write-through method, all write accesses are placed on the memory bus; consequently, as processor performance increases, the memory bus becomes a bottleneck. In particular, in a multiprocessor in which a plurality of processors are connected to one memory bus using the write-through method, this bottleneck is fatal, since overall performance is limited by that of the memory bus. For this reason, a copy-back cache method, in which even when the cache memory is rewritten the main memory is not rewritten until necessary, has become widely employed. The copy-back cache method is also incorporated into recent microcomputer chips, as disclosed in "32 bits MPU with built-in bus snoop function suitable for multi processor" [NIKKEI ELECTRONICS, 1989, 7, 24, (No. 478), pp. 173-179], hereinafter referred to as Reference 2, and in "68040 achieved to 13.5 MIPS by Harvard architecture and optimized command" [NIKKEI ELECTRONICS, 1989, 6, 26, (No. 476), pp. 131-140], hereinafter referred to as Reference 3. The copy-back cache method is also explained in U.S. Pat. No. 4,928,225, "Coherent cache structures and methods", hereinafter referred to as Reference 5.
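The contrast with the write-through method can be seen in a minimal sketch of a copy-back (write-back) cache: a write marks the line dirty and generates no memory-bus traffic, and the main memory is updated only when the line is evicted. This is a sketch of the general technique, not of any particular chip from References 2, 3, or 5.

```python
class CopyBackCache:
    """Minimal copy-back cache model: writes stay in the cache (marked
    dirty); main memory is updated only when the line is evicted."""

    def __init__(self, memory):
        self.memory = memory   # main memory: address -> value
        self.lines = {}        # address -> (value, dirty_flag)

    def read(self, addr):
        if addr not in self.lines:
            self.lines[addr] = (self.memory[addr], False)
        return self.lines[addr][0]

    def write(self, addr, value):
        self.lines[addr] = (value, True)   # no memory-bus access here

    def evict(self, addr):
        value, dirty = self.lines.pop(addr)
        if dirty:                          # write back only if modified
            self.memory[addr] = value
```

Because repeated writes to a cached line touch only the cache, the memory-bus load is far lower than in the write-through method, at the cost of the coherency difficulties discussed below.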
On the other hand, there is a system employing a plurality of data processing devices and a dual common memory for high reliability, as disclosed in Japanese Patent Application Laid-Open No. 58-16362 (1983), hereinafter referred to as Reference 4. For the data processing devices in such a system, it is preferable that the above-described copy-back cache method be used. However, unlike the write-through method, the copy-back cache method makes it difficult to maintain coherency. In the copy-back cache method, a control flag has to be provided for every cache entry, as disclosed in Reference 2, and furthermore, to distinguish the EU (exclusive unmodified) state from the SU (shared unmodified) state in the control flag, each cache memory has to watch all read accesses. With a single memory bus such watching is easily realized; however, in a system constituted by a plurality of components as disclosed in Reference 4, the copy-back caches of the respective data processing devices have to watch all accesses to the common memory shared between the data processing devices, so that the data transfer capacity of the interface between components (in the case of Reference 4, the interface cable between the CPU and the common memory) becomes a bottleneck. Implementation of the copy-back cache method is therefore difficult (on a single back-plane memory bus a transfer rate of about 150 Mbytes/sec is easily achieved, whereas over an interface cable several meters long at most about 20 Mbytes/sec is achieved with technical measures of the same level).
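The reason every cache must watch (snoop) all read accesses can be sketched as follows: only the snoop result tells a cache, on a line fill, whether to mark the line EU or SU. The state names follow the control-flag terminology of Reference 2; the class structure is an illustrative assumption, not taken from any of the references.

```python
# Cache-line states (terminology of Reference 2):
# EU = exclusive unmodified, SU = shared unmodified.
EU, SU, INVALID = "EU", "SU", "INVALID"

class Bus:
    """Single shared bus on which every read access is visible."""

    def __init__(self):
        self.caches = []

    def broadcast_read(self, addr, requester):
        # Returns True if any other cache reports holding the line.
        return any(c.snoop_read(addr)
                   for c in self.caches if c is not requester)

class SnoopingCache:
    """Sketch of EU/SU maintenance via snooping of read accesses."""

    def __init__(self, bus):
        self.bus = bus
        self.state = {}            # address -> state
        bus.caches.append(self)

    def read(self, addr):
        if self.state.get(addr, INVALID) == INVALID:
            shared = self.bus.broadcast_read(addr, requester=self)
            # Another holder answered the snoop -> shared, else exclusive.
            self.state[addr] = SU if shared else EU

    def snoop_read(self, addr):
        """Called for every read placed on the bus by another cache."""
        if self.state.get(addr, INVALID) != INVALID:
            self.state[addr] = SU  # demote EU to SU on a foreign read
            return True            # report that a copy is held
        return False
```

On a single back-plane bus this broadcast is cheap; the text's point is that forcing the same broadcast over a several-meter interface cable between components turns that cable into the bottleneck.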
The simplest method for solving these problems is one in which, when the address to be accessed lies in the common memory shared between data processing devices, the copy-back cache is bypassed and the common memory is accessed directly. With this method, however, the copy-back cache is not utilized; therefore, when a large amount of data is to be transferred to the common memory, for example in a transfer between a file and the common memory, the load on the memory bus in the data processing device increases and the performance of the system decreases. For example, data that could have been moved in a single transfer of a 64-byte block if the cache were usable must instead be moved in sixteen transfers of 4 bytes each. Thus, the load on the memory bus increases substantially.
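The bus-load arithmetic in the example above can be stated directly. The 64-byte block and 4-byte word sizes are those quoted in the text; the helper function is illustrative.

```python
BLOCK_SIZE = 64   # bytes moved per cached block transfer
WORD_SIZE = 4     # bytes moved per uncached (bypass) transfer

def transfers_needed(nbytes, unit):
    """Number of bus transactions to move nbytes in units of `unit` bytes."""
    return -(-nbytes // unit)   # ceiling division

# One cached block transfer versus sixteen bypass word transfers
# for the same 64 bytes of data:
cached = transfers_needed(64, BLOCK_SIZE)    # 1 transaction
bypass = transfers_needed(64, WORD_SIZE)     # 16 transactions
```

Each bus transaction carries fixed arbitration and addressing overhead, so a sixteen-fold increase in transaction count loads the memory bus far more than the raw byte count alone would suggest.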