1. Field of the Invention
The invention relates to a multiprocessor system that includes plural processors and a shared memory shared by the plural processors.
2. Description of the Related Art
Although processor speeds have risen rapidly with recent advances in processor technology, the operation speeds of memories and buses have improved more slowly. The resulting limit on the data transfer rate between the processor and the memory affects the performance of the whole computer system. At the same time, with increasing demand for higher speed and more functions in recent computer systems, the multiprocessor configuration has become essential. In a multiprocessor system, processes are executed in parallel while the processors exchange data. A technique in which the processors communicate through a shared memory referred to by all of them keeps the system configuration comparatively simple, and it is widely used. The document titled “Parallel Computers,” written by Hideharu Amano and published by SHOKODO in 1996, describes in detail a multiprocessor system that employs a shared memory. In this system, plural processors, shared memories, and other I/O devices are connected to a shared bus, and the system executes parallel processing while the processors and I/O devices read and write data in the shared memories as needed. With this type of shared bus configuration, the transfer bandwidth of the bus, bus traffic congestion, and the latency of memory accesses can all limit the throughput of the system.
As a method of solving the bus bottleneck, the multiprocessor system described in Japanese Published Unexamined Patent Application No. Hei 3-176754, shown in FIG. 21, can be cited. In the multiprocessor system of this document, the shared memory is divided into memory modules that are assigned different, successive address areas, and plural busses are provided, each connected to all the processors and to one of the shared memory modules. Access requests from the processors are thereby distributed over the busses, which reduces bus contention.
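The address-area assignment described above can be illustrated by a minimal sketch. The module size, the number of modules, and the function names below are hypothetical values chosen for illustration; the cited application is not quoted as giving them.

```python
# Hypothetical parameters: not taken from the cited application.
MODULE_SIZE = 0x1000   # size of the address area assigned to one module (assumed)
NUM_MODULES = 4        # one dedicated bus per memory module (assumed)


def module_for_address(address: int) -> int:
    """Return the index of the memory module (and of its bus) serving `address`."""
    module = address // MODULE_SIZE
    if not 0 <= module < NUM_MODULES:
        raise ValueError(f"address {address:#x} is outside the shared memory")
    return module


def buses_in_conflict(addresses):
    """Return the buses requested by more than one processor in the same cycle."""
    seen, conflicts = set(), set()
    for address in addresses:
        bus = module_for_address(address)
        if bus in seen:
            conflicts.add(bus)
        seen.add(bus)
    return conflicts
```

In this sketch, simultaneous requests that fall in different address areas use different busses and do not contend, whereas requests that fall in the same area still compete for one bus.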
Also, as a method of solving the memory access bottleneck, a widely used approach adds a high-speed local cache to each processor and handles as many memory accesses as possible locally between the processor and its cache, thereby reducing the use of the shared bus. In a system thus configured, the probability of having to access the shared memory, which usually has a long access time, is significantly reduced, and the average memory access latency is improved.
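The effect on the average latency can be estimated with a simple two-level model, sketched below. The model assumes that a hit costs only the cache access time and that a miss costs only the shared memory access time; the function name and its parameters are illustrative and do not appear in the cited documents.

```python
def average_access_time(hit_rate: float, cache_time: float, memory_time: float) -> float:
    """Average latency of one memory access under a simple hit/miss model.

    hit_rate    -- fraction of accesses served by the local cache (0.0 to 1.0)
    cache_time  -- latency of a cache hit
    memory_time -- latency of an access that goes over the bus to shared memory
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * cache_time + (1.0 - hit_rate) * memory_time
```

Under this model, the average latency approaches the cache access time as the hit rate approaches 1, which is why handling most accesses locally shortens the average access time.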
Furthermore, as disclosed in Japanese Published Unexamined Patent Application No. Hei 8-339353, shown in FIG. 22, a method has been proposed that provides plural busses of one type to expand the transfer bandwidth and accesses the shared memory through plural buffers. When data is to be written to a memory that has a long access time, this method temporarily writes the data to a vacant buffer so that the processor can move on to its next process. Normally, the processor must wait before it can start writing the data to the memory. By utilizing plural buffer areas, however, a system that is not restricted by the slow processing speed of the memory can be configured.
The buffer described here does not play the same role as the cache memory mentioned in the previous example. According to the published application, the buffer temporarily stores data being read from or written to the shared memory, and serves to connect the processors and the memory, which operate at different speeds. The data held in a buffer is therefore in transit from the memory to the processor when the processor requests a read, and from the processor to the memory when the processor requests a write. In other words, unlike cache memories, these buffers do not respond directly to read requests from the processors. Cache memories, by contrast, store copies of data in the shared memory and are frequently read and written by the processors. A cache memory generally uses a memory whose access time matches the speed of the processor.
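The buffered write path described above may be sketched as follows. The class and method names are hypothetical, and the rule that pending writes are drained before a read is served is an assumption made here only to keep the model consistent; it is not taken from the cited application.

```python
from collections import deque


class BufferedSharedMemory:
    """Slow shared memory fronted by a fixed number of write buffers (hypothetical model)."""

    def __init__(self, num_buffers: int):
        self.num_buffers = num_buffers
        self.pending = deque()   # (address, value) pairs waiting in the buffers
        self.memory = {}         # slow shared memory: address -> value

    def write(self, address, value) -> bool:
        """Post a write. True means a vacant buffer accepted the data and the
        processor may continue; False means no buffer is vacant and it must wait."""
        if len(self.pending) >= self.num_buffers:
            return False
        self.pending.append((address, value))
        return True

    def drain_one(self) -> bool:
        """Complete the oldest pending write in the slow memory."""
        if not self.pending:
            return False
        address, value = self.pending.popleft()
        self.memory[address] = value
        return True

    def read(self, address):
        """Reads are not answered from the buffers, unlike a cache. Pending
        writes are drained first (assumption), then the memory itself is read."""
        while self.drain_one():
            pass
        return self.memory.get(address)
```

The sketch shows the distinction drawn above: the buffers only decouple the processor from the slow memory, and every read still reaches the shared memory.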
However, each of these methods has problems, which are described below. With the method of providing plural busses, as in Japanese Published Unexamined Patent Application No. Hei 3-176754 (refer to FIG. 21), the area needed to mount the busses increases with the number of busses. The number of pins of the ICs connected to the busses also has a large influence on the operation speed: as the number of pins increases, the operation speed tends to fall, and the mounting and design processes become more troublesome. In addition, as the operation speed of the busses rises, the EMC noise caused by electromagnetic radiation and the transmission delay can no longer be ignored, which creates a new problem.
Further, adding local caches causes a problem of its own, which will be explained with reference to FIG. 23. FIG. 23 shows a configuration in which four processor units 101a through 101d are connected to a shared bus 103, through which the processors access the shared memory 104. The processor units 101a through 101d have local caches 102a through 102d, respectively. Now, suppose that the two processors 101a and 101b read data at the same address in the shared memory 104. The read data is copied to the caches 102a and 102b of the processors 101a and 101b. Next, suppose that the processor 101a rewrites the cached data as the result of a certain calculation. This amounts to rewriting the original data at that address in the shared memory 104. When this happens, the data that the processor 101b has read and stored in the cache 102b is no longer correct. Therefore, to use the data at that address again, the processor 101b has to read it again from the shared memory 104. In other words, when the processor 101a has rewritten the cached data, the other caches must be notified of this fact and must delete their copies of the data. Various protocols have been proposed for maintaining this type of cache coherency. On this subject, the document titled “An Implementation of a Shared Memory Multiprocessor” (written by Norihisa Suzuki, Shigenori Shimizu, and Nagatugu Yamauchi, published by CORONA PUBLISHING CO., LTD. in 1993) can be cited as an example with a detailed description. In general, one of two methods is employed: the directory method, which holds a table recording the status of the cached data and maintains consistency by referring to that table, or the snoop cache method, which monitors all the memory accesses that go through the shared bus and controls the local caches as needed.
However, these methods not only require very complicated control but also increase the mounting area of the hardware.
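The stale-copy problem and a snoop-style remedy can be illustrated with a minimal sketch. It assumes a write-through, write-invalidate policy on a single shared bus, which is only one of the protocols referred to above; the class names and the `snoop` switch are hypothetical and serve only to contrast the two behaviors.

```python
class SharedBus:
    """Single shared bus in front of the shared memory (hypothetical model)."""

    def __init__(self):
        self.memory = {}   # shared memory: address -> value
        self.caches = []   # every cache attached to the bus

    def attach(self, cache):
        self.caches.append(cache)

    def read(self, address):
        return self.memory.get(address, 0)

    def write(self, writer, address, value):
        self.memory[address] = value
        # Every other cache observes the write on the bus.
        for cache in self.caches:
            if cache is not writer:
                cache.snoop_invalidate(address)


class SnoopingCache:
    """Local cache; with snoop=False it never notices writes by other processors."""

    def __init__(self, bus, snoop=True):
        self.bus = bus
        self.snoop = snoop
        self.lines = {}    # address -> cached copy
        bus.attach(self)

    def read(self, address):
        if address not in self.lines:          # miss: fetch over the bus
            self.lines[address] = self.bus.read(address)
        return self.lines[address]

    def write(self, address, value):
        self.lines[address] = value
        self.bus.write(self, address, value)   # write-through (assumption)

    def snoop_invalidate(self, address):
        if self.snoop:
            self.lines.pop(address, None)      # delete the stale copy
```

With `snoop=False`, a second cache keeps returning the old value after another processor writes, which is the situation described with reference to FIG. 23. With snooping enabled, the copy is deleted and the next read fetches the new value from the shared memory.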
Furthermore, when plural data busses are provided to solve the bus bottleneck, the processing becomes more complicated. In both the directory method and the snoop cache method, all the memory accesses must be monitored in order to know the status of the caches precisely. When there is only one path for the data, it suffices to monitor the memory access information passing through that path, but when there are several paths, all of them have to be monitored and the consistency between the memory accesses must be maintained. When there is only one bus, other processors cannot access the same address simultaneously because the bus is occupied, but when there are several busses, access requests to the same address may be issued over different busses. When a processor is going to replace data in the shared memory with the result of a calculation, the processor has to check that the data at that address is not being read or written before it obtains a bus and starts rewriting the memory. Thus, when there are several busses, the consistency of the memory accesses has to be maintained and the caches have to be synchronized, which makes the cache control still more complicated.
In Japanese Published Unexamined Patent Application No. Hei 8-339353 (FIG. 22), the buffer memories are placed between the shared memory and the busses, but they are not equipped with a cache function that serves repeated reads of the same data. Even if the buffer control section were equipped with a cache function, all the memory accesses would still go through the busses. Therefore, even if a memory with a short access time were used for the cache, the operation speed of the busses would become the limiting factor and lengthen the access time. Furthermore, if conventional electric wiring is used for the busses, hardware design problems such as the expansion of the bus mounting area, as well as the problems of EMC noise and transmission delay, cannot be ignored. The limits of busses configured with conventional electric wiring therefore cannot be avoided.
The present invention has been made in view of the above circumstances, and provides a multiprocessor system including plural processors and a shared memory, which simplifies the cache control, reduces the area size of the hardware, and shortens the memory access time.
The multiprocessor system according to one aspect of the present invention includes a shared memory, a cache memory connected thereto, an optical cross coupling network to which the cache memory is connected, and plural processors connected to the optical cross coupling network, which access the cache memory through the optical cross coupling network.
The multiprocessor system according to another aspect of the present invention may include plural cache memories; in that case, however, the plural cache memories need to store copies of data corresponding to mutually different addresses in the shared memory.
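The condition that the plural cache memories hold copies of mutually different addresses may be illustrated by the following minimal sketch. The modulo rule used to assign an address to a cache, the write-through policy, and all names are assumptions made for illustration; the text above does not specify how the addresses are divided.

```python
class PartitionedCaches:
    """Several caches, each responsible for a disjoint set of addresses (hypothetical model)."""

    def __init__(self, num_caches: int, shared_memory: dict):
        self.shared_memory = shared_memory              # address -> value
        self.caches = [{} for _ in range(num_caches)]   # one dict per cache memory

    def home_cache(self, address: int) -> int:
        """Index of the only cache allowed to hold a copy of `address`.
        Modulo interleaving is an assumed rule, not taken from the text."""
        return address % len(self.caches)

    def read(self, address: int):
        cache = self.caches[self.home_cache(address)]
        if address not in cache:                        # miss: fetch from shared memory
            cache[address] = self.shared_memory.get(address, 0)
        return cache[address]

    def write(self, address: int, value):
        # Write-through is assumed here for simplicity.
        self.caches[self.home_cache(address)][address] = value
        self.shared_memory[address] = value

    def copies_of(self, address: int) -> int:
        """Number of caches currently holding a copy of `address`."""
        return sum(address in cache for cache in self.caches)
```

Because each address has exactly one home cache in this sketch, `copies_of` never exceeds 1, so no two caches can hold diverging copies of the same data.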
Further, in the multiprocessor system according to another aspect of the present invention, the optical cross coupling network is preferably capable of simultaneously transmitting plural types of light signals, and also the optical cross coupling network is preferably capable of broadcasting signals transmitted by the cache memory to the plural processors.
Furthermore, in the multiprocessor system according to another aspect of the present invention, preferably the cache memory is connected to the optical cross coupling network by plural ports, and in that case, it is still more preferable that the optical cross coupling network is capable of simultaneously transmitting plural types of light signals, and the cache memory is connected to the optical cross coupling network by the same number of ports as the number of types of the light signals that the optical cross coupling network is capable of simultaneously transmitting.