The present invention relates to a parallel processing system for exchanging information among a plurality of processing units, and to a memory LSI which may be used in such a system.
One type of prior art shared memory system for use in a parallel processing system adopts a mode of operation in which one shared memory is connected to a shared bus system and is utilized by a plurality of processors in common. The shared bus system comprises a shared bus, an arbiter circuit for arbitrating requests for access to the shared bus made by units connected to the shared bus and for permitting such access, units for inputting/outputting data to/from the shared memory and the like as necessary.
There is also an advanced shared memory system, as disclosed in Japanese Patent Laid-Open No. Hei. 5-290,000, in which shared memories (memory units) are distributed to each processor in order to reduce the contention of accesses on the shared bus system. Such a distributed shared memory is occasionally referred to as a local shared memory.
As a shared memory system for use in a parallel processing system comprising such local shared memories, there is a broadcast type parallel processing system having a mode of operation in which, when the contents of the local shared memory of one processor are changed, the changed contents are broadcast so that the contents of the local shared memories of the other processors are changed as well. The above-mentioned system disclosed in Japanese Patent Laid-Open No. Hei. 5-290,000 also belongs to this broadcast type of system.
Each system described above has had the following problems. That is, in a parallel processing system of the type having one shared memory connected to a shared bus system, there has been a possibility that a large number of read and write cycles from a plurality of processors contend on the shared bus system in a complex manner. Arbitrating this access contention has consumed time on the side of the shared bus system, lowering the throughput accordingly. In connection with this problem, the latency seen by each processor is also prolonged, thus causing an increase in the overhead of the whole processing system.
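The serialization of accesses on a single shared bus can be illustrated by the following minimal sketch. It assumes, purely for illustration, an arbiter that grants one access per bus cycle and favors the lowest-numbered processor; the names `arbitrate`, `pending` and `requests` are hypothetical and are not taken from any system cited herein.

```python
from collections import deque


def arbitrate(pending):
    """Grant queued accesses one per bus cycle.

    Assumed policy (illustrative only): the lowest processor id wins.
    `pending` maps a processor id to a deque of its outstanding accesses.
    Returns a list of (cycle, processor id, access) in grant order.
    """
    grants = []
    cycle = 0
    while any(pending.values()):
        # Only one unit may own the shared bus in a given cycle.
        winner = min(pid for pid, queue in pending.items() if queue)
        grants.append((cycle, winner, pending[winner].popleft()))
        cycle += 1
    return grants


# Three processors issue five accesses in total, reads and writes alike.
requests = {
    0: deque(["read", "write"]),
    1: deque(["read"]),
    2: deque(["read", "read"]),
}
schedule = arbitrate(requests)
total_cycles = len(schedule)
```

Under these assumptions every access, whether a read or a write, occupies its own bus cycle, so the last processor to be served waits for all accesses granted ahead of it.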
When the broadcast type shared memory system as disclosed in Japanese Patent Laid-Open No. Hei. 5-290,000 is used, only cycles for writing data to the local shared memories are generated on the shared bus system. A cycle for reading data from the shared memory is performed on the local shared memory distributed to each processor, independently and in parallel for each processor. Accordingly, no access contention occurs among the read cycles of the respective processors to the shared memory, thus improving the throughput.
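The broadcast mode of operation described above may be sketched as follows. This is a simplified software model under stated assumptions, not the circuit of the cited publication: the class `BroadcastSharedMemory` and its members are hypothetical names, each write is assumed to occupy exactly one bus cycle, and timing within a cycle is ignored.

```python
class BroadcastSharedMemory:
    """Toy model: one local shared memory copy per processor."""

    def __init__(self, num_processors, size):
        # Each processor holds its own copy of the shared contents.
        self.local = [[0] * size for _ in range(num_processors)]
        # Counts cycles generated on the shared bus (assumed: one per write).
        self.bus_cycles = 0

    def write(self, address, value):
        """Broadcast a write over the shared bus to every local copy."""
        self.bus_cycles += 1
        for copy in self.local:
            copy[address] = value

    def read(self, processor, address):
        """Read from the processor's own copy; the shared bus is not used."""
        return self.local[processor][address]


memory = BroadcastSharedMemory(num_processors=3, size=4)
memory.write(address=2, value=7)
values = [memory.read(p, 2) for p in range(3)]
```

In this model the three reads are served from three separate copies and add nothing to `bus_cycles`, which corresponds to the absence of contention among read cycles noted above.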
However, even if the broadcast type shared memory system is used, the read cycle from the processor and the write cycle from the shared bus system contend with each other on the local shared memory (access contention between a read cycle and a write cycle). Therefore, the effect of eliminating the overhead and the waste of time has been found to be insufficient even in a parallel processing system having a broadcast type shared memory system. It is noted that this contention also occurs in shared memory systems other than the broadcast type system.
Further, in a parallel processing system having a shared memory system, there are cases in which a plurality of processors performing processing in concert need to pass the result of each task reliably to the succeeding task. In such cases, it is necessary to take into account the data transfer latency (communication latency) among the processing units, such as the processors.