1. Field of the Invention
The present invention relates to a parallel processor comprising a plurality of processor elements and a shared memory connected via a common bus and to a processing method carried out thereon.
2. Description of the Related Art
In recent years, parallel processors have been developed in which a plurality of processor elements (PEs) built into a single chip execute, in parallel, a plurality of simultaneously executable instructions in a program so as to shorten the execution time of the program as a whole.
A variety of architectures have been proposed for such parallel processors. In one of them, a plurality of processor elements and a shared memory are connected to a set of common buses.
FIG. 9 is a view of the system configuration of a general parallel processor 1.
As shown in FIG. 9, the parallel processor 1 comprises, built into one chip, a common bus 2, n processor elements 31 to 3n, a shared memory 4, and a bus unit 5. The processor elements 31 to 3n, the shared memory 4, and the bus unit 5 are connected to the common bus 2. The bus unit 5 is connected to a main memory 7 via an external chip interface 6. One data port I/O is provided in a memory cell region 4a of the shared memory 4.
In the parallel processor 1, data is transferred via the common bus 2 and the data port I/O when the processor elements 31 to 3n access the data stored in the shared memory 4.
In the above parallel processor 1, however, data transfer between the processor elements 31 to 3n and the shared memory 4 and data transfer between the shared memory 4 and the main memory 7 are both carried out via the common bus 2. Furthermore, the memory cell region 4a of the shared memory 4 has only one data port I/O. As a result, there is the disadvantage that the processor elements 31 to 3n are frequently kept waiting for a long time, for the following reason.
Namely, when a page fault occurs in the shared memory 4 and pages are being exchanged between the shared memory 4 and the main memory 7, the processor elements 31 to 3n cannot access the shared memory 4 because the common bus 2 is in use. Accordingly, an access request from the processor elements 31 to 3n to the shared memory 4 is kept waiting until the page exchange processing is completed, and the processing performance of the parallel processor 1 is lowered.
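The stall described above can be illustrated with a minimal timing sketch. The function name `single_bus_wait`, the cycle counts, and the assumption that a page exchange holds the common bus for a fixed number of cycles are all hypothetical and are not taken from the related art. Contention among the waiting requests themselves is ignored.

```python
def single_bus_wait(exchange_start, exchange_cycles, request_times):
    """Return the cycles each access request waits on a single common bus.

    Assumption (illustrative only): the page exchange occupies the common
    bus from exchange_start for exchange_cycles cycles, and any access
    request arriving in that window is held until the exchange completes.
    """
    exchange_end = exchange_start + exchange_cycles
    waits = []
    for t in request_times:
        if exchange_start <= t < exchange_end:
            waits.append(exchange_end - t)  # held until the bus is released
        else:
            waits.append(0)  # bus is free, so the exchange causes no wait
    return waits


# Hypothetical numbers: a 100-cycle page exchange starting at cycle 10,
# with access requests arriving at cycles 5, 20, 60, and 120.
waits = single_bus_wait(10, 100, [5, 20, 60, 120])
```

In this sketch every request that arrives during the exchange waits for the remainder of the exchange, whether or not the data it needs is already present in the shared memory.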
An object of the present invention is to provide a parallel processor and a processing method which can exhibit high processing performance.
To overcome the above disadvantage of the related art and to achieve the above object, a parallel processor of the present invention comprises a plurality of processor elements, each including an internal memory storing one or more sub-pages and performing processing using data stored in the internal memory; a first bus connected to the plurality of processor elements; a second bus connected to an external memory; and a shared memory connected to both the first and second buses, the shared memory including a storage means having a plurality of sub-banks storing the sub-pages, a control means for controlling data transfer between the internal memory of the processor element and the storage means through the first bus and data transfer between the storage means and the external memory through the second bus, and an access request management means for receiving as input an access request which generates a page fault to the storage means from the processor elements, storing another access request when another access request is input during the data transfer due to the access request between the shared memory and the external memory through the second bus, and causing the control means to execute the stored other access request when the stored other access request does not generate a page fault.
In the parallel processor of the present invention, preferably the access request management means is capable of storing a plurality of access requests and, when a plurality of access requests are stored prior to a stored access request generating a page fault, makes the control means execute, from among the plurality of stored access requests, an access request which does not generate a page fault.
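The behavior of the access request management means can be sketched roughly as follows. This is a simplified software model, not the circuit of the invention: the class name `AccessRequestManager`, its method names, and the representation of the sub-banks as a set of resident sub-page numbers are all assumptions made for illustration. Eviction of a sub-page during a page exchange is not modeled.

```python
from collections import deque


class AccessRequestManager:
    """Hypothetical model of the access request management means."""

    def __init__(self, resident_sub_pages):
        # Sub-pages currently held in the sub-banks (assumed representation).
        self.resident = set(resident_sub_pages)
        # Stored access requests as (pe_id, sub_page), in arrival order.
        self.stored = deque()

    def causes_page_fault(self, sub_page):
        return sub_page not in self.resident

    def receive(self, pe_id, sub_page):
        """Store a request, then return the stored requests executable now."""
        self.stored.append((pe_id, sub_page))
        return self.execute_non_faulting()

    def execute_non_faulting(self):
        """Hand non-faulting stored requests to the control means.

        Requests that would cause a page fault stay stored, in order.
        """
        executed, kept = [], deque()
        for request in self.stored:
            if self.causes_page_fault(request[1]):
                kept.append(request)
            else:
                executed.append(request)
        self.stored = kept
        return executed

    def complete_transfer(self, sub_page):
        """Mark a sub-page as loaded over the second bus and retry."""
        self.resident.add(sub_page)
        return self.execute_non_faulting()


mgr = AccessRequestManager(resident_sub_pages={0, 1})
first = mgr.receive(pe_id=0, sub_page=7)   # faults: transfer over second bus
second = mgr.receive(pe_id=1, sub_page=1)  # resident: executed during transfer
third = mgr.receive(pe_id=2, sub_page=7)   # faults: remains stored
after = mgr.complete_transfer(7)           # both requests for sub-page 7 run
```

In this model the request from the second processor element is not held behind the faulting request, because it is served over the first bus while the second bus carries the page transfer.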
Further, the processing method of the present invention comprises storing one or more sub-pages in a shared memory having a storage region comprising a plurality of sub-banks each having a single data port and accessed by a plurality of processor elements; controlling data transfer between an internal memory of a processor element and the shared memory through a first bus and data transfer between the shared memory and an external memory through a second bus in response to an access request from the processor element; and, when a processor element issues an access request accompanied by a page fault to the shared memory and, during the transfer of data between the shared memory and the external memory through the second bus in response to that access request, another processor element issues an access request, storing the access request issued by the other processor element, judging whether or not the stored access request causes a page fault, and, when judging that it does not cause a page fault, executing the stored access request.