1. Field of the Invention
The present invention relates to a parallel processor comprising a plurality of processor elements and a shared memory connected via a common bus and to a processing method thereof.
2. Description of the Related Art
In recent years, parallel processors have been developed in which a plurality of processor elements (PEs) built into a single chip execute, in parallel, a plurality of simultaneously executable instructions in a program so as to shorten the execution time of the program.
A variety of architectures are being proposed for such parallel processors. Among them, there is one in which a plurality of processor elements and a shared memory are connected to a set of common buses.
FIG. 16 is a view of the system configuration of a general parallel processor 1 of the related art.
As shown in FIG. 16, the parallel processor 1 has built into one chip a common bus 2, n processor elements 31 to 3n, a shared memory 4, and a bus unit 5. The processor elements 31 to 3n, the shared memory 4, and the bus unit 5 are connected to the common bus 2. The bus unit 5 is connected to a main memory 7 via an external chip interface 6. A memory cell region 4a of the shared memory 4 is provided with a single data port I/O.
In the parallel processor 1, data is transferred via the common bus 2 and the data port I/O when the processor elements 31 to 3n access the data stored in the shared memory 4.
The problem to be solved by the invention is as follows. In the above parallel processor 1, both the data transfer between the processor elements 31 to 3n and the shared memory 4 and the data transfer between the shared memory 4 and the main memory 7 are carried out via the common bus 2. Furthermore, since the memory cell region 4a of the shared memory 4 has only one data port I/O, the waiting time of the processor elements 31 to 3n may frequently become long, for the following reason.
Namely, when a page fault occurs in the shared memory 4 and pages are being exchanged between the shared memory 4 and the main memory 7, the processor elements 31 to 3n cannot access the shared memory 4 because the common bus 2 is in use. Accordingly, an access request from the processor elements 31 to 3n to the shared memory 4 is kept waiting until the page exchange processing is completed, which lowers the processing performance of the parallel processor 1.
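The contention described above can be illustrated with a toy timing model. This is a minimal sketch, not taken from the related art: it assumes that a page exchange occupies the transfer path for one contiguous interval, and the function and parameter names (`pe_access_start`, `exchange_start`, `exchange_duration`, `shared_path`) are hypothetical. It returns the earliest time a PE access to the shared memory 4 can begin.

```python
def pe_access_start(request_time: float, exchange_start: float,
                    exchange_duration: float, shared_path: bool) -> float:
    """Earliest time a PE access to the shared memory can begin.

    Toy model (assumption): a page exchange occupies the transfer path for
    [exchange_start, exchange_start + exchange_duration). If the PE access
    uses the same path (common bus 2 and single data port I/O), it waits.
    """
    exchange_end = exchange_start + exchange_duration
    if shared_path and exchange_start <= request_time < exchange_end:
        # Path busy with the page exchange: the PE request is kept waiting.
        return exchange_end
    # Path free, or the page exchange does not use this path.
    return request_time


if __name__ == "__main__":
    # A PE request arriving at t=2 while an exchange occupies [0, 16).
    print(pe_access_start(2, 0, 16, shared_path=True))   # 16
    print(pe_access_start(2, 0, 16, shared_path=False))  # 2
```

With `shared_path=True` the request is delayed for the entire remaining page exchange, which is the waiting time the related-art configuration suffers; `shared_path=False` is a hypothetical case in which the exchange uses an independent path.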
An object of the present invention is to provide a parallel processor capable of realizing high processing performance, and a processing method for the same.
According to a first aspect of the present invention, there is provided a parallel processor comprising: a plurality of processor elements, each having an inner memory storing one or more sub-pages and performing signal processing on the data stored in the inner memory; a first bus connected to the plurality of processor elements; a second bus connected to an outside memory; and a shared memory connected to the first bus and the second bus, the shared memory comprising: a storage means for storing a plurality of sub-pages; and a control means for controlling, in accordance with an access request from a processor element, a transfer of a sub-page between the inner memory of the processor element and the storage means via the first bus and a transfer of a page comprising a plurality of sub-pages between the storage means and the outside memory via the second bus, wherein, when a second access request, which is accompanied by a page fault and issued from another processor element to the storage means, is generated before the end of a page transfer between the shared memory and the outside memory through the second bus due to a first access request, which is accompanied by a page fault and issued from one processor element to the storage means, the control means transfers the sub-pages requested by the first access request and the second access request from the outside memory to the storage means, and transfers the remaining sub-pages of the pages to which those requested sub-pages belong from the outside memory to the storage means.
Preferably, the control means transfers the sub-pages requested by the first access request and the second access request through the second bus from the outside memory to the storage means, then transfers the remaining sub-pages of the page to which the sub-page requested by the first access request belongs through the second bus from the outside memory to the storage means, and then transfers the remaining sub-pages of the page to which the sub-page requested by the second access request belongs through the second bus from the outside memory to the storage means.
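The transfer order described above can be sketched as a slot-by-slot scheduler for the second bus. This is a hypothetical model, not the circuit itself: it assumes one transfer slot per sub-page, four sub-pages per page, and that the remaining sub-pages of a page are filled in ascending order, which the text does not specify. The name `second_bus_schedule` and its parameters are illustrative.

```python
from collections import deque


def second_bus_schedule(requests, subpages_per_page=4):
    """Order in which (page, sub_page) items cross the second bus.

    requests: list of (arrival_slot, page, sub_page) page-fault requests,
    one transfer slot per sub-page (assumption). Assumed policy: a newly
    requested sub-page goes ahead of any remaining sub-page; remaining
    sub-pages are then filled page by page, in request order, ascending.
    """
    pending = sorted(requests)
    critical = deque()   # requested sub-pages not yet transferred
    fill = deque()       # pages whose remaining sub-pages are outstanding
    done, order, slot = set(), [], 0
    while pending or critical or fill:
        # Accept every request that has arrived by the current slot.
        while pending and pending[0][0] <= slot:
            _, page, sub = pending.pop(0)
            critical.append((page, sub))
            fill.append(page)
        if critical:
            item = critical.popleft()
        else:
            # Discard pages that are already completely transferred.
            while fill and all((fill[0], s) in done
                               for s in range(subpages_per_page)):
                fill.popleft()
            if not fill:
                slot += 1   # bus idle until the next request arrives
                continue
            page = fill[0]
            item = next((page, s) for s in range(subpages_per_page)
                        if (page, s) not in done)
        done.add(item)
        order.append(item)
        slot += 1
    return order


if __name__ == "__main__":
    # PE fault on page A (sub-page 2), then on page B (sub-page 1) one slot later.
    print(second_bus_schedule([(0, "A", 2), (1, "B", 1)]))
    # [('A', 2), ('B', 1), ('A', 0), ('A', 1), ('A', 3), ('B', 0), ('B', 2), ('B', 3)]
```

Because the second request arrives before page A is complete, both requested sub-pages are moved first, followed by the rest of page A and then the rest of page B, matching the order stated above.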
Preferably, the control means transfers the sub-page requested by the first access request through the second bus from the outside memory to the storage means, and transfers the sub-page through the first bus from the storage means to the processor element generating the first access request.
Preferably, the control means transfers the sub-page requested by the second access request through the second bus from the outside memory to the storage means, and transfers the sub-page through the first bus from the storage means to the processor element generating the second access request.
Preferably, the transfer of the sub-page through the first bus and the transfer of the sub-page through the second bus are performed in parallel.
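To illustrate the parallel operation of the two buses, the following minimal sketch computes when each faulting processor element receives its sub-page over the first bus while the second bus keeps filling pages. It assumes fixed per-sub-page transfer times `t2` (second bus) and `t1` (first bus) and a given second-bus order; the function name `pe_delivery_times` and its parameters are hypothetical.

```python
def pe_delivery_times(order, requested, t2=1.0, t1=1.0):
    """Return {pe: time its requested sub-page arrives over the first bus}.

    order: (page, sub_page) items moved over the second bus back to back,
    each taking t2 (assumption). requested: {pe: (page, sub_page)}.
    A sub-page is forwarded over the first bus as soon as it is in the
    storage means and the first bus is free; the second bus keeps
    transferring meanwhile, so the two transfers overlap.
    """
    # Time at which each sub-page has arrived in the storage means.
    landed = {item: (i + 1) * t2 for i, item in enumerate(order)}
    first_bus_free = 0.0
    delivered = {}
    # Serve faulting PEs in the order their sub-pages land.
    for pe, item in sorted(requested.items(), key=lambda kv: landed[kv[1]]):
        start = max(landed[item], first_bus_free)
        first_bus_free = start + t1
        delivered[pe] = first_bus_free
    return delivered


if __name__ == "__main__":
    order = [("A", 2), ("B", 1), ("A", 0), ("A", 1),
             ("A", 3), ("B", 0), ("B", 2), ("B", 3)]
    print(pe_delivery_times(order, {"PE1": ("A", 2), "PE2": ("B", 1)}))
    # {'PE1': 2.0, 'PE2': 3.0}; the second bus stays busy until t = 8.0
```

In this example both processor elements have their data by t = 3.0, well before the two pages are completely transferred at t = 8.0. Varying `t1` and `t2` shows how the relative bus rates affect this overlap.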
Preferably, the control means is provided with: an access request storage unit for storing the first access request and the second access request; a save procedure storage unit for storing procedures for transferring the remaining sub-pages of the pages to which the sub-pages requested by the first access request and the second access request belong from the outside memory to the storage means; and a control unit which stores in the save procedure storage unit a first procedure for transferring, through the second bus, the remaining sub-pages of the page to which the sub-page requested by the first access request belongs from the outside memory to the storage means, stores in the save procedure storage unit a second procedure for transferring, through the second bus, the remaining sub-pages of the page to which the sub-page requested by the second access request belongs from the outside memory to the storage means, calls up and executes the first procedure from the save procedure storage unit, and, after executing the first procedure, calls up and executes the second procedure from the save procedure storage unit.
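The control unit with a save procedure storage unit can be modeled in software as two queues. This is a hypothetical sketch: the class name `SharedMemoryController`, the representation of a "procedure" as a `(page, remaining_sub_pages)` record, and the ascending fill order are assumptions for illustration only. Saved procedures are executed first-in, first-out, so the first procedure runs before the second.

```python
from collections import deque


class SharedMemoryController:
    """Sketch of the control means with a save procedure storage unit.

    Hypothetical model: a saved "procedure" is a (page, sub_pages) record
    meaning "transfer these remaining sub-pages of page over the second bus".
    """

    def __init__(self, subpages_per_page=4):
        self.subpages_per_page = subpages_per_page
        self.access_requests = deque()   # access request storage unit
        self.save_procedures = deque()   # save procedure storage unit
        self.second_bus_log = []         # second-bus transfers, in order

    def request(self, page, sub_page):
        """Store a page-fault access request."""
        self.access_requests.append((page, sub_page))

    def run(self):
        # Transfer each stored request's own sub-page first and save a
        # procedure for the rest of its page.
        while self.access_requests:
            page, sub = self.access_requests.popleft()
            self.second_bus_log.append((page, sub))
            rest = [s for s in range(self.subpages_per_page) if s != sub]
            self.save_procedures.append((page, rest))
        # Call up and execute the saved procedures in the order stored.
        while self.save_procedures:
            page, rest = self.save_procedures.popleft()
            self.second_bus_log.extend((page, s) for s in rest)
        return self.second_bus_log


if __name__ == "__main__":
    ctrl = SharedMemoryController()
    ctrl.request("A", 2)   # first access request
    ctrl.request("B", 1)   # second access request
    print(ctrl.run())
    # [('A', 2), ('B', 1), ('A', 0), ('A', 1), ('A', 3), ('B', 0), ('B', 2), ('B', 3)]
```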
Preferably, the control means is provided with: an access request storage unit for storing the first access request and the second access request, each in correspondence with save data; and a control unit which transfers the sub-page requested by the first access request through the second bus from the outside memory to the storage means, sets the save data corresponding to the first access request stored in the access request storage unit to a save state, transfers the sub-page requested by the second access request through the second bus from the outside memory to the storage means, sets the save data corresponding to the second access request stored in the access request storage unit to the save state, uses the save data to read the first access request stored in the access request storage unit, transfers the remaining sub-pages of the page to which the sub-page requested by the first access request belongs through the second bus from the outside memory to the storage means, after that transfer uses the save data to read the second access request stored in the access request storage unit, and transfers the remaining sub-pages of the page to which the sub-page requested by the second access request belongs through the second bus from the outside memory to the storage means.
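The save-data variant can be sketched with a per-request flag in place of a separate procedure store. This is a hypothetical model: the `AccessRequest` record, its `saved` boolean standing in for the "save data", the ascending fill order, and the clearing of the flag once a page is complete are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class AccessRequest:
    page: str
    sub_page: int
    saved: bool = False   # stands in for the "save data" (assumption)


def run_with_save_flags(requests, subpages_per_page=4):
    """Second-bus transfer order using save flags on stored requests."""
    log = []
    # Phase 1: transfer each requested sub-page, then set its save state.
    for req in requests:
        log.append((req.page, req.sub_page))
        req.saved = True
    # Phase 2: use the save flags to read the requests back in stored order
    # and transfer the remaining sub-pages of each page.
    for req in requests:
        if req.saved:
            log.extend((req.page, s) for s in range(subpages_per_page)
                       if s != req.sub_page)
            req.saved = False   # assumed: cleared once the page is complete
    return log


if __name__ == "__main__":
    reqs = [AccessRequest("A", 2), AccessRequest("B", 1)]
    print(run_with_save_flags(reqs))
    # [('A', 2), ('B', 1), ('A', 0), ('A', 1), ('A', 3), ('B', 0), ('B', 2), ('B', 3)]
```

The resulting order is the same as with a save procedure storage unit; the difference is only where the pending work is recorded.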
Preferably, the storage means is provided with a plurality of sub-banks, each storing one sub-page, and the shared memory further comprises a plurality of selecting means, provided in correspondence with the respective sub-banks, each of which connects its corresponding sub-bank to a selected one of the first bus and the second bus.
Preferably, the data transfer rate of the first bus is equal to or lower than the data transfer rate of the second bus.
Preferably, each sub-bank of the storage region of the storage means is provided with a single data port.
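The selecting means and the single data port per sub-bank can be illustrated with a per-cycle selector configuration. In this hypothetical sketch, `configure_selectors` and the `"first"`/`"second"`/`None` settings are illustrative names: each sub-bank's selector connects it to at most one bus, so the first bus and the second bus can be served in the same cycle only when they target different sub-banks.

```python
def configure_selectors(num_banks, first_bus_bank=None, second_bus_bank=None):
    """Return one selector setting per sub-bank for a single cycle.

    "first" or "second" names the bus a sub-bank's selector connects it to;
    None means the sub-bank is idle. With one data port per sub-bank, a
    sub-bank cannot serve both buses at once (assumed conflict rule).
    """
    if first_bus_bank is not None and first_bus_bank == second_bus_bank:
        raise ValueError("both buses target the same single-port sub-bank")
    setting = [None] * num_banks
    if first_bus_bank is not None:
        setting[first_bus_bank] = "first"
    if second_bus_bank is not None:
        setting[second_bus_bank] = "second"
    return setting


if __name__ == "__main__":
    # PE reads sub-bank 2 over the first bus while sub-bank 0 is filled.
    print(configure_selectors(4, first_bus_bank=2, second_bus_bank=0))
    # ['second', None, 'first', None]
```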
Preferably, the plurality of sub-banks of the storage means have the same storage capacities.
Preferably, the number of the sub-banks of the storage means is the same as the number of sub-pages making up a page.
Preferably, the plurality of sub-pages making up a page have continuous addresses in the address space of the outside memory.
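When the sub-banks have equal capacities, their number equals the number of sub-pages per page, and the sub-pages of a page occupy contiguous addresses, an outside-memory address splits directly into a page number, a sub-page index (which also selects the sub-bank), and an offset. The sizes below (`SUBPAGE_BYTES`, `SUBPAGES_PER_PAGE`) are hypothetical values chosen for illustration.

```python
SUBPAGE_BYTES = 256          # hypothetical sub-page (= sub-bank) capacity
SUBPAGES_PER_PAGE = 4        # = number of sub-banks
PAGE_BYTES = SUBPAGE_BYTES * SUBPAGES_PER_PAGE


def decompose(address):
    """Split an outside-memory address into (page, sub_page, offset).

    Because sub-pages of a page are contiguous, sub_page is also the index
    of the sub-bank that holds the data.
    """
    page, within_page = divmod(address, PAGE_BYTES)
    sub_page, offset = divmod(within_page, SUBPAGE_BYTES)
    return page, sub_page, offset


if __name__ == "__main__":
    print(decompose(0x1234))  # (4, 2, 52)
```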
According to a second aspect of the present invention, there is provided a processing method of a parallel processor having a plurality of processor elements, comprising the steps of: controlling, in accordance with an access request from a processor element, a transfer of a sub-page between the processor element and a shared memory via a first bus and a transfer of a page comprising a plurality of sub-pages between the shared memory and an outside memory via a second bus; and, when a second access request, which is accompanied by a page fault and issued from another processor element to the shared memory, is generated before the end of a page transfer between the shared memory and the outside memory through the second bus due to a first access request, which is accompanied by a page fault and issued from one processor element to the shared memory, transferring the sub-pages requested by the first access request and the second access request from the outside memory to the shared memory, and transferring the remaining sub-pages of the pages to which those requested sub-pages belong from the outside memory to the shared memory.
According to a third aspect of the present invention, there is provided a parallel processor comprising: a plurality of processor elements, each having an inner memory storing one or more sub-pages and performing processing using the data stored in the inner memory; a first bus connected to the plurality of processor elements; a second bus connected to an outside memory; and a shared memory connected to the first bus and the second bus, wherein the shared memory comprises: a storage means for storing a plurality of sub-pages; and a control means for controlling, in accordance with an access request from a processor element, a transfer of a sub-page between the inner memory of the processor element and the storage means via the first bus and a transfer of a page comprising a plurality of sub-pages between the storage means and the outside memory via the second bus, wherein, when an access request accompanied by a page fault is issued from one processor element to the storage means, the control means transfers the sub-page requested by the access request from the outside memory to the storage means, and transfers the remaining sub-pages of the page to which that sub-page belongs from the outside memory to the storage means.
According to a fourth aspect of the present invention, there is provided a processing method of a parallel processor having a plurality of processor elements, comprising the steps of: controlling, in accordance with an access request from a processor element, a transfer of a sub-page between the processor element and a shared memory via a first bus and a transfer of a page comprising a plurality of sub-pages between the shared memory and an outside memory via a second bus; and, when one of the plurality of processor elements generates an access request accompanied by a page fault to the shared memory, transferring the sub-page requested by the access request from the outside memory to the shared memory, and transferring the remaining sub-pages of the page to which that sub-page belongs from the outside memory to the shared memory.
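For a single page fault, the benefit of moving the requested sub-page first can be shown with simple arithmetic. This sketch assumes a fixed second-bus time `t2` per sub-page, four sub-pages per page, and ascending order for the remaining sub-pages; the baseline with `critical_first=False` (filling the page strictly from sub-page 0) is a hypothetical comparison, not a configuration described in the text. The function names are illustrative.

```python
def single_fault_order(requested_sub, subpages_per_page=4):
    """Requested sub-page first, then the rest of its page (ascending, assumed)."""
    return [requested_sub] + [s for s in range(subpages_per_page)
                              if s != requested_sub]


def fault_latency(requested_sub, subpages_per_page=4, t2=1.0,
                  critical_first=True):
    """Time until the requested sub-page is in the shared memory."""
    if critical_first:
        order = single_fault_order(requested_sub, subpages_per_page)
    else:
        # Hypothetical baseline: fill the page strictly from sub-page 0.
        order = list(range(subpages_per_page))
    return (order.index(requested_sub) + 1) * t2


if __name__ == "__main__":
    print(fault_latency(3))                        # 1.0
    print(fault_latency(3, critical_first=False))  # 4.0
```

With the requested sub-page moved first, the faulting processor element waits one sub-page transfer time regardless of which sub-page it needs, while the rest of the page is filled afterward.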