The present invention relates to a parallel processing apparatus that dynamically switches over its circuit configuration.
A reconfigurable circuit includes a plurality of computing elements (also called processor elements (PEs)) and a circuit (a network) for executing a delay adjustment, and is packaged in a semiconductor device (e.g., an LSI). The reconfigurable circuit connects the respective computing elements and executes parallel processing. Further, the circuit sets its circuit configuration, i.e., the type of the operation to be executed by each computing element and the connections between the computing elements, based on configuration data for batchwise controlling the circuit. The circuit includes a configuration memory that stores the configuration data. When reconfiguration is required, the circuit loads, from the configuration memory, the configuration data designating the circuit configuration to be set. The circuit sets, based on the loaded data, the types of the respective operations executed by the plurality of computing elements and the connections between the computing elements. By reconfiguring itself on the basis of the data loaded from the configuration memory, the circuit dynamically switches over the circuit configuration. Herein, the phrase "dynamically switching over the circuit configuration" means that the circuit is reconfigured during a period in which the circuit is processing the operation target data.
The following are given as the prior art technical documents related to the present invention.
[Patent document 1] Japanese Patent Application Laid-Open Publication No. 1-320564
[Patent document 2] Japanese Patent Application Laid-Open Publication No. 5-324694
FIG. 20 shows an example of a configuration of a reconfigurable circuit 1 according to the prior art. The reconfigurable circuit 1 includes a plurality of PEs 2, a configuration memory 3 and a network 4 that connects the plurality of PEs 2 and the configuration memory 3 to each other.
When the circuit is to be operated, the reconfigurable circuit 1 uses the configuration data stored in the configuration memory 3. Based on the configuration data, the reconfigurable circuit 1 configures the types of the respective operations executed by the plurality of PEs 2 and the connections made by the selectors that establish the connections within the network 4.
During the operation of the circuit, the reconfigurable circuit 1 likewise employs the configuration data stored in the configuration memory 3. Based on the configuration data, the reconfigurable circuit 1 dynamically switches over the types of the respective operations executed by the plurality of PEs 2 and the connections made by the selectors that specify the connections within the network 4. The reconfigurable circuit 1 according to the prior art as shown in FIG. 20 has the following four problems.
Firstly, before the circuit can operate, the reconfigurable circuit 1 must configure the types of the respective operations executed by the plurality of PEs 2 and the connections made by the selectors that establish the connections within the network 4. Accordingly, the reconfigurable circuit 1 loads in advance the configuration data designating the configuration of the whole circuit from the configuration memory 3 into the PEs 2 and the selectors within the network 4. Based on the loaded data, the reconfigurable circuit 1 sets, for the whole circuit, the types of the respective operations executed by the plurality of PEs 2 and the connections made by the selectors within the network 4. Therefore, a problem arises in that the reconfigurable circuit 1 requires time to configure the circuit.
Secondly, consider a case of switching over the type of the operation (e.g., an arithmetic operation) executed by a single PE 2 in the reconfigurable circuit 1, or of switching over the setting of the connections made by the selectors that establish the connections within the network 4. Even for such a switchover, the reconfigurable circuit 1 loads the configuration data designating the configuration of the whole circuit from the configuration memory 3 each time the circuit configuration is switched over. Based on the loaded data, the reconfigurable circuit 1 reconfigures the types of the operations executed by the PEs 2 and the setting of the connections made by the selectors within the network 4. Because the circuit is reconfigured by these procedures, the reconfigurable circuit 1 requires time to reconfigure the circuit. Another problem is that the reconfigurable circuit 1 is unable to process the operation target data during the period in which the circuit is being reconfigured.
FIG. 21 shows how the reconfigurable circuit 1 operates to switch over the type of the operation executed by one of the plurality of PEs 2. The PE 2 before the switchover (PE 2A) executes addition (ADD), and the PE 2 after the switchover (PE 2B) executes subtraction (SUB). Even when switching over the type of the operation executed by only one of the plurality of PEs 2 (e.g., the switchover from ADD to SUB), the reconfigurable circuit 1 loads the configuration data designating the configuration of the whole circuit from the configuration memory 3. Based on the loaded data, the reconfigurable circuit 1 reconfigures the types of the operations executed by the PEs 2 and the connections made by the selectors that establish the connections within the network 4.
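The switchover of FIG. 21 can be illustrated with a minimal sketch. The dictionary layout, the names PE0, PE1, PE2, SEL0 and SEL1, and the functions reconfigure_whole and changed_settings are all hypothetical and do not appear in the prior art documents; they only model the point that whole-circuit configuration data rewrite every setting even when a single PE changes from ADD to SUB.

```python
# Hypothetical model: a configuration maps each PE or selector name to its setting.
CONFIG_A = {
    "PE0": "ADD",
    "PE1": "MUL",
    "PE2": "ADD",
    "SEL0": "PE0->PE1",
    "SEL1": "PE1->PE2",
}
CONFIG_B = dict(CONFIG_A, PE0="SUB")  # only PE0 changes: ADD -> SUB


def reconfigure_whole(circuit, new_config):
    """Rewrite every setting from the whole-circuit configuration data.

    Returns the number of settings written, including unchanged ones.
    """
    writes = 0
    for name, setting in new_config.items():
        circuit[name] = setting
        writes += 1
    return writes


def changed_settings(old_config, new_config):
    """List the names whose setting actually differs between two configurations."""
    return [name for name in new_config if old_config.get(name) != new_config[name]]


circuit = dict(CONFIG_A)
writes = reconfigure_whole(circuit, CONFIG_B)
print(writes, changed_settings(CONFIG_A, CONFIG_B))
```

In this toy model five settings are written although only one of them differs, which corresponds to the overhead described for the second problem.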
FIG. 22 shows a pipeline in which the reconfigurable circuit 1 processes data, has its configuration switched over once in the middle of the processing, and then processes data again. To switch over the circuit configuration, the reconfigurable circuit 1 loads the configuration data designating the configuration of the whole circuit from the configuration memory 3 and thereby reconfigures the circuit. Accordingly, as illustrated in FIG. 22, the reconfigurable circuit 1 does not process the operation target data during the reconfiguration (Reconfig) of the circuit. Hence, clock cycles in which no data are processed occur in the reconfigurable circuit 1 during the reconfiguration. In particular, a circuit in which a pipeline is formed requires a number of cycles corresponding to the number of stages before the respective stages of the pipeline operate in parallel. Accordingly, reconfiguring the whole circuit decreases the efficiency of the pipeline (the proportion of time during which the respective stages operate in parallel).
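The loss of pipeline efficiency can be made concrete with a simplified cycle count. The model below is an assumption introduced for illustration, not a description of FIG. 22: a pipeline of S stages processing a batch of n data items is taken to need n + S - 1 cycles, of which n - S + 1 cycles have every stage busy, and each reconfiguration between batches is taken to add a fixed number of idle cycles. The function name pipeline_full_rate and its parameters are hypothetical.

```python
def pipeline_full_rate(stages, batches, reconfig_cycles):
    """Fraction of cycles in which all pipeline stages operate in parallel.

    stages: number of pipeline stages (S).
    batches: data item counts processed between reconfigurations.
    reconfig_cycles: idle cycles assumed for each whole-circuit reconfiguration.
    """
    total_cycles = 0
    full_cycles = 0
    for index, items in enumerate(batches):
        if index > 0:
            # No data are processed while the whole circuit is reconfigured.
            total_cycles += reconfig_cycles
        # Fill and drain: the batch occupies items + stages - 1 cycles.
        total_cycles += items + stages - 1
        # All stages are busy only once the pipeline is full.
        full_cycles += max(0, items - stages + 1)
    return full_cycles / total_cycles


no_switch = pipeline_full_rate(stages=4, batches=[100], reconfig_cycles=0)
one_switch = pipeline_full_rate(stages=4, batches=[50, 50], reconfig_cycles=10)
print(no_switch, one_switch)
```

Under these assumptions, a single switchover in the middle of 100 data items lowers the rate from 97/103 to 94/116, because the pipeline must be drained, left idle during the reconfiguration, and then refilled.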
Thirdly, the configuration data designating the whole circuit are stored in the configuration memory 3, and these configuration data have a comparatively large data size. Accordingly, when the reconfigurable circuit 1 needs a plurality of circuit configurations in the course of processing the operation target data, a problem arises in that the configuration memory 3 is required to have an extremely large storage capacity. A further problem is that, if the configuration memory 3 has a large storage capacity and stores data of a large data size, the reconfigurable circuit 1 requires time to access the stored data.
A problem similar to the third problem arises in a case where the reconfigurable circuit 1 loads the configuration data into the configuration memory 3 from a memory (unillustrated) outside the LSI in which the circuit is packaged. In this case, the configuration data used for each circuit configuration have a comparatively large data size, and hence an extremely long period of time is needed for loading the data.
Consider, for instance, a state in which the LSI packaging the reconfigurable circuit 1 uses an external memory (unillustrated) accessible in 32-bit units and an internal memory (the configuration memory 3) accessible in 32-bit units. Suppose that data having a data size of 1000 bits as a whole are loaded from the memory external to the LSI, and that the reconfigurable circuit 1 writes 32 bits of data to the configuration memory 3 in one cycle (clock). Since 1000/32 = 31.25, the reconfigurable circuit 1 requires at least 32 cycles of write time until all of the data have been written.
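The cycle count in this example is a ceiling division, as the following minimal sketch shows. The function name write_cycles and its parameter names are hypothetical; only the 1000-bit data size and the 32-bit write unit are taken from the example above, and one write per cycle is assumed as stated there.

```python
def write_cycles(total_bits, bits_per_cycle):
    """Minimum number of cycles to write total_bits at bits_per_cycle per cycle.

    Uses integer ceiling division, since a partially filled final word
    still costs one full write cycle.
    """
    if total_bits < 0 or bits_per_cycle <= 0:
        raise ValueError("total_bits must be >= 0 and bits_per_cycle must be > 0")
    return -(-total_bits // bits_per_cycle)


cycles = write_cycles(1000, 32)
print(cycles)  # 31 full words plus one partial word
```

The same function indicates how the load time grows when several whole-circuit configurations must be transferred, since the cycle count scales with the total number of bits.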
The present application was made in view of the problems inherent in the prior art described above. Namely, it is an object of the present invention to provide a parallel processing apparatus that achieves faster parallel processing with a storage medium having a smaller capacity while dynamically switching over a circuit configuration.