1. Field of the Invention
The present invention relates to an arithmetic apparatus and an arithmetic method for improving the processing speed of conditional branch processing in an arithmetic apparatus comprising reconfigurable hardware.
2. Description of the Related Art
Audio and image signal processing includes a large amount of processing that requires many arithmetic operations, for example, repeated product-sum operations. When a CPU executes such heavy arithmetic processing, the processing load on the CPU becomes heavy and the processing speed declines. Thus, a processing method has been proposed that realizes high-speed processing by assigning this part of the arithmetic to reconfigurable hardware, thereby reducing the processing load on the CPU.
FIG. 5 is a view of a configuration example of a reconfigurable arithmetic apparatus. As shown in FIG. 5, a reconfigurable arithmetic apparatus 30 comprises a configuration information memory 301, a data memory 302 and an arithmetic execution unit 303. FIG. 5 also shows a host CPU 10 and a shared memory 20 associated with the reconfigurable arithmetic apparatus 30.
The host CPU 10 provides configuration information and arithmetic data to the reconfigurable arithmetic apparatus 30 and receives arithmetic results from the reconfigurable arithmetic apparatus 30.
The shared memory 20 can be accessed by the host CPU 10 and is used for storing configuration information, arithmetic data and arithmetic results of the reconfigurable arithmetic apparatus 30.
In the reconfigurable arithmetic apparatus 30, the configuration information memory 301 stores configuration information input from the host CPU 10 and provides the stored configuration information to the arithmetic execution unit 303.
The data memory 302 stores arithmetic data input from the host CPU 10 and provides the stored arithmetic data to the arithmetic execution unit 303. Also, the data memory 302 stores arithmetic results obtained in the arithmetic execution unit 303 and outputs the stored arithmetic results to the shared memory 20.
The arithmetic execution unit 303 comprises a plurality of arithmetic units, for example, an adder, a multiplier, etc. By reconfiguring these arithmetic units based on configuration information input from the configuration information memory 301, an arithmetic circuit that realizes the new arithmetic function corresponding to the configuration information is configured. Note that FIG. 5 shows only three arithmetic units in the arithmetic execution unit as examples, namely an arithmetic unit 1, an arithmetic unit 2 and an arithmetic unit 3, but an actual arithmetic execution unit is composed of more arithmetic units. Also, reconfiguration may use only those arithmetic units that are necessary in accordance with the configuration information.
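The idea of combining adders and multipliers according to configuration information can be illustrated with a small software model. This is a minimal sketch, not the apparatus itself: the configuration format (`UnitConfig`, with an operation name and two source names per unit) and the operation table `OPS` are hypothetical, chosen only to show how the same units can be wired into different arithmetic circuits, and how a configuration may use only the units it needs.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical table of the kinds of arithmetic unit available (adder, multiplier).
OPS: Dict[str, Callable[[int, int], int]] = {
    "add": lambda a, b: a + b,
    "mul": lambda a, b: a * b,
}


@dataclass
class UnitConfig:
    """Configuration of one arithmetic unit (hypothetical format)."""
    name: str   # e.g. "unit1"
    op: str     # key into OPS
    src_a: str  # name of an input word or of an earlier unit's output
    src_b: str


def execute(config: List[UnitConfig], inputs: Dict[str, int]) -> Dict[str, int]:
    """Evaluate the configured units in order, feeding outputs to later units."""
    values = dict(inputs)
    for unit in config:
        values[unit.name] = OPS[unit.op](values[unit.src_a], values[unit.src_b])
    return values


# Three units wired as a product-sum circuit: a*b + c*d.
product_sum = [
    UnitConfig("unit1", "mul", "a", "b"),
    UnitConfig("unit2", "mul", "c", "d"),
    UnitConfig("unit3", "add", "unit1", "unit2"),
]

# A different configuration that uses only two units: (a + b) * c.
add_then_mul = [
    UnitConfig("unit1", "add", "a", "b"),
    UnitConfig("unit2", "mul", "unit1", "c"),
]
```

For example, `execute(product_sum, {"a": 2, "b": 3, "c": 4, "d": 5})["unit3"]` gives 26, and `execute(add_then_mul, {"a": 1, "b": 2, "c": 3})["unit2"]` gives 9. Changing only the configuration list changes the circuit; the unit implementations stay the same.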
FIG. 6 is a flowchart of the arithmetic processing of the above reconfigurable arithmetic apparatus. Below, an operation of the reconfigurable arithmetic apparatus will be explained with reference to FIG. 5 and FIG. 6.
First, the data memory 302 is initialized as needed (step S301), and then the configuration information memory 301 is initialized as needed (step S302).
Next, the data memory 302 reads arithmetic data from the host CPU 10, etc. (step S303). Then, the host CPU 10, etc. transmits configuration information to the configuration information memory 301 (step S304).
In the arithmetic execution unit 303, the hardware is reconfigured based on the configuration information output from the configuration information memory 301 (step S305).
Next, arithmetic data are retrieved from the data memory 302, and the arithmetic is executed on the reconfigured hardware (step S306).
After completing the arithmetic, an arithmetic result is transmitted to the data memory 302 and stored (step S307). Then, the arithmetic result is transmitted from the data memory 302 to the shared memory 20 (step S308).
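Steps S301 to S308 can be traced in a toy software model of the apparatus 30. This is a hedged sketch under stated assumptions: configuration information is modeled as a list of per-element stage names, and the stage table `STAGES` and class `ReconfigurableApparatus` are hypothetical names used only for illustration. The sketch always performs the initialization of steps S301 and S302, whereas the description says these are done as needed.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical per-element stages that a configuration can chain together.
STAGES: Dict[str, Callable[[int], int]] = {
    "double": lambda x: 2 * x,
    "inc": lambda x: x + 1,
    "square": lambda x: x * x,
}


class ReconfigurableApparatus:
    """Toy model of apparatus 30: memories 301 and 302 plus execution unit 303."""

    def __init__(self) -> None:
        self.config_memory: List[str] = []            # configuration information memory 301
        self.data_memory: Dict[str, List[int]] = {}   # data memory 302
        self.circuit: Optional[Callable[[List[int]], List[int]]] = None  # unit 303

    @staticmethod
    def _reconfigure(config: List[str]) -> Callable[[List[int]], List[int]]:
        """Build a circuit that applies the configured stages to each element."""
        stages = [STAGES[name] for name in config]

        def circuit(values: List[int]) -> List[int]:
            out = []
            for v in values:
                for stage in stages:
                    v = stage(v)
                out.append(v)
            return out

        return circuit

    def run(self, data: List[int], config: List[str],
            shared_memory: Dict[str, List[int]]) -> None:
        self.data_memory.clear()                               # S301: initialize data memory
        self.config_memory.clear()                             # S302: initialize config memory
        self.data_memory["in"] = list(data)                    # S303: read arithmetic data
        self.config_memory.extend(config)                      # S304: receive configuration
        self.circuit = self._reconfigure(self.config_memory)   # S305: reconfigure hardware
        result = self.circuit(self.data_memory["in"])          # S306: execute arithmetic
        self.data_memory["out"] = result                       # S307: store result
        shared_memory["result"] = list(self.data_memory["out"])  # S308: copy to shared memory
```

With `shared = {}` and `ReconfigurableApparatus().run([1, 2, 3], ["square", "inc"], shared)`, `shared["result"]` becomes `[2, 5, 10]`.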
Because the above reconfigurable arithmetic apparatus reconfigures the combination of hardware in the arithmetic execution unit 303 based on the configuration information in the configuration information memory 301, predetermined arithmetic can be executed by the hardware at a high speed. Therefore, when the host CPU 10 extracts arithmetic with a heavy processing load, generates configuration information describing a hardware configuration that realizes that arithmetic, and provides the configuration information to the reconfigurable arithmetic apparatus 30, the reconfigurable arithmetic apparatus 30 reconfigures the arithmetic execution unit 303 based on the configuration information, executes the arithmetic at a high speed on the arithmetic data provided from the host CPU 10, and transmits the result to the shared memory 20. As a result, the processing load on the host CPU 10 can be greatly reduced, the processing time can be shortened, and high-speed data processing can be easily realized.
Also, in the reconfigurable arithmetic execution unit 303, the units of configuration change are coarser than in a Field Programmable Gate Array (FPGA), etc., and the unit can handle a variety of kinds of arithmetic by changing the configuration information to combine adders, multipliers, etc. By suitably assigning arithmetic that places a heavy processing load on the host CPU 10, etc. to the reconfigurable arithmetic apparatus, the overall processing time can be shortened.
In the above reconfigurable arithmetic apparatus of the related art, however, it is not possible to perform arithmetic processing including conditional branches at a high speed.
FIG. 7 is a flowchart showing how conditional branch processing is assigned by using the reconfigurable arithmetic apparatus of the related art. As shown in the figure, a software profiler is first used to extract a heavy part of the arithmetic processing (step S401).
Next, it is judged whether or not the heavy arithmetic part extracted by the profiler can be assigned to the reconfigurable arithmetic apparatus (step S402).
When the judgment shows that assignment is possible, configuration information is prepared to reconfigure the hardware, and the processing is assigned before the arithmetic is executed (step S403). Arithmetic executed in this state runs at a much higher speed than software processing.
On the other hand, when the judgment shows that assignment is impossible, the arithmetic processing has to be performed by the host CPU 10, etc., so the processing speed is slower than when the reconfigurable arithmetic apparatus is used.
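The decision flow of FIG. 7 (steps S401 to S403, with fallback to the host CPU) can be sketched as follows. This is an illustrative model only: the `Region` record, its cost field, and the rule in `can_assign` (a conditional branch inside repeated arithmetic blocks assignment) are simplifying assumptions for the example, not the related art's actual judgment criteria.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Region:
    """A part of the program as reported by a profiler (hypothetical record)."""
    name: str
    cycles: int               # measured cost, in hypothetical units
    has_branch_in_loop: bool  # conditional branch inside repeated arithmetic


def profile(regions: List[Region]) -> Region:
    """S401: extract the heaviest part of the arithmetic processing."""
    return max(regions, key=lambda r: r.cycles)


def can_assign(region: Region) -> bool:
    """S402: assumed rule, a branch inside the loop prevents assignment."""
    return not region.has_branch_in_loop


def plan(regions: List[Region]) -> str:
    heavy = profile(regions)
    if can_assign(heavy):
        return f"{heavy.name}: assign to reconfigurable apparatus (S403)"
    return f"{heavy.name}: execute on host CPU"
```

For example, `plan([Region("fir", 900, False), Region("init", 10, False)])` returns `"fir: assign to reconfigurable apparatus (S403)"`, while `plan([Region("clip_loop", 700, True), Region("fir", 300, False)])` returns `"clip_loop: execute on host CPU"`: the heaviest part stays on the slower path precisely because of its branch.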
Note that a typical case in which processing cannot be assigned to the above reconfigurable arithmetic apparatus is one in which a conditional branch exists inside repeated arithmetic in software, etc. In this case, whether the conditional branch can be taken out of the arithmetic is examined at the profiling stage, but in many cases this is algorithmically difficult.
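Why taking a branch out of a loop is sometimes easy and often hard can be shown with two small loops. The function names are hypothetical. In the first pair, the condition `mode` does not change inside the loop, so the branch can be moved outside (a standard transformation known as loop unswitching), leaving branch-free loop bodies. In `clip_accumulate`, the condition depends on a running value carried from one iteration to the next, so the branch cannot be moved out of the loop in this way.

```python
from typing import List


def scale_or_offset(xs: List[int], mode: bool) -> List[int]:
    # Branch inside the loop, but the condition is loop-invariant.
    out = []
    for x in xs:
        if mode:
            out.append(x * 3)
        else:
            out.append(x + 3)
    return out


def scale_or_offset_unswitched(xs: List[int], mode: bool) -> List[int]:
    # Same result with the invariant branch taken out of the loop:
    # each remaining loop body contains no conditional branch.
    if mode:
        return [x * 3 for x in xs]
    return [x + 3 for x in xs]


def clip_accumulate(xs: List[int], limit: int) -> List[int]:
    # Data-dependent branch: the condition tests a running sum carried
    # across iterations, so it cannot simply be hoisted out of the loop.
    acc = 0
    out = []
    for x in xs:
        acc += x
        if acc > limit:
            acc = limit
        out.append(acc)
    return out
```

For instance, `clip_accumulate([4, 5, 6], 10)` returns `[4, 9, 10]`: whether the branch is taken is known only after the third addition.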