A reconfigurable LSI (large-scale semiconductor integrated circuit) contains a large number of processor elements (PEs), and each processor element is made up of a plurality of stages, including an arithmetic element such as an ALU (Arithmetic Logic Unit) and a simple arithmetic element in a subsequent stage.
FIG. 12 is a diagram illustrating a configuration example of a processor element 1201 with a two-stage pipeline. Registers 1202 and 1203 each hold 16-bit input data. A register 1204 holds predetermined data. A selector 1205 selects and outputs the data held in the register 1203 or the register 1204. A multiplier 1206 multiplies the data held in the register 1202 by the output data of the selector 1205, and outputs 32-bit multiplication data. A register 1207 holds the output data of the multiplier 1206. A selector 1208 selects and outputs the data held in the register 1204 or a register 1210. An ALU 1209 performs computation based on the data held in the register 1207 and the output data of the selector 1208, and outputs 32-bit computation data. The register 1210 holds the output data of the ALU 1209 and outputs 16-bit or 32-bit data to the outside.
When the selector 1208 selects the data held in the register 1210 and the ALU 1209 performs addition, the ALU 1209 performs accumulative addition. In such a case, the ALU 1209 can overflow as the accumulated value grows, which degrades bit accuracy.
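The overflow described above can be illustrated with a minimal Python sketch. It assumes signed two's-complement arithmetic with wraparound on overflow, which the description of FIG. 12 does not specify; the function names `to_signed32` and `mac_32bit` are hypothetical and chosen only for illustration.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(x):
    """Wrap an integer to a signed 32-bit two's-complement value (assumed behavior)."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def mac_32bit(pairs):
    """Multiply-and-accumulate signed 16-bit pairs in a 32-bit accumulator.

    Loosely models the multiplier 1206 feeding the ALU 1209, with the
    register 1210 fed back through the selector 1208.
    """
    acc = 0
    for a, b in pairs:
        product = to_signed32(a * b)      # 16 x 16 -> 32-bit product
        acc = to_signed32(acc + product)  # 32-bit accumulative addition
    return acc


pairs = [(32767, 32767)] * 3
exact = sum(a * b for a, b in pairs)  # result with unlimited bit width
wrapped = mac_32bit(pairs)            # result in a 32-bit accumulator
print(exact, wrapped)
```

Under these assumptions, two such products still fit in 32 bits, but the third addition exceeds the signed 32-bit range and the accumulated value no longer matches the exact sum.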
FIG. 13 is a diagram illustrating a configuration example of a reconfigurable circuit using two processor elements 1201a and 1201b. The processor element 1201a has the registers 1202 to 1204 and 1207, the selector 1205, and the multiplier 1206 of the preceding stage of the processor element 1201 in FIG. 12, and outputs data normalized to 16 bits to a data network 1301. The processor element 1201b has the selector 1208, the ALU 1209, and the register 1210 of the subsequent stage of the processor element 1201 in FIG. 12, further has registers 1302 to 1304, and receives the output data of the processor element 1201a via the data network 1301. In this case, since the processor element 1201a outputs the data normalized to 16 bits to the data network 1301, bit accuracy of the data is degraded.
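The loss of accuracy caused by normalizing each product before it crosses the data network can be sketched as follows. The normalization method is not stated in the description; this sketch assumes truncation by an arithmetic right shift of 16 bits, and the function names are hypothetical.

```python
def normalize_to_16(value32, shift=16):
    """Assumed normalization: drop the low 16 bits by an arithmetic right shift."""
    return value32 >> shift


def accumulate_normalized(pairs):
    """Normalize each 32-bit product to 16 bits first, then accumulate.

    Corresponds to the arrangement of FIG. 13, where the multiplying
    element sends 16-bit data to the accumulating element.
    """
    return sum(normalize_to_16(a * b) for a, b in pairs)


def accumulate_full(pairs):
    """Accumulate the full-width products, then normalize once at the end."""
    return normalize_to_16(sum(a * b for a, b in pairs))


pairs = [(255, 255)] * 4
print(accumulate_normalized(pairs), accumulate_full(pairs))
```

In this example each product is smaller than 2^16, so every normalized term is 0 and the accumulated result is 0, whereas accumulating at full width and normalizing once gives 3.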
As described above, accumulative addition (ACC) and multiply-and-accumulation (MAC) repeatedly add or subtract values, so that a large bit number is necessary. When the bit number is small, normalization is performed for every computation, and it becomes necessary to evaluate the accumulated error and the resulting bit accuracy.
FIG. 14 is a diagram illustrating a configuration example of a reconfigurable circuit in which bit accuracy is improved by using two processor elements 1401 and 1402. The processor elements 1401 and 1402 each have registers 1202, 1203, 1207, 1210, a multiplier 1206 and an ALU 1209, similarly to the processor element 1201 in FIG. 12. The ALU 1209 in the processor element 1401 performs accumulative addition, outputs 16-bit or 32-bit data to a data network 1403, and outputs 1-bit carry data to the ALU 1209 in the processor element 1402. The ALU 1209 in the processor element 1402 adds the carry data to the data held in the register 1210 and outputs 16-bit or 32-bit data to the data network 1403. Thereby, bit accuracy can be improved. However, since the processor element 1402 is required for carry computation in addition to the processor element 1401, twice as many processor elements are used, which wastes resources.
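The carry arrangement of FIG. 14 can be sketched as two 32-bit accumulators, one holding the lower word and one holding the upper word. This is a minimal sketch assuming unsigned 32-bit values for simplicity; the function names `add_with_carry_out` and `accumulate_two_pe` are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def add_with_carry_out(acc_low, value):
    """32-bit addition returning the 32-bit sum and a 1-bit carry.

    Models the ALU 1209 in the processor element 1401 (unsigned, assumed).
    """
    total = acc_low + value
    return total & MASK32, total >> 32


def accumulate_two_pe(values):
    """Accumulate 32-bit values using a lower and an upper accumulator."""
    low = 0   # register 1210 in the processor element 1401
    high = 0  # register 1210 in the processor element 1402
    for v in values:
        low, carry = add_with_carry_out(low, v & MASK32)
        high = (high + carry) & MASK32  # second element only adds the carry
    return (high << 32) | low


values = [0xFFFFFFFF] * 3
print(hex(accumulate_two_pe(values)))
```

The second accumulator performs only a carry addition on each step, which illustrates why dedicating a whole processor element to it uses resources inefficiently.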
Patent Document 1 below describes a signal processor which has: a plurality of processor elements each having an input register in an input section of a computing unit and an output register in an output section of the computing unit; a bus connecting the plurality of processor elements; a switch section altering connection of the bus; and a control circuit controlling the switch section in correspondence with software. The signal processor has a first operation mode in which the processor element continuously performs signal processing, and a second operation mode in which signal processing by the processor element and data transfer processing from the output register to the input register of the processor element are performed alternately, and in which the connection between the plurality of processor elements is altered in a signal processing period of the processor element.
Patent Document 2 below describes a multiplier-accumulator which has a CSA (Carry Save Adder) tree and performs fixed-point multiply-and-accumulation.
[Patent Document 1] Japanese Laid-open Patent Publication No. 2006-244519
[Patent Document 2] Japanese Laid-open Patent Publication No. 08-328828
When accumulative addition or multiply-and-accumulation is performed, a large bit number is necessary, so that bit accuracy is degraded in a processor element with a small bit number. Using a plurality of processor elements in order to improve bit accuracy wastes resources and reduces the usage efficiency of the resources.