1. Field of the Invention
The present invention relates to an ALU (arithmetic and logic unit) array comprising a plurality of ALUs and the like, and to a technique for setting, by configuration information, the instructions for the ALUs and the control for interconnecting the ALUs within the ALU array. In particular, it relates to a configuration in which a functional unit is shared among reconfigurable arithmetic circuits (i.e., clusters) in an arithmetic unit having a plurality of such circuits, each of which switches its configuration information by using a sequencer.
2. Description of the Related Art
Proposals have been made to improve processing speed and make a processor compact by arraying a plurality of reconfigurable arithmetic circuits (simply called a “cluster” hereinafter) in a reconfigurable processor and having them carry out operation processing.
FIG. 1 exemplifies an arithmetic unit in which the clusters 1 are connected to one another by, for instance, a crossbar connection 12, thereby enabling data transmission among the clusters.
A single cluster 1 has an ALU array unit which is equipped with a plurality of operation units. Each operation unit usually comprises an ALU, a multiplier or the like.
The cluster is configured, for example, as shown in FIG. 2, which is an illustrative block diagram conceptually showing the configuration of a cluster within a conventional reconfigurable processor.
The cluster 1 comprises an operation unit group 2 (i.e., ALU array unit), a configuration memory 3 and a sequencer 4.
The operation unit group 2 comprises a data input unit 5, a data buffer unit 6, a data buffer control unit 7, an inter-operation unit network 8, data memory 9 and operation units 10.
The data input unit 5 supplies input data received from the outside to the data memory 9 and the operation units 10 by way of the inter-operation unit network 8. In one example configuration, the data input unit 5 comprises the data buffer unit 6, which either buffers or does not buffer the input data according to a control signal from the data buffer control unit 7. The data buffer control unit 7 receives configuration information from the configuration memory 3 and, according to that configuration information, sends the control signal to the data buffer unit 6, thereby selecting whether or not the input data is buffered.
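The buffering selection described above can be illustrated by the following minimal sketch. It is not taken from the conventional apparatus itself: the class names, the `buffer_input` configuration key and the one-word buffer depth are all assumptions introduced solely for illustration.

```python
class DataBufferUnit:
    """Hypothetical model of the data buffer unit 6."""

    def __init__(self):
        self.buffering = False  # control signal set by the control unit
        self._held = None       # assumed one-word buffer register

    def step(self, word):
        """Accept one input word and return the word forwarded onward."""
        if not self.buffering:
            return word  # not buffering: the input passes straight through
        # Buffering: forward the previously held word and hold the new one.
        previous, self._held = self._held, word
        return previous


class DataBufferControlUnit:
    """Hypothetical model of the data buffer control unit 7."""

    def __init__(self, buffer_unit):
        self.buffer_unit = buffer_unit

    def apply(self, configuration):
        # "buffer_input" is an assumed key, not a field named in the text.
        self.buffer_unit.buffering = bool(configuration.get("buffer_input", False))


buffer_unit = DataBufferUnit()
control_unit = DataBufferControlUnit(buffer_unit)
control_unit.apply({"buffer_input": True})
first = buffer_unit.step(7)   # nothing held yet, so nothing is forwarded
second = buffer_unit.step(9)  # the word held from the previous step
```

In this sketch the control unit only translates configuration information into a single control signal, mirroring the division of roles between units 6 and 7 described above.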
The inter-operation unit network 8 is connected to various components (e.g., the data input unit 5, the data memory 9 and the operation units 10) and enables data transmission among the components connected to it according to configuration information. The configuration information is data generated by compiling a program; it is generated based on configuration data (i.e., source such as C-language or HDL source) which is supplied from the outside. The data memory 9 records data received by way of the inter-operation unit network 8. The operation units 10 are set up so as to achieve the functions specified by the configuration information and carry out the operations thus set up.
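As a purely illustrative sketch of data transmission governed by configuration information, the routing performed by the inter-operation unit network 8 may be modeled as a table mapping each destination port to the source that feeds it. The port names and the `route` function below are hypothetical and do not appear in the conventional apparatus.

```python
from typing import Dict


def route(configuration: Dict[str, str], outputs: Dict[str, int]) -> Dict[str, int]:
    """Deliver each source output to the destination ports named in the configuration.

    configuration maps a destination port to a source name (assumed format);
    outputs maps a source name to the value it currently drives.
    """
    return {destination: outputs[source]
            for destination, source in configuration.items()}


# Assumed example: the input feeds an operation unit, whose result goes to memory.
configuration = {"operation_unit_0.a": "data_input", "data_memory.write": "operation_unit_0"}
outputs = {"data_input": 3, "operation_unit_0": 8}
delivered = route(configuration, outputs)
```

Switching to different configuration information changes only the table, not the network model, which is the property that makes the interconnection reconfigurable.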
The configuration memory 3 loads configuration information from an external storage apparatus (e.g., a PC; not shown herein) which stores the configuration information, for instance by utilizing a communication unit comprised by the PC. The configuration memory 3 also receives a configuration switching condition signal, which is generated based on a condition establishment signal (e.g., a signal such as chip select) transmitted mainly by the operation units 10 among the various reconfigurable components constituting the configuration data load unit (not shown herein) and the operation unit group 2. The configuration switching condition signal is generated, for example, based on the above mentioned condition establishment signal and configuration data from the configuration memory 3.
The sequencer 4 generates an address of the above mentioned configuration information to be read out of the configuration memory 3 based on a switching condition signal.
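The address generation by the sequencer 4 can be sketched as follows. This is a minimal model under stated assumptions: the transition table, the Boolean form of the switching condition signal and the string-valued configuration entries are all hypothetical, since the text does not specify how the next address is determined.

```python
class ConfigurationMemory:
    """Hypothetical model of the configuration memory 3."""

    def __init__(self, entries):
        self._entries = list(entries)

    def read(self, address):
        return self._entries[address]


class Sequencer:
    """Hypothetical model of the sequencer 4."""

    def __init__(self, transitions, start=0):
        self.transitions = dict(transitions)  # assumed: address -> next address
        self.address = start

    def next_address(self, switching_condition):
        """Return the address of the configuration information to read next."""
        if switching_condition:
            # Advance only when the switching condition signal is asserted;
            # an address without an entry in the table is kept unchanged.
            self.address = self.transitions.get(self.address, self.address)
        return self.address


memory = ConfigurationMemory(["add", "multiply", "shift"])
sequencer = Sequencer({0: 1, 1: 2, 2: 0})
held = memory.read(sequencer.next_address(False))     # condition not met
switched = memory.read(sequencer.next_address(True))  # condition met
```

Here the configuration in use changes only when the switching condition signal is asserted; otherwise the same address is read again.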
Patent document 1 discloses a configuration in which a large number of processor elements are arrayed in rows and columns, each individually carrying out data processing in response to instruction codes that are set up as data and controlling the switching of their mutual connection relationships, and in which a state management unit switches the instruction codes of the large number of processor elements one after another. In that configuration, there are a plurality of state management units which operate in collaboration through mutual communication, and the large number of processor elements are segmented into the same number of element areas. Since the state management units are respectively allocated to the element areas and connected to the processor elements therein, the plurality of state management units are capable of individually managing a plurality of small scale state transitions. The plurality of state management units are also capable of managing one large scale state transition by collaborating with one another.
Patent document 2 proposes an array type processor which can be made compact and high in performance by electrically connecting arrayed processor elements by programmable switches, by independently furnishing a data path unit mainly for carrying out arithmetic logic operations and a state transition management unit for controlling state transitions, and by giving each a configuration specialized for its respective processing purpose.
When carrying out operations with a reconfigurable processing apparatus configured as described above, operation processing with a large operating load, such as division processing, is sometimes required. A method has been proposed to carry out the operation in such a case by using a dedicated hardware accelerator, as shown in FIG. 3, and making a CPU or DMAC (direct memory access controller) intervene in the operation processing.
In the case of carrying out operation processing such as division processing by the method shown in FIG. 3, however, a CPU 31 or DMAC has to intervene in order to simplify the start of processing and the data transmission. While this allows the interface to be integrated, there is a problem of decreased operation processing capacity because of the intervention of the CPU 31. When the CPU 31 is not made to intervene, on the other hand, it becomes difficult to integrate the interface, so that the interface has to be considered every time hardware is designed anew, making it difficult to simply reuse design assets.
Accordingly, a method can be conceived of furnishing the above described reconfigurable processing apparatus having a plurality of clusters directly with a divider or the like. If a divider is furnished in the operation unit group 2 (i.e., ALU array part) within the cluster 1, for example, the processing capability is improved as described above; the processing capability can be improved further by providing hardware specialized for an application (i.e., an application specific engine) in the operation unit group 2 in place of a general purpose operation unit such as a divider.
However, (1) a general purpose operation unit such as a divider is a large scale circuit as compared to a multiplier or an ALU; (2) its usage frequency is lower than that of an ALU or a multiplier, so that if a divider or an application specific engine is equipped in a reconfigurable arithmetic circuit, the usage efficiency of operation units per unit area goes down, leading to a cost increase; and (3) equipping an application specific engine wastes resources because other applications will not use it, so that a redesign is required to remove it.
The area efficiency and usage efficiency can then be improved if an application specific engine such as a divider is installed outside the clusters 1 so as to be shared among them. Since the application specific engine is external to the clusters 1, it can also be replaced with an engine for a different application.
In the case of installing an application specific engine external to the cluster 1 and sharing it among a plurality of clusters, however, the following problems occur: (4) processing cannot be started up unless a CPU or the like is installed outside the cluster 1 to intervene, so that some kind of start-up means is required; (5) a method of connecting and controlling the clusters 1 is required so that discretionary clusters 1 can utilize the application specific engine; (6) if a unique signal line or the like is used for every application specific engine as with ordinary hardware, the application specific engine cannot be replaced with one for a different application, so that a system enabling replacement is required; and (7) in the case of sharing an application specific engine installed external to the cluster 1 among a plurality of clusters 1, a common control is required.
Although patent document 1 discloses a shared resource shared by two clusters 1, the shared resource must be controlled by one of the clusters 1. Patent document 2 discloses a multiplier external to the cluster 1, but does not disclose its usage method or operation.
[Patent document 1] laid-open Japanese patent application publication No. 2004-133781
[Patent document 2] laid-open Japanese patent application publication No. 2001-312481