A variety of microprocessors are known to be programmable devices. In such a microprocessor, instructions stored in memory are read out successively and executed sequentially. The microprocessor implements a series of processes of interest by combining and executing, in accordance with the order in which the instructions are processed, individual instructions each of which specifies very simple processing.
Because a single processor of this kind can execute at most a few instructions simultaneously, there is a limit to how far its processing capability can be improved. More specifically, if the same processing is applied to a large quantity of data, the sequential processing must be repeated for one item of data at a time, so processing capability cannot be improved.
On the other hand, in a case where the data processing to be executed is limited to a single data process, a logic circuit formed in hardware to execute that process has no need to read instructions out of memory and execute them one after another. Although this makes it possible to execute complex data processing at high speed, naturally only that single data process can be executed.
In other words, with a data processing system in which application programs are switched among freely, various types of data processing can be executed but it is difficult to execute data processing at high speed because it is necessary to execute processing sequentially.
With a logic circuit comprising hardware, on the other hand, it is possible to execute data processing at high speed but only a single data process can be executed because the application program cannot be modified.
In an effort to eliminate these tradeoffs, array-type processors have been proposed as data processing devices in which the configuration of the hardware changes in conformity with the software (see Patent Documents 1 to 3).
An array-type processor described in Patent Document 1 is a small-size, high-capability array-type processor in which a data path unit and a state-transition management unit are provided independently of each other. The data path unit, which operates primarily as an operating unit, comprises an array of processor elements electrically connected by programmable switches. The state-transition management unit, which exercises control, is configured to facilitate implementation of state transition means. Each of these sections is implemented by a configuration customized to the purpose of its processing.
FIG. 10 illustrates the configuration of an array-type processor disclosed in FIG. 1 of Patent Document 1. As shown in FIG. 10, the array-type processor includes a data path unit 102 and a state-transition management unit (or simply a state management unit) 101 for controlling the data path unit 102. The data path unit 102 includes a plurality of processor elements (PE) 105 disposed in a two-dimensional array.
The array-type processor described in Patent Documents 1 and 2 manages the “state” of operation of the processor by means of, e.g., a number (state number). When processor operation is to transition from a certain operating state to another operating state, a state number that has been stored in a state management information memory 121 of the state-transition management unit is read out and the processor performs an operation corresponding to the state number read out. It should be noted that the term “state” refers to the state of a processor element 105 or programmable switch element 106.
This state number is associated with the address of an instruction code memory that stores an instruction code and the address of a connection-configuration information memory that specifies the mutual connection configuration between programmable switch elements 106 (there are cases where these two addresses are simply referred to as “instruction code addresses”), these being output from the state-transition management unit 101. The operation of the processor elements 105 and the relationship of the connections of the programmable switch elements 106 are decided by an instruction code address applied to the data path unit 102 through an operation control path 103. Specifically, each processor element 105 performs an operation in accordance with the instruction code address supplied thereto. Further, each programmable switch element 106 makes an electrical connection between the interior and exterior of the data path unit 102. The state number will be described below directly in the form of an instruction code address.
The state-transition management unit 101 has a state-transition table memory (not shown). The state number of a subsequent cycle is stored in the state-transition table memory. State numbers are read out successively in accordance with the present internal state of the state-transition management unit 101 or the condition of an event signal from the outside. Since the state number is in the form of an instruction code address, the state number that has been read out is input to the data path unit 102 through the operation control path 103.
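By way of illustration only, the successive readout of state numbers described above might be modeled as in the following sketch. The table contents, the event name "done" and the function names are hypothetical and are not taken from Patent Documents 1 and 2; the sketch assumes that the table is keyed by the present state number and an optional external event.

```python
# Hypothetical state-transition table:
# (present state number, external event) -> state number of the subsequent cycle.
# An event of None stands for "no event signal in this cycle".
TRANSITIONS = {
    (0, None): 1,
    (1, None): 2,
    (2, None): 2,     # remain in state 2 until an event arrives
    (2, "done"): 0,   # external event returns operation to state 0
}


def next_state(current, event=None):
    """Return the state number for the subsequent cycle."""
    # Use the entry for the specific event if one exists,
    # otherwise fall back to the entry for "no event".
    return TRANSITIONS.get((current, event), TRANSITIONS[(current, None)])


def run(start, events):
    """Return the state numbers issued cycle by cycle for the given events."""
    state = start
    trace = [state]
    for event in events:
        state = next_state(state, event)
        trace.append(state)
    return trace


trace = run(0, [None, None, None, "done"])
```

In this model each entry of `trace` corresponds to one state number that would be sent to the data path unit through the operation control path.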
FIG. 11 illustrates a typical example of the configuration of the processor element 105. The processor element includes configuration information memory 201, a function unit 202 and a wiring connection circuit 203. Although the wiring connection circuit 203, which serves as a switch element, and the configuration information memory 201 are placed in the processor element 105, these may be placed outside the processor element.
The configuration information memory 201 is a memory that stores a plurality of items of configuration information. The configuration information is read out using, as an address, a state number 210 supplied to the data path unit 102 of FIG. 10 from the state-transition management unit 101. Similarly, an instruction code is read out of an instruction code memory (not shown) in the data path unit, and a decoded instruction code 211 is applied to the processor element 105.
Configuration information 209 is a signal that sets the connection relationship between the function unit 202 and the wiring connection circuit 203, namely the internal configuration of the processor elements. This information is supplied from the configuration information memory 201 to the function unit 202 and wiring connection circuit 203.
The function unit 202 has one or more functions such as those of an arithmetic unit, memory and register, etc., and the wiring connection circuit 203 has a function for changing over the connection of the function unit in each processor element and the connections between processor elements.
Further, by way of example, the function unit 202 is constituted by two register file units (RFU1, RFU2), two multiplexers (MUX1, MUX2) and an arithmetic and logic unit (ALU).
In each processor element, configuration information corresponding to a state number is read out of the configuration information memory 201, and the function of the function unit 202 and connections of the wiring connection circuit 203 are decided.
Since the data path constructed by the entire array is thus decided by the configuration information, data paths equivalent to the number of items of configuration information that can be stored in the configuration information memory 201 can be constructed.
Since the configuration information 209 is read out by the state number 210, the configuration of the data path can be modified by controlling the state number. This constructing of a connection relationship in accordance with configuration information in the configuration information memory pointed to by the state number is referred to as “mapping”.
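The readout of configuration information by state number might be sketched as follows. This is a minimal model under stated assumptions: the class names, the field names and the two stored items are hypothetical and serve only to show the state number acting as a read address.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfigInfo:
    """One item of configuration information (field names are hypothetical)."""
    mux1_select: str  # "rfu" or "wire": input selected by MUX1
    mux2_select: str  # "rfu" or "wire": input selected by MUX2
    output_to: str    # neighbor to which the wiring connection circuit forwards data


class ConfigMemory:
    """Model of a configuration information memory addressed by state number."""

    def __init__(self, items):
        self._items = list(items)

    def read(self, state_number):
        # The state number is used directly as the read address.
        return self._items[state_number]

    def num_data_paths(self):
        # One data path can be constructed per stored item.
        return len(self._items)


memory = ConfigMemory([
    ConfigInfo("rfu", "wire", "right"),   # item read out for state number 0
    ConfigInfo("wire", "wire", "below"),  # item read out for state number 1
])
mapped = memory.read(1)  # "mapping" for state number 1
```

Changing the state number passed to `read` changes the configuration that is mapped, and the number of distinct data paths is bounded by the number of stored items.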
After mapping is carried out, the data path unit executes processing in conformity with instruction code 211 from the instruction code memory.
Mapping is performed utilizing all operation resources and wiring resources of the data path unit 102. Consequently, the divided processes cannot be executed by the data path unit 102 simultaneously; instead, the state-transition management unit 101 changes over the mapping of the data path cycle by cycle and executes the divided processes one after another.
In other words, the processor of the above-described type causes state numbers to make a transition and sequentially processes instructions corresponding to the state numbers by circuit configurations corresponding to the state numbers, thereby executing the application.
When an application program to be processed is compiled in the array-type processor of the above-described related art, the application program is analyzed and is converted to the form of a state transition of a processor element or switch element. Specifically, the circuit configuration (connection information of the data path unit) into which the processor element or switch element is to be placed in each state, and the instruction to be executed in that state, are converted to a state number, configuration information of the circuit and an instruction code, and transition information indicating the course of the state transition is constructed. Before the application program is executed, the state number and transition information are stored in the state-transition management unit and the configuration information of the circuit and instruction code are stored in the data path unit. The transition information is stored in the transition table.
The operation of an example of the related art will be described with reference to a detailed arrangement.
In executing an application program, the array-type processor disclosed in Patent Documents 1 and 2 uses a sequencer (not shown) to output state numbers 210 from a state-transition table memory (not shown) of the state-transition management unit 101 to the processor elements 105 (and switch elements 106) of the data path unit 102 successively through the operation control path 103. Here the switch element 106 is incorporated in the processor element 105 as a wiring connection circuit 203. Upon receiving a state number 210, the processor element 105 outputs the configuration information specified by the state number 210 to the function unit 202 and wiring connection circuit 203.
Further, the state number 210 is sent to an instruction code memory (not shown), an instruction code is read out from the address of the instruction code memory that corresponds to the state number 210 and a decoded instruction code 211 is sent to the processor element 105. The instruction code 211 is sent to the ALU 208 and registers RFU1 (204) and RFU2 (205) within the processor element 105.
It should be noted that the configuration information is stored in the configuration information memory 201 in advance.
In the function unit 202, the configuration information thus sent is input to the multiplexers MUX1 and MUX2 as input selection signals, thereby constructing this partial circuit configuration.
Similarly, in accordance with configuration information read out from the configuration information memory 201, the wiring connection circuit 203 constructs a circuit configuration between the RFU1, RFU2, MUX1, MUX2 and wiring connection circuits of the processor elements above, below and to the left and right of its own processor element and performs a data transfer between the wiring connection circuits of the processor elements in accordance with the circuit configuration constructed.
By thus constructing the circuit, a write address and a read address, for example, are input to the RFU1, RFU2 from the instruction code 211 obtained by decoding the instruction code that has been read out of the instruction code memory or from another processor element through the wiring connection circuit 203.
In accordance with this input selection signal, MUX1, MUX2 select either an input from the register file unit (RFU1, RFU2) or an input from the wiring connection circuit 203 and output this signal as data to the ALU 208.
The instruction code 211 along with the circuit architecture based upon the configuration information are sent to the ALU 208. The ALU 208 subjects the data, which has been input in accordance with the constructed circuit, to processing that conforms to the instruction code and outputs the result to the wiring connection circuit 203. The wiring connection circuit 203 delivers this to the processor element of the succeeding stage that operates in a similar manner.
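The flow of data through the function unit in one cycle (input selection by MUX1 and MUX2, followed by the ALU operation) might be sketched as follows. The opcode names, the argument names and the sample values are hypothetical; the instruction set of the related art is not specified in this description.

```python
import operator

# Hypothetical opcodes standing in for the decoded instruction code.
ALU_OPS = {"add": operator.add, "sub": operator.sub, "and": operator.and_}


def function_unit_step(rfu1, rfu2, wire_in, mux1_select, mux2_select,
                       read_addr1, read_addr2, opcode):
    """Return the ALU result for one cycle of a single processor element."""
    # MUX1 and MUX2 each select either a register-file read
    # or the input arriving through the wiring connection circuit.
    a = rfu1[read_addr1] if mux1_select == "rfu" else wire_in
    b = rfu2[read_addr2] if mux2_select == "rfu" else wire_in
    # The ALU applies the operation named by the decoded instruction code.
    return ALU_OPS[opcode](a, b)


result = function_unit_step(
    rfu1=[5, 7], rfu2=[1, 2], wire_in=10,
    mux1_select="rfu", mux2_select="wire",
    read_addr1=1, read_addr2=0, opcode="add",
)
```

Here the selection signals play the role of the configuration information, while the read addresses and the opcode play the role of the instruction code; the returned value is what would be handed to the wiring connection circuit for the succeeding processor element.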
This series of operations is repeatedly executed by the application program in accordance with the state transition of the array-type processor. It should be noted that the state transition of the array-type processor is performed in sync with a clock.
FIGS. 12 and 13 illustrate an example of the data-path configuration (example of mapping) of a data path unit using this processor element.
Here processor elements (PE) are arranged in the form of a 4×4 two-dimensional array. In FIGS. 12 and 13, for the sake of convenience, RFU1, RFU2 and ALU in FIG. 11 are indicated by R1, R2 and A, respectively. Further, in order to distinguish among the processor elements within the array, numbers PE(i,j) are assigned to each of the processor elements, as illustrated.
FIG. 12 (configuration example 1) illustrates an example of a case where a path on which a plurality of ALUs exist is constructed between registers, which are sequential circuits. FIG. 13 (configuration example 2) illustrates an example of a case where, conversely, a path not having even a single ALU is constructed between registers. In other words, FIG. 12 (configuration example 1) is an example of a case where critical path delay is large, whereas FIG. 13 (configuration example 2) is an example of a case where critical path delay is small.
If we let 1T (a unit of delay) represent the data transfer delay between processor elements and the delay of the ALU, then the critical path of configuration example 1 in FIG. 12 will be the path from PE (0,0) to PE (3,3), and the delay time will be 6T (the three delays of the ALUs and the three data transfer delays between processor elements).
Further, in configuration example 2 of FIG. 13, there are three paths, namely paths from PE (0,0) to PE (0,1), from PE (0,1) to PE (0,2) and from PE (0,2) to PE (0,3). However, each of these paths consists of a single data transfer between processor elements, so the delay time of each path is 1T and the critical path delay also is 1T.
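The delay arithmetic for the two configuration examples can be reproduced with the following sketch, which assumes, as in the text, that each ALU and each data transfer between processor elements contributes one unit of delay T. The function names and the representation of a path as a pair (number of ALUs, number of transfers) are hypothetical.

```python
def path_delay(num_alus, num_transfers):
    """Delay of one register-to-register path, in units of T."""
    return num_alus + num_transfers


def critical_path_delay(paths):
    """Largest delay among the paths of one configuration, in units of T."""
    return max(path_delay(alus, transfers) for alus, transfers in paths)


# Configuration example 1: one path with three ALUs and three transfers.
config1 = [(3, 3)]
# Configuration example 2: three paths, each a single transfer with no ALU.
config2 = [(0, 1), (0, 1), (0, 1)]

delay1 = critical_path_delay(config1)  # 6, i.e. 6T
delay2 = critical_path_delay(config2)  # 1, i.e. 1T
```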
With the array-type processor of the related art described in Patent Documents 1 and 2, the transition of the state of the data path unit at the time of actual operation is decided by the compilation result of the application program. In other words, the sequence in which the circuit configuration changes is fixed at compile time.
With such an array-type processor of the related art, arrangements having significantly different critical path delays (6T and 1T) are switched between every clock cycle of the array-type processor, as in configuration examples 1 and 2, and this switching takes place frequently.
With such an array-type processor of the related art, the maximum operating frequency is decided by the largest of the critical path delays over all configurations. In this case the largest critical path delay is 6T, so the maximum operating frequency is 1/(6T).
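The dependence of the maximum operating frequency on the largest critical path delay can be sketched as follows. Delays are expressed in units of T and the result in units of 1/T; the function name is hypothetical.

```python
from fractions import Fraction


def max_operating_frequency(critical_delays_in_T):
    """Return the maximum clock frequency in units of 1/T.

    The clock period must be at least as long as the largest
    critical path delay among all configurations.
    """
    return Fraction(1, max(critical_delays_in_T))


# Configuration examples 1 and 2 have critical path delays of 6T and 1T.
f_max = max_operating_frequency([6, 1])  # Fraction(1, 6), i.e. 1/(6T)
```

Configuration example 2 alone could be clocked at 1/T, but the single fixed clock has to accommodate configuration example 1 as well.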
FIG. 14 is a timing chart useful in describing the problems of the related art shown in FIG. 10. It should be noted that FIG. 14 has been created by the present patent applicants in order to describe the problems of the related art; it is not cited in Patent Document 1.
In FIG. 14, T1, T2, T3, T4 and T5 represent the rise timings of a clock, and C1, C2, C3 and C4 represent the intervals between these timings.
The shaded portions of the data path indicate that the state of the data path has not been determined, and d1, d2, d3 and d4 indicate critical path delays. Among these, d4 is the largest delay. Cycle time must be equal to or greater than d4.
The critical path delay d3 of the T3 cycle is small in comparison with d4, and processing is not executed during the time that corresponds to the difference between these. This time is wasted time if processing efficiency is considered.
With the related art described above with reference to FIGS. 10 to 13, it is possible to switch among a plurality of data paths based upon configuration information. However, in a case where the critical path delay differs from one data path to another, the operating frequency must conform to the maximum delay; operation at a frequency higher than that permitted by the maximum delay is not possible.
In particular, if the variation in critical path delay is large, a configuration having a small critical path delay will not execute any processing for a large part of the cycle time.
If delay time could be divided equally when processing is divided into data paths (when the above-described compiling is performed), the problem would be solved. In actuality, however, such allocation is technically difficult at present.
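The amount of idle time caused by a fixed cycle time can be estimated with the following sketch. The sequence of delays is a hypothetical example that alternates between the 6T and 1T configurations discussed earlier; it is not taken from FIG. 14, and the function names are assumptions.

```python
def idle_time_per_cycle(critical_delays):
    """Idle time in each cycle when the cycle time equals the largest delay."""
    cycle_time = max(critical_delays)
    return [cycle_time - delay for delay in critical_delays]


def utilization(critical_delays):
    """Fraction of total time in which the data path is actually processing."""
    cycle_time = max(critical_delays)
    return sum(critical_delays) / (cycle_time * len(critical_delays))


# Hypothetical run of four cycles alternating between the two configurations.
delays = [6, 1, 6, 1]
idle = idle_time_per_cycle(delays)  # [0, 5, 0, 5] in units of T
busy = utilization(delays)          # 14 / 24
```

Under these assumed delays, fewer than two thirds of the elapsed time is spent on processing, which illustrates why a large variation in critical path delay lowers efficiency.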
When an application is compiled and processing is divided into data paths, processing efficiency can be raised by adjusting cycle time in conformity with critical path delay rather than equalizing critical path delays. In general, however, processors that execute processing in parallel operate on the assumption of cycle time of a fixed time interval. Although the processors execute processing in sync in a case where the processors communicate, they operate independently at other times.
In general, therefore, it is difficult to adjust the cycle time of parallel processing processors and to improve operation efficiency.
The arrangement of Patent Document 3 is an example of related art that adjusts cycle time under specific conditions. This arrangement is characterized in that, in data transfer between processor elements, cycle time is adjusted in accordance with the data transfer time. The technique of Patent Document 3 concerns data transfer, and the document does not give a detailed description regarding the configuration of a processor array. In addition, for the reasons set forth above, with the related art it is difficult to implement adjustment of cycle time that takes the operation time of processor elements into account. Furthermore, in Patent Document 3, it is necessary to operate processors using both edges of a clock. Further, in generation of the clock, it is necessary to use a clock whose half period is the operation time (ALU delay time). The problem is that a clock having a high speed in comparison with cycle time is required.
[Patent Document 1] Japanese Patent No. 3674515
[Patent Document 2] Japanese Patent Kokai Publication No. JP-P2004-133781A
[Patent Document 3] Japanese Patent Kokai Publication No. JP-A-64-7252