Conventionally, dynamically reconfigurable circuits (hereinafter, reconfigurable circuits) can change, during operation, the contents of a command to a processing element (PE) in the reconfigurable circuit and the connections between PEs. Generally, information indicating the contents of a command to a PE in the reconfigurable circuit and the connections between PEs is referred to as a context. Reading in a new context to change the configuration is referred to as a context switch.
By switching contexts, the reconfigurable circuit shares its PEs in a time-divided manner, which reduces the hardware size of the reconfigurable circuit as a whole. The reconfigurable circuit may include plural clusters (see, e.g., Japanese Laid-Open Patent Application Publication No. 2006-18514). Such a cluster-type reconfigurable circuit can control context switching on a per-cluster basis.
FIG. 17 is a circuit diagram of an internal configuration of a conventional cluster. A cluster 110 includes a sequencer 310, a configuration memory 320, a PE array 330, and a crossbar switch 111. The sequencer 310, which is a state machine, controls the switching of contexts stored in the configuration memory 320. The PE array 330 changes the arithmetic processing contents and connections of its PEs according to configuration data read out from the configuration memory 320 under the control of the sequencer 310.
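The relationship among the sequencer, the configuration memory, and the PE array can be illustrated with a minimal software sketch. This is not the circuit of FIG. 17: the names `Context`, `Cluster`, `transitions`, and `demo_cluster` are hypothetical, each PE is reduced to a single function of one input, and connections between PEs and the crossbar switch are omitted.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Context:
    """Hypothetical context: one operation per PE (PE connections omitted)."""
    name: str
    pe_ops: List[Callable[[int], int]]


class Cluster:
    """Toy model: a sequencer (state machine) picks a context from memory."""

    def __init__(self, config_memory: Dict[int, Context],
                 transitions: Dict[int, int], start: int = 0) -> None:
        self.config_memory = config_memory  # plays the role of the configuration memory
        self.transitions = transitions      # sequencer's next-context table (assumed)
        self.current = start                # index of the active context

    def switch_context(self) -> None:
        """Sequencer step: move to the next context."""
        self.current = self.transitions[self.current]

    def run(self, inputs: List[int]) -> List[int]:
        """PE array step: apply the active context's operation on each PE."""
        ctx = self.config_memory[self.current]
        return [op(x) for op, x in zip(ctx.pe_ops, inputs)]


def demo_cluster():
    memory = {
        0: Context("context 0", [lambda x: x + 1, lambda x: x * 2]),
        1: Context("context 1", [lambda x: x - 1, lambda x: x * x]),
    }
    cluster = Cluster(memory, transitions={0: 1, 1: 0})
    before = cluster.run([3, 4])   # same PEs, context 0
    cluster.switch_context()
    after = cluster.run([3, 4])    # same PEs, context 1
    return before, after
```

Calling `demo_cluster()` runs the same two PEs on the same inputs under two contexts, which is the sense in which a context switch lets the PE array be reused over time.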
Typically, when an application program is installed in a reconfigurable circuit, the application program is provided as source code written in the C language and compiled by a compiler for the reconfigurable circuit. Among processes written in the C language, loop control is particularly time-consuming. The reconfigurable circuit, however, has a configuration that reduces the processing time for loop control by pipelining the arithmetic processing of the loop control. Specifically, the reconfigurable circuit includes a counter, and the output of the counter serves as the starting point from which the arithmetic processing, including the loop control, is controlled.
The clusters 110, as depicted in FIG. 17, are interconnected via their respective crossbar switches 111 in a matrix arrangement. FIG. 18 is a schematic of an example of data transfer between conventional clusters. Connections between the clusters 110 will be described with reference to FIG. 18. In a reconfigurable circuit 100, because the clusters 110 are interconnected via the crossbar switches 111, the number of arithmetic processors (PEs) incorporated in the reconfigurable circuit 100 can be customized by adjusting the number of clusters. The clusters 110 can transfer data to each other via the crossbar switches 111. In this configuration, a D flip-flop (DFF), which is not depicted, is disposed on each line interconnecting clusters. Disposing the DFF prevents a situation in which the timing restriction on data transfer between the clusters 110 cannot be satisfied because of the LSI operation speed.
In the cluster-type reconfigurable circuit 100, therefore, the number of clusters 110 and the number and bit width of ports on a line between clusters 110 can be changed freely, depending on the application program installed in the reconfigurable circuit 100 and the circuit area of the LSI. In the example depicted in FIG. 18, the number of clusters is four (clusters 0, 1, 2, and 3). When the number of PEs is to be increased, additional clusters, such as clusters ex0, ex1, ex2, and ex3, are arranged horizontally and vertically with respect to the orientation of FIG. 18.
The number and bit width of ports on a line between clusters 110 depend on the architecture of arithmetic processors in the clusters 110. Generally, any one of an 8-bit processor, 16-bit processor, and 32-bit processor is adopted. By increasing the number of ports, the types of data that can be transferred between clusters 110 can be increased.
The conventional cluster-type reconfigurable circuit 100, however, may have trouble in data transmission between the clusters 110 when carrying out processing across context switching (e.g., a series of processes including a change in context from a context A to a context B).
A context can be changed without a standby cycle when the sequencer 310 in the cluster 110 is able to read the context transition destination in advance. When data transmission is performed between different clusters 110, however, the cluster 110 that is the data transmission origin cannot determine the state of the cluster 110 that is the data transmission destination. As a result, at a context switch, the data transmission origin cluster 110 may send unnecessary data to the data transmission destination cluster 110, which may lead to a malfunction.
In an example in which a group of clusters 110 are interconnected in a matrix arrangement as depicted in FIG. 18, two types of data, A and B, are transferred from a cluster 0 to clusters 2 and 3. FIG. 19 is a schematic of a context switch sequence at each cluster.
At each cluster 110 (clusters 0, 1, 2, and 3) depicted in FIG. 18, a context switch from a context 0 to a context 1 is performed at a given time (time n) (see FIG. 19) under the control of the internal sequencer 310 (see FIG. 17). The label “context numeral-numeral” written in each cluster depicted in FIG. 19 means “context [context number]-[cluster number]”. In context switching, the transition to the next context can be made without a standby cycle.
As depicted in FIG. 19, when data transfer is performed between clusters 110 across context switching, the reconfigurable circuit 100 may malfunction because of the DFF disposed on the line between the clusters 110. FIG. 20 is a timing chart of an inter-cluster data transfer operation. As depicted in the timing chart of FIG. 20, while executing contexts 1-2 and 1-3, the clusters 2 and 3 receive data that the cluster 0 output according to a context 0-0 (portion marked with *). The data received is data that was held in the DFF between the clusters 110 during the context switch.
As depicted in FIG. 20, among the data output from the cluster 0, data A-0 to A-5 and B-0 to B-5 are generated by processing based on the context 0, while data A-6 to A-10 and B-6 to B-10 are generated by processing based on the context 1. If, after the switch, the clusters 2 and 3 continue to use as input the output data generated under the context 0, which precedes the current context by one context (i.e., the group of data marked with * in FIG. 20), then receiving the data that was held in the DFF causes no problem.
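The timing behavior described for FIG. 20 can be reproduced with a small cycle-by-cycle sketch. It assumes, for illustration only, that the sender emits one datum per cycle, that all clusters switch contexts at the same cycle (cycle 6, so that A-0 to A-5 belong to the context 0), and that the line between clusters has `dff_stages` flip-flop stages. The function names and the one-stage default are assumptions, not values taken from the figures.

```python
def simulate_transfer(num_items=11, switch_cycle=6, dff_stages=1):
    """Return (cycle, receiver_context, label, producer_context) per arrival."""
    pipeline = [None] * dff_stages  # contents of the DFF stage(s) on the line
    records = []
    for cycle in range(num_items + dff_stages):
        # The receiver samples the output of the last DFF stage.
        arriving = pipeline[-1]
        receiver_ctx = 0 if cycle < switch_cycle else 1
        if arriving is not None:
            label, producer_ctx = arriving
            records.append((cycle, receiver_ctx, label, producer_ctx))
        # The sender drives a new datum tagged with its own current context.
        if cycle < num_items:
            sender_ctx = 0 if cycle < switch_cycle else 1
            new = (f"A-{cycle}", sender_ctx)
        else:
            new = None
        # Clock edge: shift the data one stage along the line.
        pipeline = [new] + pipeline[:-1]
    return records


def stale_arrivals(records):
    """Labels of data produced under a context other than the receiver's."""
    return [label for _, rctx, label, pctx in records if rctx != pctx]
```

Under these assumptions, `stale_arrivals(simulate_transfer())` contains the datum that was sitting in the DFF at the moment of the switch. Whether that datum is harmless depends on whether the receiving context expects data from the preceding context.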
However, when the currently set context 1 does not use the output data based on the context 0 as input data, using the data that is based on the context 0, which precedes the current context by one context, and that was held in the DFF between the clusters 110 may result in the output of incorrect calculation values or in a malfunction. To remedy such a situation, a cycle in which invalid data is intentionally sent must be added during the context switch. This causes unnecessary waiting at each context switch and thus degrades the performance of the reconfigurable circuit.
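The cost of this remedy can be illustrated with a rough sketch. It assumes a hypothetical scheme in which the sender inserts `bubbles` cycles of invalid data after its last context-0 datum and every cluster delays its context switch by the same number of cycles. The function name, the parameters, and the one-datum-per-cycle model are assumptions made for illustration, not details taken from the conventional circuit.

```python
def run_with_bubbles(items_ctx0=6, items_ctx1=5, dff_stages=1, bubbles=1):
    """Return (total_cycles, stale_labels) for one transfer across a switch."""
    # Sender output stream: context-0 data, invalid bubbles, context-1 data.
    stream = [(f"A-{i}", 0) for i in range(items_ctx0)]
    stream += [None] * bubbles  # None models an invalid-data cycle
    stream += [(f"A-{items_ctx0 + i}", 1) for i in range(items_ctx1)]

    switch_cycle = items_ctx0 + bubbles  # all clusters switch at this cycle
    pipeline = [None] * dff_stages       # DFF stage(s) on the inter-cluster line
    stale = []
    total_cycles = len(stream) + dff_stages
    for cycle in range(total_cycles):
        arriving = pipeline[-1]
        receiver_ctx = 0 if cycle < switch_cycle else 1
        # Valid data produced under a different context is a stale arrival.
        if arriving is not None and arriving[1] != receiver_ctx:
            stale.append(arriving[0])
        new = stream[cycle] if cycle < len(stream) else None
        pipeline = [new] + pipeline[:-1]  # clock edge: shift one stage
    return total_cycles, stale
```

In this model, setting `bubbles` equal to `dff_stages` removes the stale arrival but lengthens the transfer by that many cycles at every context switch, which is the waiting overhead described above.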