1. Field of the Invention
Apparatuses and methods consistent with the present invention relate to a reconfigurable processor, and more particularly to a reconfigurable processor and method for optimizing system performance by reducing the overhead of accessing a loop buffer that holds configuration bits, which specify the connection establishment and operation setting of each predefined circuit unit so that the circuit units implement certain functions.
2. Description of Related Art
In the related art, general-purpose digital signal processors or microprocessors are configured to implement various functions by modifying the software running on them. However, as these high-performance processors are used to implement more functions, their load and power consumption increase significantly.
If a dedicated circuit is employed for each function, the load on the processor can be reduced, which remarkably reduces power consumption. However, a system employing such dedicated circuits requires a processor or dedicated circuit to be designed or developed according to the system specifications. Therefore, the flexibility of a system using dedicated circuits is reduced.
Accordingly, the use of reconfigurable processors in recently developed high-speed multimedia equipment, including ubiquitous systems and devices such as mobile phones, DMB (Digital Multimedia Broadcasting) phones, and personal digital assistants (PDAs) that transmit and receive high-speed radio data, has been actively discussed. In such reconfigurable processors, the configuration of PEs (Processing Elements), that is, their connection establishment and operation setting, may be modified by software. Here, the PEs may be a plurality of predefined circuit units. Because one circuit unit can be given various configurations according to settings modified by software, the reconfigurable processor may be applied to many different models. Consequently, flexibility can be increased while the chip area and power consumption of the processor for a given set of workloads are reduced.
FIG. 1 is a diagram for explaining a general configuration processor 100. Referring to the figure, the configuration processor 100 includes a configuration memory 110 and a circuit array 120, also called a CGA (Coarse-Grained Array), which is composed of PEs 121 that support multiple configurations. The PEs 121 of the CGA 120 include function units (FUs), register files (RFs), and the like, and each PE is configured according to the configuration bits provided from the configuration memory 110. The PEs 121 of the CGA 120 may receive data from another source, process the received data, and transmit the processed result to another destination. Because the configuration data in the configuration memory 110 determines their behavior, the PEs 121 of the CGA 120 can implement various functions, which increases flexibility.
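To make the role of the configuration bits concrete, the following Python sketch models a PE whose operation setting and input connections are selected by a configuration word. The `ConfigWord` fields, the `OPS` table, and the named input sources are hypothetical illustrations, not the encoding used by the configuration memory 110; the sketch only shows that the same PEs compute different functions when different configuration bits are applied.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical opcode table: the operation-setting field of a configuration
# word selects one of these functions. Real encodings are device-specific.
OPS: Dict[str, Callable[[int, int], int]] = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
    "pass": lambda a, b: a,  # forward the first input unchanged
}


@dataclass(frozen=True)
class ConfigWord:
    """Hypothetical per-PE configuration: an operation setting plus a
    connection establishment naming the sources of the two inputs."""
    op: str
    src_a: str
    src_b: str


class ProcessingElement:
    def __init__(self, name: str) -> None:
        self.name = name
        self.output = 0

    def execute(self, cfg: ConfigWord, sources: Dict[str, int]) -> int:
        # Route the selected sources to the function unit and latch the result.
        self.output = OPS[cfg.op](sources[cfg.src_a], sources[cfg.src_b])
        return self.output


def run_cycle(pes: List[ProcessingElement], configs: List[ConfigWord],
              sources: Dict[str, int]) -> Dict[str, int]:
    """Apply one configuration word per PE and return every PE's output."""
    return {pe.name: pe.execute(cfg, sources) for pe, cfg in zip(pes, configs)}


pes = [ProcessingElement("pe0"), ProcessingElement("pe1")]
inputs = {"x": 6, "y": 3}
# Same hardware, two different configurations, two different functions.
print(run_cycle(pes, [ConfigWord("add", "x", "y"), ConfigWord("mul", "x", "y")], inputs))
# {'pe0': 9, 'pe1': 18}
print(run_cycle(pes, [ConfigWord("sub", "x", "y"), ConfigWord("pass", "y", "x")], inputs))
# {'pe0': 3, 'pe1': 3}
```

Keeping the configuration separate from the PE object mirrors the split between the configuration memory 110 and the CGA 120: the PEs stay fixed while only the words fed to them change.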
FIG. 2 is a diagram illustrating the scheduling of the operations for each PE when the PEs of the CGA 120 repeatedly execute a certain loop operation. When the loop operation {A, B, C, D, E, F} is scheduled with II=1 and analyzed over time, as shown in FIG. 2, the kernel 210, which is executed uniformly in the PEs, consists of operations F, E, C, D, A, and B. The II is the initiation interval of the loop operation in each of the PEs, and II=1 means that a new iteration of the loop begins every cycle. In the case of II=1, six PEs are required to execute the six operations of the kernel 210 simultaneously.
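The relation between II and the kernel can be sketched with the usual modulo-scheduling fold, in which an operation that starts at cycle t of an iteration lands in kernel row t mod II. The start times below are hypothetical (the figures do not give them) and were chosen so that the fold reproduces the operation sets of the kernels in FIG. 2 and FIG. 3. With II=1 all six operations share one row, so six PEs are needed. With II=2 the rows become {A, B, E} and {C, D, F}; the sketch reports only an operation-count lower bound and does not model how operations are bound to particular PEs.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical start cycle of each operation within one loop iteration;
# the source does not specify these, they are chosen for illustration.
START_TIMES: Dict[str, int] = {"A": 0, "B": 0, "C": 1, "D": 1, "E": 2, "F": 3}


def kernel(start_times: Dict[str, int], ii: int) -> List[List[str]]:
    """Fold a single-iteration schedule into a modulo-scheduled kernel:
    an operation starting at cycle t occupies kernel row t mod II."""
    rows: Dict[int, List[str]] = defaultdict(list)
    for op, t in sorted(start_times.items(), key=lambda kv: (kv[1], kv[0])):
        rows[t % ii].append(op)
    return [rows[r] for r in range(ii)]


def min_pes(start_times: Dict[str, int], ii: int) -> int:
    """All operations in a kernel row run in the same cycle, so the widest
    row is a lower bound on the number of PEs (routing is ignored)."""
    return max(len(row) for row in kernel(start_times, ii))


print(kernel(START_TIMES, 1), min_pes(START_TIMES, 1))  # [['A', 'B', 'C', 'D', 'E', 'F']] 6
print(kernel(START_TIMES, 2), min_pes(START_TIMES, 2))  # [['A', 'B', 'E'], ['C', 'D', 'F']] 3
```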
When the loop operation {A, B, C, D, E, F} is scheduled with II=2 and analyzed over time, the kernel 310, which is executed uniformly in the PEs at each time step, consists of the operation groups {E, A, B} and {F, C, D}, as shown in FIG. 3. In the case of II=2, the kernel may be implemented with four PEs 411 to 414, as shown in FIG. 4. That is, in a first cycle 410, a first PE 411, a second PE 412, and a third PE 413 perform operations A, B, and E, respectively, and a fourth PE 414 performs a delay operation. Next, in a second cycle 420, the first PE 411, the fourth PE 414, and the third PE 413 perform operations C, D, and F, respectively, and the second PE 412 performs the delay operation.
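The four-PE mapping of FIG. 4 can be written down as a table and checked mechanically. In the following Python sketch, the table layout and the function names are illustrative assumptions; it confirms that the two cycles together execute each of A to F exactly once and lists the slots that hold only a delay operation, namely PE 414 in the first cycle and PE 412 in the second.

```python
from typing import Dict, List, Optional, Tuple

# PE binding of FIG. 4 for II = 2, keyed by the figure's reference numerals.
# None marks a slot in which the PE performs only a delay operation.
FIG4_MAPPING: List[Dict[int, Optional[str]]] = [
    {411: "A", 412: "B", 413: "E", 414: None},  # first cycle 410
    {411: "C", 412: None, 413: "F", 414: "D"},  # second cycle 420
]


def check_mapping(mapping: List[Dict[int, Optional[str]]], ops: List[str]) -> None:
    """Verify that every loop operation is scheduled exactly once per kernel."""
    scheduled = sorted(op for row in mapping for op in row.values() if op is not None)
    if scheduled != sorted(ops):
        raise ValueError(f"kernel covers {scheduled}, expected {sorted(ops)}")


def delay_slots(mapping: List[Dict[int, Optional[str]]]) -> List[Tuple[int, int]]:
    """Return (cycle index, PE) pairs in which a PE performs only a delay."""
    return [(c, pe) for c, row in enumerate(mapping)
            for pe, op in sorted(row.items()) if op is None]


check_mapping(FIG4_MAPPING, ["A", "B", "C", "D", "E", "F"])
print(delay_slots(FIG4_MAPPING))  # [(0, 414), (1, 412)]
```

Using one dictionary per cycle means a PE cannot be assigned two operations in the same cycle, so only coverage of the operations needs an explicit check.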
According to the configuration bits transmitted from the configuration memory 110, the loop operation spanning the two cycles can be repeated in the PEs of the CGA 120. In addition to such loop operations, various other functions can be implemented depending on the settings of the configuration bits.
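The repetition can be sketched as follows: once the configurations for the II cycles are held locally, cycle c simply reuses configuration c mod II. The `LOOP_BUFFER` layout and the `replay` function are hypothetical; a real loop buffer stores encoded configuration bits rather than operation names.

```python
from typing import Dict, List, Optional

# Configurations held for the FIG. 4 kernel (II = 2): one entry per cycle of
# the initiation interval; None stands for a delay operation.
LOOP_BUFFER: List[Dict[int, Optional[str]]] = [
    {411: "A", 412: "B", 413: "E", 414: None},  # first cycle 410
    {411: "C", 412: None, 413: "F", 414: "D"},  # second cycle 420
]


def replay(loop_buffer: List[Dict[int, Optional[str]]], cycles: int) -> List[Dict[int, str]]:
    """Return what each PE executes in each cycle: in cycle c, every PE
    applies configuration c mod II, so the kernel repeats every II cycles."""
    ii = len(loop_buffer)
    trace = []
    for c in range(cycles):
        config = loop_buffer[c % ii]
        trace.append({pe: op if op is not None else "delay" for pe, op in config.items()})
    return trace


for c, step in enumerate(replay(LOOP_BUFFER, 4)):
    print(c, step)
```

Every PE reads an entry in every cycle, including the cycles in which it only delays.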
However, as shown in FIG. 4, the fourth PE 414 performs the delay operation during the first cycle 410, and the second PE 412 performs it during the second cycle 420. In other words, configuration bits indicating the delay operation (not shown in FIG. 4) must be transmitted from the configuration memory 110 to a loop buffer of the PEs, and the second PE 412 and the fourth PE 414 must perform the delay operation according to those configuration bits read from the loop buffer. Furthermore, as the II (the initiation interval of the loop operation) increases, the number of configuration bits to be stored increases, because the loop buffer must hold one configuration for each of the II cycles. Accordingly, a loop buffer large enough to hold as many configurations as the II is required.
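The storage cost can be estimated with simple arithmetic. Assuming, hypothetically, a fixed configuration-word width per PE, the loop buffer must hold II × (number of PEs) words, and in the FIG. 4 mapping two of the eight words encode nothing but delay operations. The 32-bit width below is an illustrative assumption, not a value from the source.

```python
from typing import Dict, List, Optional

BITS_PER_PE = 32  # hypothetical width of one PE's configuration word


def loop_buffer_bits(ii: int, num_pes: int, bits_per_pe: int = BITS_PER_PE) -> int:
    """Bits the loop buffer must hold: one word per PE for each of the II
    cycles of the kernel, whether the PE computes or only delays."""
    return ii * num_pes * bits_per_pe


def delay_bits(mapping: List[Dict[int, Optional[str]]], bits_per_pe: int = BITS_PER_PE) -> int:
    """Bits spent only on encoding delay operations (None entries)."""
    return bits_per_pe * sum(op is None for row in mapping for op in row.values())


# FIG. 4 binding (II = 2); None marks a delay operation.
FIG4_MAPPING = [
    {411: "A", 412: "B", 413: "E", 414: None},
    {411: "C", 412: None, 413: "F", 414: "D"},
]

print(loop_buffer_bits(1, 6))    # FIG. 2, II=1 on six PEs: 192
print(loop_buffer_bits(2, 4))    # FIG. 4, II=2 on four PEs: 256
print(delay_bits(FIG4_MAPPING))  # two delay words: 64
print([loop_buffer_bits(ii, 4) for ii in range(1, 5)])  # [128, 256, 384, 512]
```

For a fixed array, the required buffer size grows linearly with II, and every delay slot consumes a full configuration word.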