1. Field
The following description relates to instruction scheduling, and more particularly, to an apparatus and method for scheduling an instruction for a reconfigurable processor.
2. Description of the Related Art
A coarse-grained reconfigurable array (CGRA) is an accelerator used to improve program execution speed, and refers to a set of several functional units that can process various operations. A platform using an application-specific integrated circuit (ASIC) may execute faster than a general-purpose processor, but may not be able to process various applications. On the other hand, a platform using a CGRA may process many operations in parallel to improve performance while remaining flexible. Thus, a platform using a CGRA may be an efficient platform for a next-generation digital signal processor (DSP).
A CGRA exploits instruction-level parallelism (ILP) between operations of an application as much as possible. Specifically, operations that can be processed simultaneously are distributed to a plurality of functional units constituting the CGRA and processed simultaneously, thereby reducing the execution time of the application. To make sufficient use of the ILP of an application, a CGRA generally uses modulo scheduling. Modulo scheduling is a scheduling algorithm that uses a software pipelining technique. Modulo scheduling overlaps successive iterations of a loop and thereby increases the ILP among instructions belonging to different iterations of the loop, improving performance. As a result of modulo scheduling, iterations of a loop are started at regular time intervals. Here, the time interval is referred to as an iteration interval (II), and it determines the throughput of the pipeline. For this reason, the shorter the II, the better the quality of the modulo scheduling result.
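As a non-limiting illustration of why a shorter II yields a better result, the following sketch compares the cycle count of a loop executed one iteration at a time with the cycle count of the same loop when a new iteration is started every II cycles. The function names and the parameter `schedule_length` (the number of cycles one iteration takes from start to finish) are assumptions introduced only for this example, and the model is simplified in that it ignores stalls and loop setup overhead.

```python
def sequential_cycles(n_iters: int, schedule_length: int) -> int:
    """Cycles when each iteration finishes before the next one starts."""
    return n_iters * schedule_length


def pipelined_cycles(n_iters: int, schedule_length: int, ii: int) -> int:
    """Cycles when a new iteration is started every `ii` cycles.

    The last iteration starts at cycle (n_iters - 1) * ii and needs
    `schedule_length` further cycles to complete.
    """
    if n_iters <= 0:
        return 0
    return (n_iters - 1) * ii + schedule_length


if __name__ == "__main__":
    # Hypothetical loop: 100 iterations, each 8 cycles long.
    print("sequential:", sequential_cycles(100, 8))
    for ii in (4, 2, 1):
        print(f"II={ii}:", pipelined_cycles(100, 8, ii))
```

In this simplified model, the total cycle count for a large number of iterations is dominated by the term that is proportional to the II, which is why the II is treated as the main quality measure of a modulo schedule.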
However, unlike a general modulo scheduler, a scheduler for a CGRA should perform scheduling in consideration of operand routing between operations. This is because the connection logic between functional units of a CGRA is very sparse in order to keep hardware complexity at or below a predetermined level.
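The routing constraint may be pictured with the following minimal sketch, which is not a scheduler but only a check that a given placement of operations respects a sparse interconnect. The 2x2 mesh, the names `INTERCONNECT` and `unroutable_edges`, and the rule that a result may be consumed only on the producing functional unit or on a directly connected one are all assumptions made for illustration.

```python
# Hypothetical 2x2 mesh of functional units (FUs), numbered
#   0 - 1
#   |   |
#   2 - 3
# Each FU is wired only to its horizontal and vertical neighbours.
INTERCONNECT = {
    0: {1, 2},
    1: {0, 3},
    2: {0, 3},
    3: {1, 2},
}


def unroutable_edges(dependences, placement, interconnect=INTERCONNECT):
    """Return the data dependences whose operand cannot be delivered.

    dependences: list of (producer, consumer) operation names.
    placement:   dict mapping each operation name to an FU index.
    Assumption: an operand can reach the consumer only if both
    operations sit on the same FU or on directly connected FUs.
    """
    bad = []
    for producer, consumer in dependences:
        src = placement[producer]
        dst = placement[consumer]
        if src != dst and dst not in interconnect[src]:
            bad.append((producer, consumer))
    return bad


if __name__ == "__main__":
    deps = [("load", "mul"), ("mul", "add")]
    # "mul" on FU 1 and "add" on FU 2 are diagonal, so not connected.
    print(unroutable_edges(deps, {"load": 0, "mul": 1, "add": 2}))
    # Moving "add" to FU 3 makes every dependence routable.
    print(unroutable_edges(deps, {"load": 0, "mul": 1, "add": 3}))
```

A general modulo scheduler would accept both placements in this example, since it considers only resource and dependence constraints. A scheduler for a CGRA would have to reject the first placement or spend additional functional units and cycles to route the operand.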