Concurrent processing can increase the performance of a processor, and requires some form of parallelism to be introduced in the processor architecture. A processor can exploit two forms of parallelism. The first is instruction-level parallelism, in which more than one instruction at a time is executed within one task. The second is task-level parallelism, in which multiple tasks are executed simultaneously by the processor. The application to be executed determines the maximum amount of instruction-level parallelism and task-level parallelism that can be exploited.
Configurable processors are pre-fabricated devices that can be customized to perform a specific function. An example of a configurable processor is a scalable VLIW (Very Long Instruction Word) processor, i.e. a VLIW processor with a large number of functional units. A VLIW processor exploits instruction-level parallelism in programs and thus executes more than one instruction at a time. Multiple, independent functional units are used to execute multiple operations in parallel. VLIW processors carry out multiple functional unit operations in response to one very long instruction. Each VLIW effectively configures the data path of the processor for computations in space, i.e. in parallel.
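The way one very long instruction drives several functional units at once can be illustrated with a minimal sketch. The names `execute_vliw`, `Op` and the register names are hypothetical and chosen only for illustration; the sketch assumes a common VLIW convention in which every operation of a word reads the register state from before the word is issued, and all results are written back together.

```python
import operator
from typing import Callable, Dict, List, Optional, Tuple

# One operation slot: (destination, function, source_a, source_b).
# A slot holding None models a NOP for that functional unit.
Op = Tuple[str, Callable[[int, int], int], str, str]


def execute_vliw(word: List[Optional[Op]], regs: Dict[str, int]) -> Dict[str, int]:
    """Execute all slots of one VLIW word against the same register snapshot."""
    results: Dict[str, int] = {}
    for op in word:
        if op is None:  # idle functional unit in this cycle
            continue
        dest, fn, src_a, src_b = op
        # Every slot reads the OLD register values (assumed semantics).
        results[dest] = fn(regs[src_a], regs[src_b])
    new_regs = dict(regs)
    new_regs.update(results)  # commit all results together
    return new_regs


regs = {"r0": 1, "r1": 2, "r2": 3, "r3": 0, "r4": 0}
word = [
    ("r3", operator.add, "r0", "r1"),  # functional unit 0
    ("r4", operator.mul, "r1", "r2"),  # functional unit 1
    None,                              # functional unit 2 idles (NOP)
]
after = execute_vliw(word, regs)
```

In this sketch the third slot does no useful work in the cycle shown, which is the situation in which an unused functional unit may still dissipate power.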
The flexibility of a VLIW processor can be improved by allowing the processor to execute multiple tasks in parallel and thus to exploit task-level parallelism, if present in an application. In the case of a traditional VLIW processor only a single task can be executed, due to the presence of only a single controller with a corresponding single program counter. VLIW processors with partitioned controllers, however, are capable of exploiting task-level parallelism, and this principle is described in "Architecture and implementation of a VLIW supercomputer", Colwell R. et al., Proc. of Supercomputing '90, New York, N.Y., USA, 12–16 Nov. 1990. Each controller controls a segment of the processor, and in principle two operation modes are possible. In the first mode the controllers operate independently, while in the second mode all controllers are locked together. In the first mode the net effect is that of a multi-processor system, in which multiple tasks are executed simultaneously and task-level parallelism is thus exploited. In the second mode, a classical VLIW processor is obtained. It is possible to switch between the two modes during computation.
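The two operation modes of partitioned controllers can be sketched as follows. This is an illustrative model only: the `Controller` class and `step` function are hypothetical, and the way the controllers are locked together (here, all controllers adopt the program counter of the first one) is an assumption, since the mechanism is not detailed in the text above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Controller:
    """One controller with its own program counter, driving one segment."""
    program: List[str]
    pc: int = 0

    def fetch(self) -> str:
        # Issue the operation at the current pc; past the end, issue a NOP.
        op = self.program[self.pc] if self.pc < len(self.program) else "NOP"
        self.pc += 1
        return op


def step(controllers: List[Controller], locked: bool) -> List[str]:
    """Advance all controllers by one cycle in the selected mode."""
    if locked:
        # Second mode: one shared program counter, classical VLIW behaviour.
        # Assumption: the first controller's pc is taken as the shared value.
        shared_pc = controllers[0].pc
        for c in controllers:
            c.pc = shared_pc
    # First mode: each controller simply follows its own program counter.
    return [c.fetch() for c in controllers]


a = Controller(["a0", "a1", "a2"], pc=0)
b = Controller(["b0", "b1", "b2"], pc=2)
independent = step([a, b], locked=False)  # each segment runs its own task
lockstep = step([a, b], locked=True)      # segments act as one wide VLIW
```

Switching the `locked` flag between calls models the switch between modes during computation.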
One problem associated with the introduction of parallelism in a processor architecture is the increase in the number of functional units and the corresponding increase in communication overhead, which may result in unnecessary power dissipation if these resources cannot be fully used at a given moment in time. For example, in the case of a scalable VLIW processor with partitioned controllers, functional units will remain unused if insufficient instruction-level parallelism or task-level parallelism is present in a specific application. These functional units may still consume a significant amount of power.
The VLIW processor of U.S. Pat. No. 6,219,796 has processing units that have been made responsive to a dedicated instruction, e.g. a SLEEP instruction, which at least partially powers down the associated execution unit. The execution units are made active again either by another dedicated instruction, e.g. a WAKE instruction, or by the receipt of an active, i.e. non-SLEEP, instruction. Consequently, the active configuration of the processor can be altered by dedicated instructions present in the instruction flow of VLIWs, resulting in a reduction of the power consumption of the active processor. The dedicated instructions are inserted into a VLIW by the compiler. This is realized by first detecting a segment of inactive instructions, e.g. NOPs, for a given functional unit and, subsequently, replacing the first inactive instruction in the segment with, for example, a SLEEP instruction, and replacing the last inactive instruction in the segment with a WAKE instruction.
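The compiler step described above can be sketched for the instruction stream of a single functional unit. This is a minimal illustration, not the method of the cited patent: the function name `insert_sleep_wake` and the `min_run` parameter are hypothetical, and the choice to leave a segment of only one NOP untouched (since it cannot hold both a SLEEP and a WAKE) is an assumption.

```python
from typing import List

NOP, SLEEP, WAKE = "NOP", "SLEEP", "WAKE"


def insert_sleep_wake(slot_ops: List[str], min_run: int = 2) -> List[str]:
    """Replace the first and last NOP of each idle segment by SLEEP and WAKE.

    slot_ops is the per-cycle operation sequence of ONE functional unit.
    Segments shorter than min_run (at least 2) are left unchanged (assumption).
    """
    out = list(slot_ops)
    threshold = max(min_run, 2)
    i, n = 0, len(out)
    while i < n:
        if out[i] != NOP:
            i += 1
            continue
        # Find the end of the maximal run of NOPs starting at i: [i, j).
        j = i
        while j < n and out[j] == NOP:
            j += 1
        if j - i >= threshold:
            out[i] = SLEEP      # first inactive instruction of the segment
            out[j - 1] = WAKE   # last inactive instruction of the segment
        i = j
    return out


stream = ["ADD", "NOP", "NOP", "NOP", "MUL", "NOP", "SUB"]
rewritten = insert_sleep_wake(stream)
```

Raising `min_run` would model a compiler that only powers a unit down when the idle segment is long enough to be worthwhile; whether and how such a threshold is chosen is not stated in the text above.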
It is a disadvantage of the prior art processor that a processing unit cannot be completely switched off. Some control logic will have to remain powered in order to be able to process the instruction for making the processing unit active again.