The design and test of modern electronic devices, such as embedded processors (EP) and systems-on-a-chip (SoC), is a complex process involving many stages. For example, many systems, such as mobile devices, networking products, and modems, require new EPs. These EPs can either be general purpose, such as microcontrollers (μC) and digital signal processors (DSP), or application specific, in the form of application specific instruction set processors (ASIP).
Compared to ASICs, DSPs, ICs, and general-purpose processors, ASIPs provide a tradeoff of computational performance and flexibility on the one hand and power consumption on the other. Therefore, ASIPs that are designed to execute specific tasks very efficiently can be found in a wide range of embedded systems.
However, designing systems with ASIPs is far more complex than assembling systems with standard processors. Typically, designing ASIPs comprises an iterative exploration in which hardware and software are explored and modified. This iterative process is referred to as an architecture exploration loop. The architecture exploration requires a number of tools, such as an assembler, linker, and simulator. If hardware and software are available, profiling results are acquired that usually lead to architecture modifications making the processor more efficient. To be consistent with these modifications, the software tools potentially need to be changed, as well.
The algorithm that is executed by the ASIP is usually specified by algorithm designers in a high level language, such as the C programming language. The overall design time can be significantly reduced by introducing into the architecture exploration loop a compiler that reflects the architecture. Besides reducing the implementation and verification time, the availability of a compiler also increases the system reusability for similar applications.
However, using a compiler in the architecture exploration loop is only beneficial if the compiler itself can be created accurately and efficiently. Thus, there is a need for an efficient and accurate technique for creating a compiler that is usable in an architecture exploration loop.
There have been a number of attempts at generating a compiler for use in architecture exploration. However, these conventional techniques have various weaknesses, such as being limited in the types of architecture that may be explored.
A detailed overview of work related to compiler generation from processor architecture description languages (ADLs) or compiler specifications is given by R. Leupers and P. Marwedel in “Retargetable Compiler Technology for Embedded Systems,” Kluwer Academic Publishers, Boston, October 2001.
A compiler development environment that is mainly useful for VLIW architectures is the Instruction Set Description Language (ISDL). (G. Hadjiyiannis, S. Hanono, and S. Devadas. “ISDL: An Instruction Set Description Language for Retargetability.” In Proc. of the Design Automation Conference (DAC), June 1997.) This conventional technique describes the processor hierarchically and lists invalid instruction combinations in a constraints section. This list becomes very lengthy and complex for DSP architectures like the Motorola 56k. Therefore, this technique is mainly useful for orthogonal processors.
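The idea of a constraints section that lists invalid instruction combinations can be illustrated with a minimal sketch. It does not use ISDL syntax; the function name `bundle_is_valid`, the pairwise form of the constraints, and the operation names are assumptions made only for this illustration.

```python
from itertools import combinations

def bundle_is_valid(bundle, invalid_pairs):
    """Return True if no two operations in the bundle form a forbidden pair.

    bundle        -- tuple of operation names issued in the same cycle
    invalid_pairs -- set of frozensets, each naming two operations that
                     must not be combined (hypothetical constraint list)
    """
    return not any(frozenset(pair) in invalid_pairs
                   for pair in combinations(bundle, 2))

# Hypothetical constraint list: "mac" and "mul" share a multiplier.
INVALID = {frozenset({"mac", "mul"})}

ok = bundle_is_valid(("mac", "add"), INVALID)
bad = bundle_is_valid(("mac", "mul", "add"), INVALID)
```

For an orthogonal processor the list stays short. For an irregular DSP, many combinations have to be enumerated, which is the growth problem noted above.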
A technique described by Trimaran is capable of retargeting a sophisticated compiler. However, the technique is limited to a very restricted class of VLIW architectures called HPL-PD. HPL-PD (Hewlett-Packard Laboratories PlayDoh) is a parametric processor architecture conceived for research in instruction-level parallelism (ILP). Trimaran's tool input is a manual specification of processor resources (functional units), instruction latencies, etc. (Trimaran. “An Infrastructure for Research in Instruction-Level Parallelism” http://www.trimaran.com.)
An extension of the CoSy® environment (ACE Associated Computer Experts bv. “The CoSy® Compiler Development System” http://www.ace.nl.) can be retargeted from a FlexWare2 description. (P. Paulin. “Towards Application-Specific Architecture Platforms: Embedded Systems Design Automation Technologies.” In Proc. of the EuroMicro, April 2000.) Unfortunately, FlexWare2 requires separate descriptions for the generation of the other software tools. This redundancy introduces a consistency/verification problem.
The concept for scheduler generation has been proposed in EXPRESSION. (Peter Grun, Ashok Halambi, Nikil D. Dutt, and Alexandru Nicolau. “RTGEN: An Algorithm for Automatic Generation of Reservation Tables from Architectural Descriptions.” In Proc. of the Int. Symposium on System Synthesis (ISSS), pages 44-50, 1999.) The concept for scheduler generation has also been proposed in PEAS-III. (M. Itoh, S. Higaki, J. Sato, A. Shiomi, Y. Takeuchi, A. Kitajima, and M. Imai. “PEAS-III: An ASIP Design Environment.” In Proc. of the Int. Conf. on Computer Design (ICCD), September 2000.) Both of these conventional techniques extract structural information from the processor description that allows the tracing of instructions through the pipeline. Instructions are automatically classified by their temporal I/O behavior and their resource allocation. Based on this information, a scheduler can be generated. In PEAS-III, all functional units that are used to model the behavior of instructions are taken from a predefined set called flexible hardware model database (FHT).
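As an illustration of how resource allocation per pipeline cycle can drive a scheduler, the following minimal sketch checks two reservation tables for a resource conflict. The table layout, the function name `conflicts`, and the resource names are assumptions for this example and are not taken from EXPRESSION or PEAS-III.

```python
from typing import Dict, FrozenSet

# A reservation table maps a cycle offset (relative to issue) to the set of
# resources the instruction occupies in that cycle. All names are hypothetical.
ReservationTable = Dict[int, FrozenSet[str]]

def conflicts(first: ReservationTable, second: ReservationTable,
              distance: int) -> bool:
    """Return True if `second`, issued `distance` cycles after `first`,
    needs a resource in a cycle in which `first` still occupies it."""
    for offset, resources in second.items():
        held = first.get(offset + distance, frozenset())
        if held & resources:
            return True
    return False

# Hypothetical tables: a two-cycle multiply and a one-cycle add.
MUL = {0: frozenset({"fetch"}), 1: frozenset({"decode"}),
       2: frozenset({"mul"}), 3: frozenset({"mul"}),
       4: frozenset({"writeback"})}
ADD = {0: frozenset({"fetch"}), 1: frozenset({"decode"}),
       2: frozenset({"alu"}), 3: frozenset({"writeback"})}
```

In this sketch, an add issued one cycle after the multiply collides on the write-back stage, whereas an add issued two cycles later does not. A generated scheduler could use such checks to decide how far apart two instructions must be issued.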
MIMOLA traces the interconnects of functional units to detect resource conflicts and I/O behavior of instructions. (R. Leupers and P. Marwedel. “Retargetable Code Compilation based on Structural Processor Descriptions.” Design Automation for Embedded Systems, 3(1):1-36, January 1998. Kluwer Academic Publishers). For non-pipelined architectures, it is possible to generate a compiler called MSSQ, which also includes an instruction scheduler. However, the abstraction level of MIMOLA descriptions is very low, which slows down the architecture exploration loop.
The CHESS code generator (D. Lanneer, J. Van Praet, A. Kifli, K. Schoofs, W. Geurts, F. Thoen, and G. Goossens. “Chess: Retargetable Code Generation for Embedded DSP Processors.” In P. Marwedel and G. Goossens, editors, Code Generation for Embedded Processors. Kluwer Academic Publishers, 1995.) is based on an extended form of the nML ADL. (A. Fauth, J. Van Praet, and M. Freericks. “Describing Instruction Set Processors Using nML.” In Proc. of the European Design and Test Conference (ED&TC), March 1995.) Similar to the MSSQ compiler, the scheduler uses the instruction coding to determine which instructions can be scheduled in parallel. In contrast to MSSQ, the CHESS compiler can be used to generate code for pipelined architectures. This is achieved by manually attaching latency information (e.g., number of delay slots) to the instructions. CHESS is primarily useful for retargeting compilers for DSPs.
The Marion system uses the Maril language to generate a compiler. (D. G. Bradlee, R. E. Henry, and S. J. Eggers. “The Marion System for Retargetable Instruction Scheduling.” In Proc. of the Int. Conf. on Programming Language Design and Implementation (PLDI), pages 229-240, 1991.) However, the system is restricted to RISC architectures: All target machines need to have general purpose register sets, each instruction produces at most one result, and only load and store operations can access memory.
The Mescal group, which is part of the Gigascale Research Center, recently proposed an operation state machine (OSM) based modeling framework. (W. Qin and S. Malik. “Flexible and formal modeling of microprocessors with application to retargetable simulation.” In Proc. of the Conference on Design, Automation & Test in Europe (DATE), March 2003.) OSM separates the processor into two interacting layers: an operation and timing layer and a hardware layer that describes the micro-architecture. Using this framework, simulators for a StrongARM and a PowerPC-750 could be generated.
An operBT/listBT backtracking scheduler has been proposed. (S. G. Abraham, W. Meleis, and I. D. Baev. “Efficient backtracking instruction schedulers.” In IEEE PACT, pages 301-308, May 2000.) However, the technique described in that paper is limited in its ability to handle delays. The paper presents two different backtracking scheduler techniques: the operBT scheduler and the listBT scheduler. Both schedulers assign priorities to the nodes of the dependence DAG. In contrast to other schedulers, the operBT scheduler does not maintain a ready list. Instead, it maintains a list of nodes not yet scheduled, sorted by node priority. It takes the highest priority node from this list and schedules it using one of the following three scheduling modes:
Schedule an operation without un-scheduling (normal).
Un-schedule lower priority operations and schedule into current_cycle (displace).
Un-schedule high priority operations to avoid invalid schedules and schedule an instruction into a so-called force_cycle (force).
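The choice among the three modes listed above can be sketched as follows. This is a simplified illustration under stated assumptions, not the algorithm of the cited paper: the function name `choose_mode`, the `Decision` type, the fixed issue `width`, the use of the earliest legal cycle as the force_cycle, and the omission of dependence-based un-scheduling are all choices made only for this example.

```python
from typing import Dict, List, NamedTuple, Tuple

class Decision(NamedTuple):
    mode: str            # "normal", "displace", or "force"
    cycle: int           # cycle the operation is placed into
    evicted: List[str]   # operations that must be un-scheduled first

# cycle -> list of (operation name, priority) already placed in that cycle
Slots = Dict[int, List[Tuple[str, int]]]

def choose_mode(priority: int, earliest: int, latest: int,
                slots: Slots, width: int) -> Decision:
    """Pick a scheduling mode for one operation (assumes width >= 1).

    [earliest, latest] is the window of cycles allowed by the already
    scheduled neighbors; it is empty when latest < earliest.
    """
    window = range(earliest, latest + 1)
    # normal: a cycle in the window still has a free issue slot
    for cycle in window:
        if len(slots.get(cycle, [])) < width:
            return Decision("normal", cycle, [])
    # displace: evict the lowest priority occupant that ranks below us
    for cycle in window:
        lower = [(p, n) for n, p in slots.get(cycle, []) if p < priority]
        if lower:
            return Decision("displace", cycle, [min(lower)[1]])
    # force: place at the assumed force_cycle; evict an occupant if full,
    # even if that occupant has a higher priority
    occupants = slots.get(earliest, [])
    if len(occupants) < width:
        return Decision("force", earliest, [])
    victim = min((p, n) for n, p in occupants)[1]
    return Decision("force", earliest, [victim])

# Hypothetical partial schedule on a single-issue machine.
slots = {0: [("a", 5)], 1: [("b", 1)]}
normal = choose_mode(3, 0, 2, slots, 1)    # cycle 2 is free
displace = choose_mode(3, 0, 1, slots, 1)  # "b" ranks lower and is evicted
force = choose_mode(3, 0, 0, slots, 1)     # only "a" is in the way
```

A complete scheduler would return every evicted operation to the list of unscheduled nodes and would also un-schedule operations whose dependences are violated by a forced placement. Both steps are left out of this sketch.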
The operBT scheduler has the drawback of being relatively slow due to many un-scheduling operations. To overcome this drawback, the operBT scheduler was extended to the listBT scheduler. This scheduler tries to combine the advantage of the conventional list scheduler (speed) with the advantage of the operBT scheduler (a better schedule). Unlike the operBT scheduler, the listBT scheduler does maintain a ready list. This means that only nodes that are ready can be scheduled. Unfortunately, the delay slot filling of the listBT scheduler does not work for all cases.
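For comparison, a conventional list scheduler with a ready list can be sketched as follows. This is a generic textbook-style illustration without backtracking, not the listBT scheduler itself. The function name `list_schedule`, the fixed issue `width`, and the rule that a node is ready only once the latencies of all its predecessors have elapsed are assumptions for this example.

```python
def list_schedule(priority, preds, width):
    """Greedy list scheduling without backtracking.

    priority -- dict: node name -> priority (larger is scheduled first)
    preds    -- dict: node name -> {predecessor name: latency in cycles}
    width    -- number of nodes that can be issued per cycle (>= 1)
    Assumes the dependence graph is acyclic and every predecessor is a
    key of `priority`; otherwise the loop would not terminate.
    """
    schedule = {}
    remaining = set(priority)
    cycle = 0
    while remaining:
        # Ready list: every predecessor is scheduled and its latency has
        # elapsed by the current cycle.
        ready = [n for n in remaining
                 if all(p in schedule and schedule[p] + lat <= cycle
                        for p, lat in preds.get(n, {}).items())]
        ready.sort(key=lambda n: (-priority[n], n))
        for n in ready[:width]:
            schedule[n] = cycle
            remaining.discard(n)
        cycle += 1
    return schedule

# Hypothetical DAG: "c" depends on "a" with a latency of two cycles.
result = list_schedule({"a": 3, "b": 2, "c": 1}, {"c": {"a": 2}}, width=1)
```

Because a placement is never revised, such a scheduler is fast, but it cannot repair an early decision that later turns out to be poor. This is the tradeoff the backtracking schedulers address.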