The combination of continuing advances in technology and reduced production costs has led to a proliferation of electronic devices that incorporate or use advanced digital circuits. These include both traditional electronic devices, such as desktop computers, laptop computers, and hand-held computing devices (for example, Personal Digital Assistants (PDAs) and hand-held computers), and non-traditional devices, such as cellular telephones, printers, digital cameras, facsimile machines, and household and business appliances. The digital circuits included in these electronic devices may be used to provide the basic functionality of the electronic devices or may be used to provide additional, desirable features.
For each of these electronic devices it is desirable to reduce the overall cost of the device. This reduction in cost may be accomplished by reducing the cost of the digital circuits incorporated into the device. The cost of the digital circuits may be reduced by reducing the amount of silicon used to fabricate each digital circuit. However, it is important that the digital circuit still meet the appropriate functional and performance requirements. Performance requirements are expressed as a combination of several metrics: throughput (number of tasks executed per clock cycle), latency (number of clock cycles to complete a single task), and clock speed.
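As a worked illustration of how the three metrics combine, the following minimal sketch converts a per-cycle throughput and a cycle-count latency into wall-clock figures. The function names and the numbers (a 200 MHz clock, one task every four cycles, a 40-cycle latency) are assumptions chosen for illustration and do not come from any particular design.

```python
def tasks_per_second(throughput_per_cycle: float, clock_hz: float) -> float:
    """Tasks completed per second: tasks per cycle times cycles per second."""
    return throughput_per_cycle * clock_hz


def latency_seconds(latency_cycles: int, clock_hz: float) -> float:
    """Time to complete one task: cycle count divided by cycles per second."""
    return latency_cycles / clock_hz


# Hypothetical requirement: 200 MHz clock, one task every 4 cycles,
# and 40 cycles from the start of a task to its completion.
clock_hz = 200e6
rate = tasks_per_second(0.25, clock_hz)
delay = latency_seconds(40, clock_hz)
print(f"{rate:.0f} tasks/s, {delay * 1e9:.0f} ns per task")
```

The same throughput can be met with different latencies, which is why the requirement states the metrics separately.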
Given a functional and performance requirement, synthesis approaches typically try to design a digital circuit with the required functionality that has minimal cost and still meets the performance requirements. FIG. 1 is a block diagram of a typical process for the high-level synthesis of digital circuits. As illustrated, the design process takes as input functional specification 101 of the application and desired performance requirement 102 and performs a number of steps including: analysis, transformations and optimizations step 103, storage determination step 104, functional unit allocation step 105, operation scheduling and resource binding step 106, and hardware synthesis step 107. The structural Register-Transfer-Level (RTL) description of the circuit is then produced as output 108.
The functional specification input 101 is a high level specification that expresses the behavior of the application. It is usually an executable program in a language that the high-level synthesis process understands. If it is a textual document, then the equivalent executable code may need to be written for the purposes of synthesis. The performance requirement 102 represents the throughput, latency, clock speed, etc. required of the synthesized digital circuit.
The program is analyzed and transformed in step 103 to expose opportunities for meeting the desired performance and for cost reduction. This includes techniques to exploit parallelism at the task level, iteration level, and instruction level, as well as traditional compiler optimizations such as common sub-expression elimination and dead code elimination.
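To make one of the named optimizations concrete, the sketch below applies common sub-expression elimination to a short straight-line program. The tuple representation `(dest, operator, src1, src2)` and the assumption that every destination is assigned exactly once are illustrative choices, not the representation used by any specific synthesis tool.

```python
def eliminate_common_subexpressions(ops):
    """Remove operations that recompute an already available value.

    ops: list of (dest, operator, src1, src2) tuples.
    Assumption: each dest is assigned exactly once (single assignment).
    Returns the reduced operation list and a map from removed dests
    to the surviving dest that holds the same value.
    """
    seen = {}    # (operator, src1, src2) -> dest already holding that value
    alias = {}   # removed dest -> surviving dest
    result = []
    for dest, operator, a, b in ops:
        # Rewrite sources that refer to an eliminated operation.
        a, b = alias.get(a, a), alias.get(b, b)
        key = (operator, a, b)
        if key in seen:
            alias[dest] = seen[key]
        else:
            seen[key] = dest
            result.append((dest, operator, a, b))
    return result, alias


program = [
    ("t1", "add", "x", "y"),
    ("t2", "add", "x", "y"),   # same expression as t1
    ("t3", "mul", "t1", "t2"),
]
reduced, alias = eliminate_common_subexpressions(program)
print(reduced)
```

Removing the duplicate addition means one fewer operation competes for an adder in the later allocation and scheduling steps.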
In step 104, storage is determined for the variables in the program. Data structures contained in the program may be mapped to global memory while others may be mapped to local memory or possibly to internal registers.
In step 105, functional units are allocated for the operations in the transformed and optimized program. Program operations may include, but are not limited to, additions, subtractions, multiplications, and divisions. A functional unit (FU) refers to a component such as an adder, a multiplier, a load-store unit, or a similar component. Each functional unit is capable of executing one or more types of operations. Allocating functional units entails selecting a minimal-cost set of hardware components that can execute the operations in the program graph and meet the required performance. For example, given a program with additions, subtractions, multiplications, memory loads, and memory stores, step 105 may allocate two multiply-adders, three subtractors, and one load-store functional unit.
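The relationship between the performance requirement and the number of units can be illustrated with a simple lower bound. The sketch below is not a minimal-cost allocator: it assumes one single-function unit type per operation type and that each unit can start one operation per clock cycle, and it ignores unit costs and multi-function units such as multiply-adders. The names `minimum_units` and `cycle_budget` are hypothetical.

```python
def minimum_units(op_counts, cycle_budget):
    """Lower bound on units per operation type.

    op_counts: mapping from operation type to number of operations.
    cycle_budget: number of clock cycles available to start them all.
    Assumption: one unit starts at most one operation per cycle.
    """
    if cycle_budget <= 0:
        raise ValueError("cycle_budget must be positive")
    # Ceiling division: enough units so that count <= units * cycle_budget.
    return {op: -(-count // cycle_budget) for op, count in op_counts.items()}


allocation = minimum_units({"add": 10, "mul": 4, "load": 6}, cycle_budget=4)
print(allocation)
```

A real allocator would also weigh the cost of each candidate component, since a single multi-function unit may be cheaper than several single-function units.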
Operation scheduling and resource binding are performed in step 106. Operation scheduling involves assigning the start of each operation to a specific clock cycle. For example, an add operation may be assigned to start executing on clock cycle number 23. Resource binding entails selecting, for each operation, a specific functional unit to be used for its execution. For example, in allocating functional units step 105, a determination may have been made that two adders, ADDER1 and ADDER2, are required to be included in the circuit design. In resource binding step 106, a particular addition operation may be bound to ADDER1, i.e., it is assigned to execute on ADDER1.
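The result of scheduling and binding can be pictured as a table that maps each operation to a start cycle and a functional unit. The following sketch records such a table and reports resource conflicts. The `Placement` class and operation names are hypothetical, and the check assumes each operation occupies its unit only in its start cycle.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    cycle: int   # clock cycle in which the operation starts
    unit: str    # functional unit the operation is bound to


def find_conflicts(placements):
    """Return pairs of operations that start on the same unit in the same cycle."""
    taken = {}       # Placement -> first operation that claimed it
    conflicts = []
    for op, placement in placements.items():
        if placement in taken:
            conflicts.append((taken[placement], op))
        else:
            taken[placement] = op
    return conflicts


placements = {
    "add_a": Placement(cycle=23, unit="ADDER1"),
    "add_b": Placement(cycle=23, unit="ADDER2"),
    "add_c": Placement(cycle=23, unit="ADDER1"),   # collides with add_a
}
conflicts = find_conflicts(placements)
print(conflicts)
```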
Typically, unscheduled operations (operations that have not been associated with a clock cycle and a functional unit) are addressed in some order that is either pre-assigned or dynamically determined during scheduling. Once an unscheduled operation is selected, several alternatives are considered for scheduling and binding this operation. An alternative refers to a specific clock cycle and functional unit on which this operation can be scheduled and bound. The alternatives for an unscheduled operation are derived by determining the clock cycles and functional units available to execute this operation. For example, if there are three possible clock cycles and two possible adder functional units for an operation that requires an adder, there would be six alternatives to be analyzed for scheduling the operation. The scheduler/binder may also undo some prior decisions due to dependency and/or resource-conflict issues. For one example of a scheduling and binding algorithm, see B. R. Rau, “ITERATIVE MODULO SCHEDULING,” International Journal of Parallel Processing, vol. 24, pp. 3-64, 1996, the disclosure of which is hereby incorporated by reference herein. This document is also available as HP Labs Tech. Report HPL-94-115 from Hewlett-Packard Co.
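The counting in the example above (three cycles times two adders gives six alternatives) can be sketched as a cross product of candidate cycles and candidate units, filtered by slots that are already occupied. This is an illustration of the enumeration only, not the iterative modulo scheduling algorithm cited above; the function name and the `occupied` set are assumptions.

```python
from itertools import product


def alternatives(cycles, units, occupied=frozenset()):
    """List every (cycle, unit) pair still free for an unscheduled operation."""
    return [(c, u) for c, u in product(cycles, units) if (c, u) not in occupied]


# Three candidate cycles and two adders, nothing scheduled yet.
alts = alternatives([23, 24, 25], ["ADDER1", "ADDER2"])
print(len(alts))

# If ADDER1 is already busy in cycle 23, one alternative disappears.
remaining = alternatives([23, 24, 25], ["ADDER1", "ADDER2"],
                         occupied={(23, "ADDER1")})
print(len(remaining))
```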
Hardware synthesis step 107 occurs after completion of operation scheduling and resource binding step 106. Hardware synthesis includes the processes of allocating the registers to hold data values and connecting the hardware functional units to each other and from/to allocated storage elements. These interconnections are based on the data flow of the program and the scheduling & binding decisions taken in previous steps.
Finally, the structural description of the circuit is produced as output 108. This RTL circuit description can then be taken through subsequent logic synthesis and place & route steps to produce the final circuit.
The high-level synthesis process may include other steps not shown in FIG. 1. Also, the high-level synthesis steps of analysis, transformations and optimizations, storage determination, functional unit allocation, operation scheduling and resource binding, and hardware synthesis may be performed serially in the sequence shown in FIG. 1, or serially in a different sequence, or several of these steps may be combined and performed in parallel. One example of a currently available high-level synthesis system that performs several steps of the overall process is PICO-NPA. Refer to FIG. 13 and Section 5 of U.S. patent application Ser. No. 09/378,298, filed Aug. 20, 1999, entitled “PROGRAMMATIC SYNTHESIS OF PROCESSOR ELEMENT ARRAYS”, the disclosure of which is hereby incorporated by reference herein.
As mentioned above, the overall objective of the operation scheduling and resource binding step is to associate a specific clock cycle and functional unit with each operation in the program, such that the specified performance requirements are met and the cost of the hardware is minimized. In addition to meeting latency and throughput performance requirements, it is important to ensure that the resulting hardware meets the timing constraints imposed on the circuit paths by the specified clock frequency. Circuit paths are combinational paths from a primary input to a latch/register, or from a latch/register to another latch/register, or from a latch/register to a primary output, or from a primary input to a primary output.
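The timing constraint can be illustrated by comparing each combinational path delay against the clock period implied by the clock frequency. The sketch below is a simplification that ignores register setup time, clock skew, and wire delay; the path names and delay values are hypothetical.

```python
def timing_violations(path_delays_ns, clock_mhz):
    """Return the paths whose combinational delay exceeds one clock period.

    path_delays_ns: mapping from path name to delay in nanoseconds.
    clock_mhz: specified clock frequency in megahertz.
    Simplification: setup time, skew, and wire delay are ignored.
    """
    period_ns = 1000.0 / clock_mhz   # e.g. 250 MHz gives a 4 ns period
    return sorted(name for name, delay in path_delays_ns.items()
                  if delay > period_ns)


paths = {
    "input -> reg_a": 4.5,    # primary input to register
    "reg_a -> reg_b": 3.2,    # register to register
    "reg_b -> output": 1.1,   # register to primary output
}
violations = timing_violations(paths, clock_mhz=250)
print(violations)
```

A path that fails this check would need a faster functional unit, an added register stage, or a different scheduling and binding decision.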