The invention relates to technologies and techniques for integrated circuit (“IC”) design.
A semiconductor integrated circuit (IC) has a large number of electronic components, such as transistors, logic gates, diodes, wires, etc., that are fabricated by forming layers of different materials and of different geometric shapes on various regions of a silicon wafer. The design of an integrated circuit transforms a circuit description into a geometric description called a layout. The process of converting specifications of an integrated circuit into a layout is called the physical design. After the layout is complete, it is then checked to ensure that it meets the design requirements. The result is a set of design files, which are then converted into pattern generator files. The pattern generator files are used to produce patterns called masks by an optical or electron beam pattern generator. Subsequently, during fabrication of the IC, these masks are used to pattern chips on the silicon wafer using a sequence of photolithographic steps. Electronic components of the IC are therefore formed on the wafer in accordance with the patterns.
Many phases of physical design may be performed with computer aided design (CAD) tools or electronic design automation (EDA) systems. To design an integrated circuit, a designer first creates high-level behavior descriptions of the IC device using a high-level hardware design language. An EDA system typically receives the high-level behavior descriptions of the IC device and translates them into netlists of various levels of abstraction using a computer synthesis process. A netlist describes interconnections of nodes and components on the chip and includes information about circuit primitives such as transistors and diodes, their sizes, and their interconnections, for example.
An integrated circuit designer may use a set of layout EDA application programs to create a physical integrated circuit design layout from a logical circuit design. The layout EDA application uses geometric shapes of different materials to create the various electrical components on an integrated circuit and to represent electronic and circuit IC components as geometric objects with varying shapes and sizes.
After an integrated circuit designer has created an initial integrated circuit layout, the integrated circuit designer then tests and optimizes the integrated circuit layout using a set of EDA testing and analysis tools. Common testing and optimization steps include extraction, verification, and compaction. The steps of extraction and verification are performed to ensure that the integrated circuit layout will perform as desired. Extraction is the process of analyzing the geometric layout and material composition of an integrated circuit layout in order to “extract” the electrical characteristics of the designed integrated circuit layout. The step of verification uses the extracted electrical characteristics to analyze the circuit design using circuit analysis tools. Compaction is an example of a tool used to modify a layout in order to make it more suitable for manufacturing.
Designers often use a set of tools to design a chip from its RTL description to its layout implementation. Among these tools, one of the goals of the physical implementation tools is to optimize a chip up to its targeted functional frequency as specified by the designer while taking into account the physical data available from tools such as the placement and route tools. As electronic designs become larger, speeding up the physical implementation process runtime becomes a more important task.
Optimizing a design consists of modifying the database of the chip to meet the timing constraints specified by the designer. The optimization engine identifies the most relevant timing paths to optimize and iterates over the instances along these timing paths. For each instance, it applies different actions to improve the slack on the critical path. The most common optimization actions are resizing, restructuring, buffering, and moving instances. These actions are normally computation intensive because timing accuracy, which depends on the timing graph, the RC extraction, the routing estimation, etc., is usually required.
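For purposes of illustration only, the selection and iteration described above may be sketched as follows. This is a minimal sketch and not the implementation of any particular tool. The names TimingPath, critical_paths, and visit_order, the path P4, and all arrival and required times are hypothetical. Slack is taken here as the required time minus the arrival time, so that a negative slack denotes a timing violation.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TimingPath:
    """Hypothetical record for one timing path (times in nanoseconds)."""
    name: str
    instances: List[str]
    arrival_ns: float
    required_ns: float

    @property
    def slack_ns(self) -> float:
        # Negative slack means the path misses its timing constraint.
        return self.required_ns - self.arrival_ns


def critical_paths(paths: List[TimingPath], limit: int) -> List[TimingPath]:
    """Return up to `limit` violating paths, worst slack first."""
    violating = [p for p in paths if p.slack_ns < 0]
    return sorted(violating, key=lambda p: p.slack_ns)[:limit]


def visit_order(paths: List[TimingPath], limit: int) -> List[Tuple[str, str]]:
    """List (path, instance) pairs in the order an optimizer would visit them."""
    return [(p.name, inst)
            for p in critical_paths(paths, limit)
            for inst in p.instances]


# Illustrative data only; the timing values are invented for this sketch.
paths = [
    TimingPath("P1", ["i1", "i3", "i4"], arrival_ns=1.30, required_ns=1.00),
    TimingPath("P2", ["i6", "i8", "i9"], arrival_ns=1.10, required_ns=1.00),
    TimingPath("P4", ["i7", "i10"], arrival_ns=0.80, required_ns=1.00),
]

for path_name, instance in visit_order(paths, limit=2):
    print(path_name, instance)
```

In an actual flow, each visited instance would be subjected to an action such as resizing or buffering, and the timing would be updated before the next path is selected. That update is the computation-intensive step and is omitted from this sketch.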
Most of the existing computer systems deployed for physical implementation or optimization of an electronic design comprise one or more single-core central processing units (CPUs), and as a result, most existing physical implementation tools, optimizers, and physical optimization schemes have been designed for such single-core systems. The heuristics and algorithms are therefore likely to have been designed and tuned under the assumption that these physical implementation or physical optimization systems have single-core CPUs. Furthermore, the infrastructures used by these single-core systems, such as the database editing, the timing engine, the placement or incremental placement tools, and the post-placement optimization tools, are usually not thread safe. This non-thread-safe environment often makes multi-threaded optimization almost impossible economically.
One of the concerns is that the optimization process may be dynamic. That is, when a move optimization has been committed, the process may also modify some physical data, and the next timing path to optimize may then be completely different. In an ideal world, one solution may have several threads working in parallel on different independent parts of the design. This may be possible only when all underlying applications are thread safe, that is, when two or more threads configured to share the same region(s) of physical memory are each aware of what the other threads are doing to those region(s).
In some cases, this parallel processing requires one thread to know whether another thread sharing the same region(s) of physical memory is accessing those region(s), or particularly, whether another thread is writing to or modifying their content. Nonetheless, making a typical non-thread-safe electronic design automation (EDA) implementation tool thread safe often requires rewriting various programs of the tool to some extent. Such an effort may occupy several experts for several years in rewriting the application and its dependencies (database, core timing engine, router, placer, etc.). That is, it may be difficult to implement parallelism in existing EDA tools because doing so often requires revamping parts of the tools, such as the core engine, and because it implicitly requires a thread-safe infrastructure, which most, if not all, such current tools do not have.
There exist two conventional approaches, both of which retain a general master-slave architecture. The first approach consists of finding a smart partition of the problem and distributing independent tasks to several CPUs. This first approach usually requires that each task execution last at least some minimal amount of time. Each CPU performs a single well-defined task on its assigned partition. For example, this type of solution is often used to speed up the net parasitic extraction process, in which each CPU extracts its own set of nets.
The second approach also partitions the problem of interest, but it distributes the tasks onto a full database, where each slave works on a part of the database. This second approach is often used to perform multi-mode multi-corner timing analyses, in which each mode/corner analysis may be performed on a single slave. The drawback of the first approach for the optimization process is that it may be difficult to find balanced partitions that can be optimized independently in parallel. For the second approach, the main issue is the memory cost or memory penalty, which refers to the amount of memory required by each slave when the slave boots up or is initialized. Sometimes, the second approach requires each slave to load the entire design into memory at that time. With this second approach, it may not be economical or practical to run large designs on multi-core machines due to such memory cost or memory penalty.
FIG. 1 illustrates an exemplary circuit design with a number of paths. Note that FIG. 1 is used solely for the purpose of illustration and ease of explanation and is not intended to limit the scope of any embodiments. It may be assumed that there exist two critical paths in this design. In FIG. 1, the items I1, I2, ..., I4, A, and B denote inputs. The items O1, O2, ..., O4, and Y denote outputs. The items i1, i2, ..., and i10 denote instances. The lines joining the instances, inputs, and outputs denote timing paths. The first critical path, P1, consists of I1 → i1 → i3 → i4 → O1, and the second critical path, P2, consists of I3 → i6 → i8 → i9 → O3. These two paths may be optimized in parallel. It shall be noted that in the example shown in FIG. 1, the two critical paths P1 and P2 do not share any logic. Many optimization tools work on only one path at a time regardless of the number of cores available to the optimization tool. In some cases, this limitation of working on one path at a time is due to the non-thread-safe characteristic of the optimization tools.
Consider a case where there exists a third critical path, P3, which consists of I2 → i2 → i3 → i5 → O2, so that critical path P1 and critical path P3 share some logic, e.g., instance i3. Where two central processing units (CPUs) are available, an optimization tool may assign paths P1 and P3 to the first CPU and path P2 to the second CPU. In this example, it may be seen that the number of instances for each CPU may be unbalanced. That is, the assignment of P1 and P3 involves five instances to optimize, whereas the assignment of path P2 involves only three instances. Assuming each instance takes about the same amount of processing, the second CPU may complete its assigned tasks earlier than the first CPU because fewer instances are assigned to the second CPU.
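The grouping in this example may be sketched in code for ease of explanation. The following is a minimal sketch under stated assumptions and is not the method of any particular optimization tool. It uses the instance lists given above for P1, P2, and P3, and merges paths that have an instance in common by means of a union-find structure. The function names group_paths_by_shared_instances and instance_count are hypothetical.

```python
from typing import Dict, List


def group_paths_by_shared_instances(paths: Dict[str, List[str]]) -> List[List[str]]:
    """Merge paths that share at least one instance into the same group."""
    parent = {name: name for name in paths}

    def find(name: str) -> str:
        # Follow parent links to the group representative (with path halving).
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    first_owner: Dict[str, str] = {}  # instance -> first path seen using it
    for name, instances in paths.items():
        for inst in instances:
            if inst in first_owner:
                # Shared logic: both paths must go to the same CPU.
                parent[find(name)] = find(first_owner[inst])
            else:
                first_owner[inst] = name

    groups: Dict[str, List[str]] = {}
    for name in paths:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())


def instance_count(group: List[str], paths: Dict[str, List[str]]) -> int:
    """Count distinct instances in a group; shared instances count once."""
    return len({inst for name in group for inst in paths[name]})


# Instances on each critical path, as listed in the description of FIG. 1.
PATHS = {
    "P1": ["i1", "i3", "i4"],
    "P2": ["i6", "i8", "i9"],
    "P3": ["i2", "i3", "i5"],
}

groups = group_paths_by_shared_instances(PATHS)
for cpu, group in enumerate(groups, start=1):
    print("CPU", cpu, group, instance_count(group, PATHS), "instances")
```

With these inputs, the sketch places P1 and P3 in one group of five distinct instances and P2 in a second group of three instances, which is the unbalanced assignment discussed above.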
On the other hand, in some cases, optimizing the critical paths P1 and P3 on the first CPU may require fewer CPU resources than optimizing the single critical path P2 on the second CPU, so that the first CPU completes its assigned optimization tasks earlier and waits for the optimization on the second CPU to complete. In these cases, the use of the computational resources is still unbalanced. Even though it may take less time to optimize the three critical paths with two CPUs than to optimize the same critical paths one at a time on a single CPU, the processing is nonetheless not optimized due to the unbalanced loads on the CPUs. That is, this approach may be “improved” but not “optimized”.
In addition, there exist some cases where, for example, the three critical paths P1, P2, and P3 all share some logic, so the methodology described here assigns all three paths to one CPU due to the shared logic. The unbalanced workload then negates the advantage of the multiple CPUs, and only one CPU is used because all critical timing paths are assigned to the same CPU. It may be seen that for more complex circuitry the unbalanced distribution of workload may be quite severe, and that it may be difficult to predict how to assign paths to each CPU, or how to partition the circuit or the full path among the CPUs, so as to achieve optimization.
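The effect of an unbalanced assignment may be quantified with a simple calculation, shown here for illustration only. The sketch assumes, as in the example above, that each instance takes about the same amount of processing, so that the load on a CPU is its number of assigned instances and the elapsed time is set by the busiest CPU. The function name parallel_speedup is hypothetical, and the load of eight instances on a single CPU is an assumed figure for the case where all paths share logic.

```python
from typing import List


def parallel_speedup(loads: List[int]) -> float:
    """Speedup over a single CPU when each CPU processes its own load.

    Elapsed time equals the largest load, because the other CPUs wait
    for the busiest one to finish.
    """
    busiest = max(loads)
    if busiest == 0:
        return 1.0  # nothing to do; no speedup to report
    return sum(loads) / busiest


# P1 and P3 (five instances) on one CPU, P2 (three instances) on the other.
print(parallel_speedup([5, 3]))

# All paths share logic, so every instance lands on one CPU (assumed load of 8).
print(parallel_speedup([8, 0]))

# A perfectly balanced split of the same work, for comparison.
print(parallel_speedup([4, 4]))
```

Under these assumptions, the five-to-three split yields a speedup of 1.6 rather than the ideal 2.0 for two CPUs, and the case where every path is assigned to one CPU yields no speedup at all.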
Therefore, there exists a need for a method, system, and computer program product for parallelizing tasks in processing an electronic circuit design.