1. Field of the Invention
The present invention relates to data preparation and manipulation for integrated circuit graphical design. More particularly, the inventions disclosed and claimed herein relate to methods and apparatus for automatically and intelligently partitioning data comprising integrated circuit (IC) graphical designs for distribution to, and processing by, multiple networked computer resources as separate tasks defined by the partitioning, with minimal partition-related errors, faster throughput of aggregate tasks, and more efficient use of parallel compute resources (improved scalability).
2. Description of the Related Art
The characteristics of today's complex IC designs have made accelerated processing of graphical design data an essential part of design-to-silicon processes. To remain profitable, design houses and fabs alike must be able to process huge and complicated volumes of design data swiftly. As IC technology continues to shrink, the support hardware and application programs required to reliably print the minimum feature sizes on silicon tend to lag behind, further widening sub-wavelength gaps. For physical verification, or real checking, of very large scale IC designs, non-equipment-based technologies such as phase-shifted masks (PSM), optical proximity correction (OPC) and optical rule checking (ORC) are required to process sub-wavelength data ever more quickly, efficiently and accurately. That is, sub-wavelength technologies rely on resolution enhancement techniques, which correspondingly increase the number of processing operations, and the processing time, required for sub-wavelength chip geometries. These operations must be carried out to move the design data through the stages of the manufacturing cycle. To that end, various partitioning schemes and processes have been developed to accommodate these ever-increasing processing demands.
The skilled artisan will understand that partitioning a graphical design to facilitate parallel processing may itself generate errors and processing bottlenecks. The cutting or partitioning of a chip design may affect the ability of presently available verification tools and platforms to process the separate tasks in an efficient, timely manner. Ineffective or inefficient use of network compute resources inherently adds time, and cost, to overall verification. Known techniques for automatically partitioning IC graphical designs for distributed processing incur increasing processing overhead, particularly at sub-wavelength dimensions, and are therefore “inefficient.” That is, known partitioning techniques do not partition the design so as to facilitate scalability in a distributed processing environment (scalability of a task across a multiple-cpu network), and therefore process inefficiently. The communication overhead required to process conventionally partitioned real check tasks across the available network resources results in underutilization, and limited scalability, of those same distributed compute resources.
For example, U.S. Pat. No. 7,051,307 (the '307 patent), commonly-known and incorporated by reference herein, discloses a process for automatic graphical partitioning of IC graphical design data to better facilitate post-partition processing. The processes of the '307 patent analyze the hierarchy and graphical nature of a circuit design to define the most appropriate locations and sizes of windows (or partitions), adapting the partitioning to the inherent character of the IC design (the physical design). For example, the '307 patent suggests that it is preferable to partition the design so that an entire macro defines the partition margins. If the partitions are too large or too small, distributed processing may neither improve overall processing times nor effectively utilize system compute resources.
FIG. 1 shows a schematic flow diagram of a known master process for hierarchical partitioning of graphical design data for electronic design automation (EDA) applications. Block 100 of FIG. 1 represents the start step of the master AGP process, and block 110 represents a step in which the IC graphical design data are processed for initial validation of the proposed circuit design. Block 120 represents a step of partitioning the design data, and block 130 represents a step in which tasks are “built” for processing the partitioned data by the available resources. Block 140 represents a step of submitting the tasks across the network of machines or cpus, and block 160 represents a step in which the process loops until all of the distributed tasks are completed. Once the correct size and number of logical blocks is found (for example, by the step of block 130), the process eliminates duplicate logical blocks, and logical blocks whose overlap exceeds an “overlap percentage limit.” The process then partitions the design according to the proposed block structure when these rules are met.
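The duplicate and overlap elimination step can be illustrated with a minimal Python sketch. It assumes logical blocks are axis-aligned rectangles and defines the overlap percentage as the intersection area divided by the area of the smaller block; the `Block` type, the `prune_blocks` helper and that overlap definition are hypothetical choices for illustration, not the '307 patent's actual implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Block:
    """Hypothetical axis-aligned logical block spanning (x0, y0) to (x1, y1)."""
    x0: float
    y0: float
    x1: float
    y1: float

    def area(self) -> float:
        return max(0.0, self.x1 - self.x0) * max(0.0, self.y1 - self.y0)


def overlap_percent(a: Block, b: Block) -> float:
    """Intersection area as a percentage of the smaller block (assumed definition)."""
    w = min(a.x1, b.x1) - max(a.x0, b.x0)
    h = min(a.y1, b.y1) - max(a.y0, b.y0)
    if w <= 0 or h <= 0:
        return 0.0  # blocks are disjoint or only touch along an edge
    return 100.0 * (w * h) / min(a.area(), b.area())


def prune_blocks(blocks: List[Block], overlap_limit_pct: float) -> List[Block]:
    """Keep blocks in order, dropping exact duplicates and any block that
    overlaps an already-kept block by more than overlap_limit_pct percent."""
    kept: List[Block] = []
    for blk in blocks:
        if blk in kept:
            continue  # exact duplicate of a block already kept
        if any(overlap_percent(blk, k) > overlap_limit_pct for k in kept):
            continue  # overlaps a kept block above the limit
        kept.append(blk)
    return kept


blocks = [Block(0, 0, 10, 10), Block(0, 0, 10, 10), Block(5, 0, 15, 10), Block(20, 0, 30, 10)]
print(prune_blocks(blocks, overlap_limit_pct=40.0))
```

Here the third block shares half its area with the first, so a 40 percent limit drops it while a 60 percent limit would keep it.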
After the tasks are submitted across the network of allocated cpus (block 140), the designated cpus run separate physical verifications of the design features in the partitioned data (e.g., design rule checking (DRC), optical rule checking (ORC), optical proximity correction (OPC), etc.). Because OPC is context dependent, it by nature restructures and removes a great deal of the hierarchy, flattening the data to some extent. Even with the best hierarchy management techniques, file sizes, data types and data volumes typically grow non-linearly, increasing data processing requirements. Given such large amounts of data, parallel or distributed processing of partitioned design data increases efficiency when performing very large numbers of DRC-like operations (e.g., Boolean operations, and width and space measurements involving design layout shapes) and advanced resolution enhancement techniques, such as optical proximity correction (OPC), scattering bar generation, etc.
Block 170 (of FIG. 1) depicts a step in the '307 patent process in which the overall post-processing results are assessed automatically. The success of the processing tasks is determined in the step represented by block 180. If all tasks were successful, the user is notified per the step of block 185, and the process stops (block 195). If any task did not run successfully, however, the process resubmits the incomplete tasks for further processing, as shown by the step of block 180. Even with AGP-like partitioning, improved scalability and reduced overall physical verification processing times are not guaranteed. For example, depending on the application or platform managing the processing, ORC-like operations do not scale well beyond several dozen cpus in a distributed processing scheme, and some may not scale well beyond 3 or 6 cpus.
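The submit, assess and resubmit loop described for blocks 140 through 195 can be sketched as follows. This is a hedged illustration only: `run_task` is a hypothetical stand-in for dispatching one partitioned task to a network cpu, and the `max_rounds` retry limit is an assumption that the described flow does not specify.

```python
from typing import Callable, Dict, Iterable


def run_until_complete(tasks: Iterable[str],
                       run_task: Callable[[str], bool],
                       max_rounds: int = 5) -> Dict[str, bool]:
    """Submit all tasks, then resubmit only the incomplete ones each round.

    run_task is a hypothetical dispatcher that runs one partitioned task on a
    network cpu and returns True on success; max_rounds is an assumed limit.
    """
    status = {task: False for task in tasks}
    for _ in range(max_rounds):
        pending = [task for task, ok in status.items() if not ok]
        if not pending:
            break  # every task has succeeded (assessed at block 180)
        for task in pending:
            status[task] = run_task(task)  # submit or resubmit (block 140)
    return status


attempts: Dict[str, int] = {}


def flaky(task: str) -> bool:
    # Hypothetical task runner: task "b" fails on its first attempt only.
    attempts[task] = attempts.get(task, 0) + 1
    return task != "b" or attempts[task] >= 2


result = run_until_complete(["a", "b", "c"], flaky)
print("all tasks succeeded" if all(result.values()) else "some tasks failed")
```

A caller would notify the user (block 185) when every value in the returned mapping is `True`, and otherwise report the tasks that still failed after the last round.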
Improving the scalability of data preparation operations in a distributed processing network environment, so as to reduce throughput time and scale fully to the available network cpus, would improve how known tools handle such processing tasks. Designs in emerging technologies, e.g., 45 nm technologies, will be so large, and the operations so complex, that a distributed processing system capable of processing such tasks in a timely manner is expected to require scalability on the order of 1000 cpus to “contain” aggregate real check run times. Today, DRC-like operations are processed using multi-threaded approaches, which inherently do not scale well. Available vendor tools tend not to utilize parallel compute resources efficiently, for example, by designating a cpu or machine for a particular task based on the task (the partitioned data to be processed) and on the ability of the machine or cpu. Scaling, or scalability, is a metric that indicates how well an EDA process or application utilizes the available compute resources. DRC tape-out flow is limited by data translation time, run time, debug time, etc. DRC, OPC and CRC cycles are iterated many times while designers check, fix and recheck the design during tape-out flow. Only after full chip assembly can final verification begin.
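The text does not say how scalability is measured. One common convention, assumed here rather than taken from the source, compares a run on p cpus with a single-cpu run: speedup S(p) = T(1)/T(p) and parallel efficiency E(p) = S(p)/p, where E(p) near 1 means the added cpus are well utilized. A minimal Python sketch with hypothetical run times:

```python
from typing import Dict, Tuple


def scaling_report(run_times: Dict[int, float]) -> Dict[int, Tuple[float, float]]:
    """Map cpu count p -> (speedup S(p), efficiency E(p)) relative to the 1-cpu run.

    run_times must contain an entry for 1 cpu; times may be in any consistent unit.
    """
    t1 = run_times[1]
    report: Dict[int, Tuple[float, float]] = {}
    for p, tp in sorted(run_times.items()):
        s = t1 / tp             # speedup S(p) = T(1) / T(p)
        report[p] = (s, s / p)  # efficiency E(p) = S(p) / p
    return report


# Hypothetical wall-clock hours for one real check run at three cpu counts.
times = {1: 1000.0, 10: 125.0, 100: 20.0}
for cpus, (s, e) in scaling_report(times).items():
    print(f"{cpus:>4} cpus: speedup {s:6.1f}, efficiency {e:.2f}")
```

For these hypothetical times, 100 cpus deliver a speedup of 50 but an efficiency of only 0.5, which is the kind of underutilization of allocated compute resources described above.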
BRION, Inc., manufactures an integrated hardware/software platform that makes extensive use of hardware accelerators to expedite large numbers of DRC-like operations. The BRION platform, however, is very expensive. SYNOPSYS, Inc., and MENTOR GRAPHICS, Inc., provide software that scales to large node counts to distribute data for DRC-like processing and resolution enhancement processing across a network's allocated compute resources (parallel processing). The MENTOR GRAPHICS software, however, does not scale well to the large node counts anticipated for shrinking technologies, e.g., 45 nm node technologies. And while the SYNOPSYS software may scale better than the MENTOR GRAPHICS technologies, the SYNOPSYS application is not arranged to “efficiently” perform resolution enhancement operations, such as optical rule checking (ORC), optical proximity correction (OPC), etc.
Multithreading and distributed processing are parallel computing approaches that attempt to utilize parallel compute resources, and main memory services or resources, to make real checking more efficient. Multithreading works well only for “small” tasks, because memory contention causes such systems to lose scalability, and reach speed limits, once the number of cpus in the compute resources exceeds about 10 or 12. Synopsys, for example, uses distributed processing and hybrid OPC in an effort to improve scalability, managing the hierarchy over a network of cpus rather than over one, or four (4), cpus. The Synopsys tools partition the design into tasks and distribute the tasks in pieces to individual compute resources for processing; the processed pieces are then returned and patched together. But as mentioned, conventional arbitrary partitioning of a design into smaller regions (for easier processing by a cpu) runs the risk of cutting through shapes (which might correspond to a macro). Cut shapes cause processing errors in various ways, including margin errors, where partition margins or boundaries complicate processing by correction algorithms. The processes also encounter problems, or increased communication overhead, where the size of arbitrarily partitioned shapes falls below the minimum size the applied algorithm was designed to accommodate.
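The risk of cutting through shapes can be made concrete with a toy one-dimensional check. The sketch below assumes each shape is reduced to the x-extent of its bounding box and that partitions are separated by vertical cuts; `shapes_cut_by` and `nearest_safe_cut` are hypothetical helpers that flag the shapes an arbitrary cut would split and move the cut to the nearest position that splits none. It illustrates the problem only and is not presented as the claimed partitioning method.

```python
from typing import List, Tuple

# Hypothetical shape representation: x-extent (x_min, x_max) of its bounding box.
Interval = Tuple[float, float]


def shapes_cut_by(cut_x: float, shapes: List[Interval]) -> List[Interval]:
    """Return the shapes whose interior a vertical cut at cut_x passes through."""
    return [s for s in shapes if s[0] < cut_x < s[1]]


def nearest_safe_cut(target_x: float, shapes: List[Interval]) -> float:
    """Return the x closest to target_x at which a vertical cut splits no shape.

    The nearest safe position is either target_x itself or a shape edge, so only
    those candidates are examined. At least one candidate is always safe: with no
    shapes, target_x; otherwise the leftmost edge, which lies inside no shape.
    """
    candidates = [target_x] + [edge for s in shapes for edge in s]
    safe = [x for x in candidates if not shapes_cut_by(x, shapes)]
    return min(safe, key=lambda x: abs(x - target_x))


shapes = [(0.0, 4.0), (3.0, 8.0), (10.0, 12.0)]
print(shapes_cut_by(5.0, shapes))  # the arbitrary cut at x = 5 splits (3.0, 8.0)
print(nearest_safe_cut(5.0, shapes))
```

For shapes spanning [0, 4], [3, 8] and [10, 12], an arbitrary cut at x = 5 would split the second shape, while the nearest cut that splits no shape is at x = 8.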
The skilled IC graphical designer would welcome a vendor tool or platform that can automatically partition input data with an eye to distributed processing to significantly reduce overall processing times for ORC and/or OPC-like operations upon the partitioned data by effectively and efficiently using all available in-network cpus or compute resources.