Massively parallel processing (MPP) environments are computing environments that operate using a massive number of processors; it is typical for an MPP environment to use tens of thousands of processors. Each processor in such an environment can execute instructions concurrently, which results in a very powerful system because many calculations take place simultaneously. Such an environment is useful for a wide variety of purposes, one of which is the software simulation of a hardware design.
Large logic simulations are frequently executed on parallel or massively parallel computing systems. For example, parallel computing systems may be specifically designed parallel processing systems or a collection, referred to as a “farm,” of connected general purpose processing systems. FIG. 1 shows a block diagram of a typical parallel computing system (100) used to simulate an HDL logic design. Multiple processor arrays (112a, 112b, 112n) are available to simulate the HDL logic design. A host computer (116), with associated data store (117), controls a simulation of the logic design that executes on one or more of the processor arrays (112a, 112b, 112n) through an interconnect switch (118). The processor arrays (112a, 112b, 112n) may be a collection of processing elements or multiple general purpose processors. The interconnect switch (118) may be a specifically designed interconnect or a general purpose communication system, for example, an Ethernet network.
A general purpose computer (120) with a human interface (122), such as a graphical user interface (GUI) or a command line interface, together with the host computer (116), supports common functions of a simulation environment. These functions typically include an interactive display, modification of the simulation state, setting of execution breakpoints based on simulation times and states, use of test vector files and trace files, use of HDL modules that execute on the host computer and are called from the processor arrays, checkpointing and restoration of running simulations, partitioning of a logic design, and single-step execution of a clock cycle.
The software simulation of a hardware logic design involves using a computer program to cause a computer system to behave in a manner analogous to the behavior of a physical hardware device. Software simulation is particularly beneficial because actually manufacturing a hardware device can be expensive, and simulation allows the user to determine the efficacy of a hardware design before committing to fabrication. Software simulation of a hardware logic design is well-suited to an MPP environment because hardware normally performs many activities simultaneously.
In an MPP environment, an individual logic design modeling a physical hardware device can be simulated on a potentially large number of parallel processing arrays. Before the logic design is able to execute, the design is partitioned into many small parts, one part per processor array.
Code partitioning in a compiler typically uses one of two classes of partitioning algorithms: (1) critical path scheduling, and (2) multi-level k-way partitioning (MLKP). Critical path scheduling algorithms place the longest critical paths first and the shortest critical paths last, so that paths are scheduled in order of decreasing critical path length. Critical path algorithms generally do not consider, or do not model, the communication overhead between the processors when scheduling paths across processors. MLKP algorithms are based on the observation that bisection algorithms are able to optimize only a small set of nodes effectively. Therefore, the input graph is “collapsed” into a smaller graph that is then partitioned.
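The critical path scheduling described above can be illustrated with a minimal Python sketch. The function names, the unit-cost path-length model, and the least-loaded placement rule are illustrative assumptions, not part of any particular scheme described here; note that, as stated above, communication cost is deliberately not modeled.

```python
# Illustrative sketch of critical-path list scheduling: nodes on the longest
# critical paths are placed first; inter-processor communication cost is
# intentionally not modeled, matching the class of algorithms described.
from collections import defaultdict

def critical_path_lengths(nodes, edges):
    """Longest path (counted in nodes) from each node to any sink of the DAG."""
    succ = defaultdict(list)
    for u, v in edges:
        succ[u].append(v)
    memo = {}
    def length(n):
        if n not in memo:
            memo[n] = 1 + max((length(s) for s in succ[n]), default=0)
        return memo[n]
    return {n: length(n) for n in nodes}

def schedule(nodes, edges, num_processors):
    """Assign each node to the least-loaded processor, longest paths first."""
    lengths = critical_path_lengths(nodes, edges)
    load = [0] * num_processors
    assignment = {}
    for n in sorted(nodes, key=lambda n: -lengths[n]):
        p = load.index(min(load))  # least-loaded processor array
        assignment[n] = p
        load[p] += 1
    return assignment
```

For example, `schedule(["a", "b", "c", "d"], [("a", "b"), ("b", "c")], 2)` places node `a` first because it heads the longest path a→b→c.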
Once the code is partitioned, each part is scheduled onto a corresponding processor array (or multiple processor arrays) and routed to execute on a simulation system. Scheduling involves both the timing and the resource availability of the processor array executing a node (i.e., a gate or an HDL statement).
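The interplay of timing and resource availability in scheduling can be sketched as follows. This is an illustrative Python sketch under assumed inputs (a topological order, per-node durations in cycles, and a fixed node-to-processor assignment); none of these names come from the text above.

```python
# Illustrative earliest-start scheduling of partitioned nodes onto processor
# arrays: a node may start only after its data dependencies finish (timing)
# and after its assigned processor array is free (resource availability).
def earliest_start_schedule(order, preds, assignment, duration):
    """order: nodes in topological order; preds: node -> predecessor list;
    assignment: node -> processor index; duration: node -> cycles."""
    free = {}         # processor -> time at which it becomes available
    finish = {}       # node -> finish time
    start_times = {}
    for n in order:
        dep_ready = max((finish[p] for p in preds.get(n, [])), default=0)
        p = assignment[n]
        start = max(dep_ready, free.get(p, 0))  # later of the two constraints
        start_times[n] = start
        finish[n] = start + duration[n]
        free[p] = finish[n]
    return start_times
```

With `b` depending on `a` and both assigned to processor 0, `b` cannot start until `a` finishes, even though processor 1 sits idle; this is exactly the resource-plus-timing coupling described above.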
A partitioning solution should obtain the minimum runtime of the logic design. According to current schemes, two criteria are used to measure the quality of a partitioning solution: (1) the degree of parallelism of the parts in the partition, and (2) the amount of inter-processor communication. The degree of parallelism is the number of parts in a partition that can be executed simultaneously. The degree of parallelism alone, however, is not enough to guarantee a fast overall simulation time of the circuit, because inter-processor communication incurs a communication cost (sometimes referred to as overhead) between the processor arrays that limits the contribution of parallelism to the overall simulation time. The ratio of computation time to communication time, i.e., the time the processor array spends on computation over the time the processor array spends on communication, is used as a quantitative measure.
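The two quality measures above can be expressed as a short Python sketch. The function names are illustrative, and the degree-of-parallelism helper uses a simplifying assumption (parts with no incoming dependency can all start simultaneously) rather than any specific definition from the text.

```python
# Illustrative sketch of the two partition-quality measures described above:
# the computation-to-communication ratio and the degree of parallelism.
def comp_comm_ratio(compute_cycles, comm_cycles):
    """Time spent computing over time spent communicating; higher is better."""
    return compute_cycles / comm_cycles if comm_cycles else float("inf")

def degree_of_parallelism(parts, deps):
    """Simplification: parts with no incoming dependency can run at once."""
    blocked = {b for (_, b) in deps}
    return sum(1 for p in parts if p not in blocked)
```

For instance, a processor array that spends 80 cycles computing and 20 cycles communicating has a ratio of 4.0; a partition of three parts where only one depends on another has a degree of parallelism of 2.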