Computerized simulation techniques are used for analyzing and solving complex computational problems in various fields, such as verifying the performance of complex electronic hardware designs. Many simulation techniques are known in the art. Some techniques use parallel processing in order to reduce simulation time.
For example, PCT International Publication WO 2009/118731, whose disclosure is incorporated herein by reference, describes a method for design simulation that includes partitioning a verification task of a design into a first plurality of atomic Processing Elements (PEs) having execution dependencies. The method further includes computing an order for executing the PEs on a multiprocessor device, which includes a second plurality of processors operating in parallel and which schedules the PEs for execution by the processors according to a built-in scheduling policy. The order induces concurrent execution of the PEs by different ones of the processors without violating the execution dependencies, irrespective of the scheduling policy. The PEs are executed on the processors in accordance with the computed order and the scheduling policy, to produce a simulation result. A performance of the design is verified responsively to the simulation result.
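The idea of an execution order that does not violate execution dependencies can be illustrated with a minimal sketch. It is not the method of the cited publication: it assumes the dependencies form a directed acyclic graph and uses a standard topological sort (Kahn's algorithm) as a stand-in. The function name `compute_order`, the `deps` mapping, and the PE names are hypothetical, chosen only for illustration.

```python
from collections import deque


def compute_order(deps):
    """Return PE ids in an order that respects the execution dependencies.

    deps maps each PE id to the set of PE ids it depends on (hypothetical
    representation; every dependency must itself be a key of deps).
    """
    # Number of unmet dependencies for each PE.
    remaining = {pe: len(parents) for pe, parents in deps.items()}

    # Reverse edges: for each PE, the PEs that depend on it.
    dependents = {pe: [] for pe in deps}
    for pe, parents in deps.items():
        for parent in parents:
            dependents[parent].append(pe)

    # PEs with no dependencies can be executed first.
    ready = deque(sorted(pe for pe, count in remaining.items() if count == 0))
    order = []
    while ready:
        pe = ready.popleft()
        order.append(pe)
        # Completing this PE may release the PEs that depend on it.
        for child in dependents[pe]:
            remaining[child] -= 1
            if remaining[child] == 0:
                ready.append(child)

    if len(order) != len(deps):
        raise ValueError("execution dependencies contain a cycle")
    return order


# Example: PE "d" depends on "b" and "c", which both depend on "a".
deps = {"a": set(), "b": {"a"}, "c": {"a"}, "d": {"b", "c"}}
order = compute_order(deps)
```

In this example, "b" and "c" have no dependency on each other, so any order that places both after "a" and before "d" is valid, and the two could run concurrently on different processors.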
As another example, PCT International Publication WO 2010/004474, whose disclosure is incorporated herein by reference, describes a computing method that includes accepting a definition of a computing task, which includes multiple atomic Processing Elements (PEs) having execution dependencies. The computing task is compiled for concurrent execution on a multiprocessor device, which includes multiple processors that are capable of executing a first number of the PEs simultaneously, by arranging the PEs, without violating the execution dependencies, in an invocation data structure including a second number of execution sequences that is greater than one but does not exceed the first number. The multiprocessor device is invoked to run software code that executes the execution sequences in parallel responsively to the invocation data structure, so as to produce a result of the computing task.
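The arrangement of PEs into a bounded number of execution sequences can be sketched as follows. This is an illustrative sketch under stated assumptions, not the compilation scheme of the cited publication: PEs are dealt round-robin into the sequences from a dependency-respecting order, and dependencies that cross sequences are recorded in a hypothetical `waits` table that a runtime would have to honor. The names `build_sequences`, `num_sequences`, and `max_parallel` are likewise hypothetical.

```python
def build_sequences(order, deps, num_sequences, max_parallel):
    """Arrange PEs into execution sequences without violating dependencies.

    order         -- PE ids in a dependency-respecting order
    deps          -- maps each PE id to the set of PE ids it depends on
    num_sequences -- the "second number": how many sequences to build
    max_parallel  -- the "first number": PEs the device can run at once
    """
    # The number of sequences is greater than one but does not exceed
    # the number of PEs the device can execute simultaneously.
    if not (1 < num_sequences <= max_parallel):
        raise ValueError("need 1 < num_sequences <= max_parallel")

    sequences = [[] for _ in range(num_sequences)]
    placed = {}  # PE id -> index of the sequence holding it
    for i, pe in enumerate(order):
        # Every dependency must already be placed, which guarantees that
        # a dependency in the same sequence appears earlier in it.
        for parent in deps[pe]:
            if parent not in placed:
                raise ValueError(f"{pe!r} is ordered before its dependency {parent!r}")
        seq = i % num_sequences  # round-robin placement (an assumption)
        sequences[seq].append(pe)
        placed[pe] = seq

    # Dependencies that live in a different sequence need explicit
    # synchronization at run time; record them per PE.
    waits = {
        pe: sorted(p for p in deps[pe] if placed[p] != placed[pe])
        for pe in order
    }
    return sequences, waits


deps = {"a": set(), "b": {"a"}, "c": {"a"}, "d": {"b", "c"}}
sequences, waits = build_sequences(["a", "b", "c", "d"], deps,
                                   num_sequences=2, max_parallel=4)
# sequences == [["a", "c"], ["b", "d"]]
# waits == {"a": [], "b": ["a"], "c": [], "d": ["c"]}
```

Because every sequence follows the same global order, a PE only ever waits on PEs that precede it in that order, so the cross-sequence waits cannot deadlock. Round-robin placement is the simplest choice and makes no attempt to balance load or minimize the number of waits.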