System simulation has long been an important tool in the design of real systems and in the prediction of their performance. During the design phase, system simulation provides the means to study design tradeoffs and to identify performance bottlenecks, thereby shaping the architecture and top-level design of the real system. After the design phase, simulation serves to tune system performance through optimization of system configuration parameters and to identify potential design improvements. It also serves to generate performance predictions for new system applications prior to their implementation. The need for system simulation becomes imperative for complex systems, where the risk of designing the "wrong" system becomes enormous. Large-scale computer networks (also referred to simply as networks) represent such complex systems and are today one of the major applications of simulation. At the same time, the emergence of inexpensive computing power makes system simulations affordable.
Simulations can be classified into three types: continuous time, discrete time, and discrete event. The type of simulation described herein, within the context of the present invention, is discrete event simulation; that is, simulation in which system state changes occur, in response to applied stimuli, at discrete points in time. Furthermore, the focus herein is on distributed simulations, in contrast to unistation/uniprocessor simulations. In this context, the term "unistation/uniprocessor" refers to a simulation environment hosted by a single computer or workstation, or by a single processor of a multiprocessing system; the description that follows refers to workstations, but applies equally to processors of a multiprocessing system. Distributed simulations, by contrast, are simulation environments in which the system at hand is partitioned into component submodels allocated to different computer workstations for execution. Distributed simulation has been studied extensively by industry and academia because of its promise to make the simulation of large-scale systems possible (by multiplying available computer resources) and/or practical (through faster execution).
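By way of illustration only, the discrete event principle described above may be sketched as a minimal event loop: events are held in a time-ordered queue, and state changes occur only at the discrete timestamps at which events are dequeued. All names below are illustrative and not part of any referenced system.

```python
import heapq

def simulate(initial_events, handler):
    """Minimal discrete event loop (illustrative sketch).

    initial_events: list of (timestamp, event_name) pairs.
    handler(time, name): applies the event's state change and may return
    new (timestamp, event_name) pairs to schedule; the system state thus
    changes only at discrete points in time, as events are executed.
    """
    queue = list(initial_events)
    heapq.heapify(queue)              # pending events, ordered by timestamp
    trace = []
    while queue:
        now, name = heapq.heappop(queue)   # always the smallest timestamp
        trace.append((now, name))
        for t, n in handler(now, name):
            assert t >= now                # events may only be scheduled forward in time
            heapq.heappush(queue, (t, n))
    return trace
```

A handler that, for example, reschedules an arrival every time unit until time 3 would drive the loop through timestamps 0, 1, 2, 3 in order.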
Generally, distributed simulation requires that the cooperating workstations executing the system submodels exchange time-stamped event information, indicating at a minimum the next event each workstation is to execute as well as the scheduled time of that execution. A workstation then proceeds with execution of its next event only when that event carries the smallest time stamp in the collective list of pending events. This process prevents causality errors; that is, situations where execution of an event modifies state variables used by an event scheduled for execution earlier. However, distributed simulation using such strict sequential order yields no gains in execution speed and is, therefore, not viable. Ideally, the individual workstations should execute events in parallel in order to maximize the speedup factor. To this end, the prior art has deployed two types of models for distributed simulation: (a) "optimistic" models, and (b) "conservative" models. Exemplary optimistic and conservative models are described by Fujimoto, R. M., in "Parallel Discrete Event Simulation", Communications of the ACM, October 1990, incorporated herein by reference.
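The strict sequential ordering rule described above (each step executes the globally smallest pending timestamp) may be sketched as follows, by way of illustration only; the class and function names are hypothetical. The sketch also makes the rule's drawback visible: exactly one workstation executes at a time, so no parallel speedup is obtained.

```python
class Workstation:
    """Hypothetical submodel host holding its own pending-event queue."""

    def __init__(self, name, events):
        self.name = name
        self.queue = sorted(events)          # (timestamp, event) pairs

    def next_timestamp(self):
        return self.queue[0][0] if self.queue else float("inf")

    def execute_next(self):
        return self.queue.pop(0)

def strict_order_run(stations):
    """A workstation executes its next event only when that event carries the
    smallest timestamp in the collective list of pending events. This avoids
    causality errors but serializes execution across all workstations."""
    trace = []
    while any(s.queue for s in stations):
        s = min(stations, key=Workstation.next_timestamp)
        t, e = s.execute_next()
        trace.append((t, s.name, e))
    return trace
```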
Optimistic models do not attempt to sequence the execution of events processed by different processors. Instead, such models allow each workstation to execute its own event sequence assuming independence among the events processed by the different workstations. At the same time, these models implement mechanisms for detection of causality errors and subsequent recovery through rollback.
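The state-saving and rollback mechanism of the optimistic approach may be sketched as follows, by way of illustration only (Time Warp, per the Fujimoto reference, is the canonical optimistic scheme; the snapshot and rollback names here are hypothetical). Each event is executed speculatively after saving a snapshot of the state, so that a "straggler" event arriving with a timestamp in the process's past can trigger recovery.

```python
import copy

class OptimisticProcess:
    """Illustrative optimistic logical process: executes events speculatively,
    saving a pre-event state snapshot so that a causality error can be
    recovered from by rolling the state back."""

    def __init__(self, state):
        self.state = state
        self.clock = 0
        self.snapshots = []                  # (event_timestamp, pre-event state)

    def execute(self, timestamp, apply_event):
        # Periodic state saving: the overhead noted for optimistic models.
        self.snapshots.append((timestamp, copy.deepcopy(self.state)))
        self.clock = timestamp
        apply_event(self.state)

    def rollback(self, straggler_time):
        """Undo all speculative work at or after the straggler's timestamp."""
        while self.snapshots and self.snapshots[-1][0] >= straggler_time:
            _, self.state = self.snapshots.pop()
        self.clock = straggler_time
```

For example, a process that has speculatively executed events at times 5 and 8 and then receives a straggler for time 6 discards the work done at time 8 and resumes from the restored state.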
Conservative models, on the other hand, are based on complete avoidance of causality errors by implementing lookahead algorithms that identify interdependencies among events executed by different workstations. Thus, such models allow processing of an event by a workstation only when it is determined that the event in question will not be affected by the results of events that are currently being processed, or are to be processed next, by the rest of the workstations.
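The conservative safety condition may be sketched as follows, by way of illustration only, assuming the common lookahead formulation from the parallel discrete event simulation literature: no peer workstation can produce an event earlier than its current simulation clock plus its lookahead, so an event is safe to process if its timestamp precedes that bound for every peer. The function name and parameters are illustrative.

```python
def safe_to_process(event_time, peer_clocks, lookahead):
    """Conservative rule sketch: the earliest timestamp any peer workstation
    can still send is its current clock plus its lookahead; the event is safe
    only if it precedes the smallest such bound."""
    return event_time < min(peer_clocks) + lookahead
```

For instance, with peers at simulation times 4 and 6 and a lookahead of 2, an event at time 5 is safe (no peer can affect times before 6), while an event at time 7 is not. A small lookahead keeps few events safe at any moment, which is why conservative performance depends so heavily on efficient lookahead algorithms.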
Most notable among the shortcomings of the optimistic approach are the processing overheads inherent in the model. Such overheads include: (a) the periodic saving of the state of each process (to make recovery of the simulation run possible when a causality error is detected), (b) the time wasted in incorrect processing while heading toward a causality error, and (c) the time required to roll back, undo the event processing performed thus far, and reprocess events. These overheads may nullify the benefits of parallel processing or lead to mediocre speedup gains. Other shortcomings include the fact that the required detection and rollback mechanisms are complex and difficult to implement, and the fact that "erroneous processing" (while heading toward a causality error) may lead to infinite loops.
Conservative models, while not incurring the aforementioned overheads and instabilities, do require efficient lookahead algorithms that identify and exploit event parallelism in order to achieve good performance. This is a serious problem considering that: (a) many applications do not allow the development of such efficient algorithms, and (b) even when they do, the resulting algorithms may be highly dependent on the "constants" of the specific simulation experiment. Another problem is that users need detailed knowledge of the deployed event synchronization scheme in order to "tune" the model for the particular application/experiment being simulated.
Another shortcoming, common to both classes of models, is their reliance on special-purpose software (as opposed to commercial off-the-shelf software) developed for specialized research or application programs.
Finally, it is noted that in the case of distributed simulation of large-scale communication networks, existing models are forced to limit themselves to specific configurations and to incorporate simplifying assumptions in order to make practical the simulation of such large networks.