The present invention relates to the field of event simulation, and in particular to event-driven optimistic simulation where only a single rollback announcement per straggler message is required.
In distributed simulation, various techniques are used to describe, model and simulate a collection of physical subsystems and objects that comprise a physical system. A critical aspect of that simulation is understanding and representing the time- phased interaction of individual physical subsystems on each other within the physical system. Changes in the states of the physical subsystems result from these time-phased interactions. In distributed simulation, logical processes (LPs) are commonly used to characterize or mimic the time-phased interaction that occurs among physical subsystems in the physical system. This includes simulation by the LP of the output states of the associated or corresponding physical subsystems.
There are two general types of distributed simulation. One is typically called time-stepped, the other is typically called event-driven. In event-driven distributed simulation, the state of the physical subsystem results from interaction only at some special instant or event, and the happening of those events can only be determined through the simulation itself. Thus, the drive of the simulation is by means of these events, hence, the simulation being termed event-driven. The present invention is most concerned with the later type of event-driven simulation.
Typically, a sequential event-driven simulation employs a queue of events, in which events are arranged in sequential order of the time at which they must happen. At simulation time T, the simulator iterates on the next unprocessed event with the lowest or earliest time stamp. At each iteration, the simulator picks the earliest event in the queue that is schedule for the corresponding LP and the LP executes the event, thereby updating the state of the LP based on the event. As a result of these event executions and associated state changes by all LPs, new events are created which are inserted in the queue according to the time they must be executed.
As previously discussed, to accurately simulate the physical subsystem, events must be processed in a sequential order of time. The difficulty is that with a distributed simulation it is easily conceivable that an LP receives, for inclusion in its queue, an event scheduled for a time that has already passed in the local simulation. This late event has been termed a xe2x80x9cstragglerxe2x80x9d and creates a xe2x80x9ccausality error.xe2x80x9d With a causality error, a system will improperly affect the past. This is unacceptable and results in a flawed simulation.
With a distributed simulation of a complex physical system comprising physical subsystems that interact in non-sequential fashion, provision must be made to either avoid causality errors or accommodate and correct the errors. Multiple techniques are known in the art and used to avoid or accommodate stragglers. Conservative simulation, as known in the art, avoids stragglers thus ensuring that the current state will never need to roll back. As similarly known in the art of distributed simulation, optimistic simulation ensures that should a straggler message occur, the proper state is always recoverable.
In conservative simulations, a logical process is only allowed to advance in simulation time when there is no possibility of a straggler message. This constraint slows down the simulation, but there is no possibility of roll back and hence prior state information need not be saved.
One simulation method uses a concept called time warp, based on the Virtual Time paradigm. With time warp, time is somewhat distorted with the simulation advancing based on the interaction of logical processes rather than rigid simulation clock time. In this manner, a simulation advances rapidly when there are few interactions, and more slowly when there are many interactions.
In optimistic simulations, a logical process is allowed to act on events in their event queue until a straggler message is received and identified. On that occurrence, the logical process that received the straggler message typically broadcasts a rollback message to notify other logical processes of the straggler message and sends anti-messages to cancel improperly transmitted messages and stop erroneous computation. When straggler messages occur infrequently, optimistic simulation is very time efficient, allowing the distributed simulation to advance faster than it would under a conservative approach. As known in the field of optimistic simulation with time warp, a single straggler message can cause an avalanche of anti-messages as well as multiple roll backs of each LP. It is further possible, and known in the field, that the system may livelock (LP""s keep on rolling back each other and the anti-messages chase the messages in a never ending race).
The techniques known in the art that allow controlled roll back in optimistic simulation to accommodate stragglers, have high message overhead.
It is therefore an advantage and desire to modify the time warp to quickly stop the spread of erroneous computation without requiring output queues and anti- messages. This results in less memory overhead and simple memory management algorithms, with significantly reduced message traffic.
The present invention features a system and a method whereby a rollback announcement, in response to a straggler message in a distributed simulation, is broadcast only by the LP that receives the straggler message. Other LPs that roll back in response to the rollback announcement send neither a rollback announcement, nor anti-messages. As a result, each LP rolls back at most once in response to the straggler, and the problems of multiple roll back and livelock are avoided. This is accomplished through maintaining and appending transitive dependency information to each inter-LP message in the form of a simulation vector.
In an advantageous embodiment of the present invention, all LP""s maintain transitive dependency information. This transitive dependency information includes sufficient information to establish inter-LP dependencies, as well as information on past roll backs, in the form of an incarnation number. On receipt and identification of a straggler message, the receiving LP broadcasts a rollback announcement, which is acknowledged by all LPs. The rollback announcement includes the timestamp of the straggler message and the current incarnation number of the receiving LP. After receiving a rollback announcement, each LP can determine whether or not roll back is required by comparison of the time stamp and incarnation number (as appended to the rollback announcement), to the transitive dependency information (as maintained by each LP). If roll back is required, the LP uses the transitive dependency information to return to an earlier state, such that the straggler message will be properly executed.
Another aspect of the present invention is reduction of the required memory storage overhead needed to track and maintain transitive dependency. This aspect is accomplished through retention of only relevant transitive dependency information. This is advantageous because, unless a provision is made to control or limit the storage of transitive dependency information, the requirement for storage will continue to grow indefinitely as the simulation progresses. In the present invention, this aspect is accomplished by establishing a Global Virtual Time (GVT). GVT is known in the art as a given point in the virtual time of the simulation defined such that no state of any LP will ever be rolled back to a time less than the GVT. In this manner, transitive dependency information for states prior to GVT are no longer relevant to the simulation and thus may be discarded.
An alternate embodiment of the present invention provides for reduced process blocking as required to accommodate roll back. In this embodiment, the LP receiving the straggler message and broadcasting the rollback announcement blocks action on all further messages until all other LPs have acknowledged receipt of the rollback announcement. As a result, the simulation, as regards the LP that received the straggler, is blocked from advancement. In this alternate embodiment, the blocking occurs at the other LP""s rather than the LP receiving the straggler message . In this manner the receiving LP is able to recommence simulation immediately after accommodating the straggler message, without the need to wait for acknowledgment receipt of the rollback message from all other LPs.
An alternative embodiment of the present invention provides for reduced message overhead by grouping LPs into clusters and appending different transitive dependency information to inter-cluster messages based on whether the cluster contains optimistic or conservative simulation LPs.
These and other advantages of the present invention will be apparent to those of ordinary skill in the art by reference to the following detailed description and the accompanying drawings.