When a number of users on a network wish to share data, such as graphical objects in a virtual reality scene, and to communicate changes in those objects to processes running at each of the nodes, there is a need for a fast and reliable updating system so that each user may quickly and reliably learn what changes any other user wishes to transmit. For instance, assume there is a group of asynchronous processes interacting via the shared data-object model of distributed shared memory, or some similar sharing model in which the processes share objects. Further assume that this group may be communicating over a network, may be geographically separated, and may be participating in a distributed virtual environment. The goal is then to simultaneously achieve: first, rapid interaction, maximizing the speed at which object changes are communicated so as to achieve near real-time interaction; second, low bandwidth, minimizing the communication bandwidth used; third, reliability, guaranteeing that object changes are eventually, if perhaps sometimes slowly, communicated successfully between the processes; and fourth, rapid joining, allowing a new user to join a communication group and rapidly become up to date on what all the other processes know.
In the following description, assume that the data to be interactively worked on resides in a world model which is used to describe the set of all objects being shared at any given moment.
When considering how to achieve the above goal, it is important to consider the following spectrum of ways that objects can change. A communications solution must work acceptably at all points of this spectrum and should work particularly well at whatever points are most likely in a particular application.
At one extreme, some objects change very frequently, e.g., tens of times a second or more. For many applications, there is little value in recovering a particular lost message describing a change, because the message is unlikely to be recovered before a subsequent change renders it obsolete. Rather, the focus should be on always being able to use the latest information as soon as it arrives. In addition, as few resources as possible should be spent on useless repair attempts. Using application-specific knowledge to determine that some lost messages are not worth repairing, because they have been rendered obsolete by subsequent changes, is central to the strategy referred to as object-based repair.
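The obsolescence test at the heart of object-based repair can be sketched in a few lines. This is a minimal sketch under stated assumptions, not a definitive implementation: it assumes that each update message carries a per-object version number assigned by the owning process together with the object's full new state, so that any newer update supersedes an older one. The names `Update`, `Replica`, and `needs_repair` are hypothetical.

```python
# Hypothetical illustration: message format and class names are assumptions.
from dataclasses import dataclass, field


@dataclass
class Update:
    object_id: str
    version: int  # hypothetical per-object change counter assigned by the owner
    state: dict   # assumption: each update carries the object's full new state


@dataclass
class Replica:
    versions: dict = field(default_factory=dict)  # object_id -> newest version held
    states: dict = field(default_factory=dict)    # object_id -> newest state held

    def receive(self, update: Update) -> bool:
        """Apply an update only if it is newer than what is held; report whether it was applied."""
        if update.version <= self.versions.get(update.object_id, 0):
            return False  # duplicate or already superseded
        self.versions[update.object_id] = update.version
        self.states[update.object_id] = update.state
        return True

    def needs_repair(self, object_id: str, lost_version: int) -> bool:
        """A lost update is worth repairing only if nothing newer has arrived since."""
        return self.versions.get(object_id, 0) < lost_version


replica = Replica()
replica.receive(Update("A", 1, {"x": 0}))
# version 2 is lost in transit
replica.receive(Update("A", 3, {"x": 5}))
print(replica.needs_repair("A", 2))  # False: version 3 made version 2 obsolete
```

Under these assumptions, a receiver that sees version 3 arrive after version 1 knows that version 2 was lost, but also knows that repairing it would be wasted effort.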
At the other extreme, some objects change very infrequently, e.g., only once every few minutes or hours. In this situation, the exact moment a change occurs may or may not matter, but the fact of the change certainly matters. It is very important that each individual change be communicated. It is also important that if information is lost about a particular change, this is detected long before the next change occurs. In this situation, some sort of positive acknowledgment scheme is needed to detect lost messages.
In the middle are objects that change at moderate speed, e.g., once every few seconds or so. Here repair is important and must be relatively timely. This is a particularly difficult part of the spectrum to support well. Fortunately, it is plausible that many applications make use of the two ends of the spectrum more than the middle.
It should be realized that a single object may change rapidly for a while and then slowly, or not at all, for a while. Therefore, a general-purpose approach cannot rely on knowing in advance which objects will exhibit which kind of behavior. Rather, it must adjust dynamically to whatever is happening.
One way to approach the goal above is to use standard distributed database or shared memory technology. In these approaches, the paramount goal, held above all others, is ensuring that whenever two processes access a given shared object, they obtain the same values. To satisfy this goal, locks must be used to prevent processes from accessing objects at the wrong time.
For example, suppose that process P1 wants to modify object A. To do this, P1 must:
1. check that no other process has locked A, waiting if necessary until the lock is free,
2. lock A so that other processes are prevented from accessing A,
3. send messages to all the other processes in the group notifying them that the lock is set,
4. wait until it receives return messages from all other processes acknowledging the lock. Note that this may reveal that some other process took the lock first, in which case P1 must return to step 1 above.
5. make the desired change in A,
6. send messages to all other processes specifying the change,
7. wait until it receives return messages from all other processes acknowledging receipt of the change messages,
8. remove the lock on A,
9. send messages to all other processes saying that the lock is removed.
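The cost of the nine-step protocol can be tallied directly. The sketch below is illustrative only: it assumes no lock contention, no lost messages, that step 1 is a purely local check against lock notices already received, and that every notification and acknowledgment is a separate point-to-point message. The function names are hypothetical.

```python
# Hypothetical illustration of message counts for the locking protocol above.


def lock_protocol_cost(group_size: int) -> dict:
    """Messages exchanged for one change under the nine-step locking protocol.

    Assumes no lock contention and no lost messages, and that every
    notification and acknowledgment is a separate point-to-point message.
    """
    others = group_size - 1
    per_step = {
        "step 3: lock notices": others,
        "step 4: lock acknowledgments": others,
        "step 6: change messages": others,
        "step 7: change acknowledgments": others,
        "step 9: unlock notices": others,
    }
    return {
        "messages": sum(per_step.values()),
        # Sequential one-way flights before any other process can see the change:
        # the lock notice, its acknowledgment, then the change message itself.
        "flights_until_visible": 3,
        "per_step": per_step,
    }


def unlocked_update_cost(group_size: int) -> dict:
    """The same change sent directly to each other process, with no locking."""
    return {"messages": group_size - 1, "flights_until_visible": 1}


locked = lock_protocol_cost(10)
direct = unlocked_update_cost(10)
print(locked["messages"], direct["messages"])  # 45 9
```

Under these assumptions the locked change costs five times as many messages as a direct send and takes at least three message flights, rather than one, to become visible.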
This handshaking wastes bandwidth and dramatically slows interaction. Setting and freeing the lock on A requires multiple messages to be exchanged between P1 and the other processes in the group. The back-and-forth communication greatly increases the latency between the time P1 decides to change A and the earliest time at which any other process can access the change. Each message must also be sent with complete reliability, which further increases bandwidth usage and latency.
Finally, the latency rises rapidly as the number of processes in a group rises. As a result, standard distributed database or shared memory approaches cannot be used for the near real-time interaction of more than a handful of processes.
To achieve near real-time interaction between even a moderate number of processes, one must abandon the otherwise desirable requirement that when two processes access a given shared object, the two processes will always obtain the same values. Rather, one must dispense with inter-process locking and allow temporary disagreements between processes about the values associated with an object. In particular, when a process P1 modifies an object A, there will be a short period of time before another process P2 finds out about this change and during that time the values obtained by P1 and P2 when they access A will differ.
It is convenient to also assume that each object has an owning process and only that process can modify the object. This avoids writers/writers problems and means that there does not have to be any means of arbitrating between simultaneous changes. If an application wants to have several processes that can alter a given object, then the ownership of the object can be transferred from one process to another. Alternatively, a single process can be appointed as arbiter of change requests for the object and be the process that actually makes the changes based on these requests. This essentially mimics exactly what would have to be happening if multiple processes were to directly modify the object, because there would in that case have to be some arbitration method. For purposes of discussion, what follows assumes that at any given moment each object has only one process that can alter it.
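The single-owner rule can be sketched as a guard on every write. This is an illustrative sketch only: `SharedObject`, `NotOwnerError`, and `transfer_ownership` are hypothetical names, and it assumes that only the current owner may initiate a transfer. In a real group the transfer itself would also have to be communicated to the other processes, which the sketch leaves out.

```python
# Hypothetical illustration: class and method names are assumptions.


class NotOwnerError(Exception):
    """Raised when a process other than the owner tries to change an object."""


class SharedObject:
    def __init__(self, object_id: str, owner: str, state: dict):
        self.object_id = object_id
        self.owner = owner
        self.state = dict(state)

    def _check_owner(self, process: str) -> None:
        if process != self.owner:
            raise NotOwnerError(f"{process} does not own {self.object_id}")

    def modify(self, process: str, **changes) -> None:
        """Only the owning process may change the object, so writes never conflict."""
        self._check_owner(process)
        self.state.update(changes)

    def transfer_ownership(self, process: str, new_owner: str) -> None:
        """Hand the right to modify the object to another process (owner-initiated)."""
        self._check_owner(process)
        self.owner = new_owner


a = SharedObject("A", "P1", {"x": 0})
a.modify("P1", x=1)
a.transfer_ownership("P1", "P2")
a.modify("P2", x=2)
print(a.owner, a.state)  # P2 {'x': 2}
```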
Given this relaxed equality constraint, three approaches have been used to attempt to meet the goal above: central server systems, Distributed Interactive Simulation, DIS, and reliable multicast.
Central server approaches have each process in a group communicate the changes it makes to a central server, which then notifies the other processes. This approach does a good job of keeping the information known by the processes as close to the same as possible.
It also does a good job of allowing rapid joining, because a new process can receive a rapid download from the central server of everything it is supposed to know. In addition, by sending the messages to and from the server using a reliable protocol such as TCP, the central server approach can easily guarantee reliable delivery of information.
However, the central server approach has two problems. First, interaction speed is significantly limited, because all messages have to go first to the central server and then to the other processes in the group. In comparison to sending messages directly from one process to another, this adds an additional message flight time and adds the time required for the server to interpret the incoming message, decide what to do with it, and generate an outgoing message.
Second, bandwidth needs are increased somewhat due to the need to send messages to the central server as well as to the other processes in the group.
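The two costs of routing through a server can be put side by side with a little arithmetic. This sketch assumes a single one-way flight time on every path, a fixed server processing time, and point-to-point delivery; the function name is hypothetical, and the 40 ms and 10 ms figures in the usage lines are hypothetical values, not measurements.

```python
# Hypothetical illustration: timing values are assumptions, not measurements.


def update_cost(group_size: int, flight_ms: float, server_ms: float):
    """Latency and message count for one change, direct vs. via a central server.

    flight_ms: one-way network flight time (assumed equal on every path).
    server_ms: time for the server to interpret and forward a message.
    Messages are counted as point-to-point sends.
    """
    others = group_size - 1
    direct = {"latency_ms": flight_ms, "messages": others}
    # Sender -> server, server processing, server -> each other process.
    via_server = {"latency_ms": 2 * flight_ms + server_ms, "messages": 1 + others}
    return direct, via_server


direct, via_server = update_cost(group_size=5, flight_ms=40, server_ms=10)
print(direct)      # {'latency_ms': 40, 'messages': 4}
print(via_server)  # {'latency_ms': 90, 'messages': 5}
```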
Systems conforming to the Distributed Interactive Simulation standard, DIS (Standard for Information Technology, Protocols for Distributed Interactive Simulation, DIS ANSI/IEEE standard 1278-1993, American National Standards Institute, 1993), send messages about object changes directly from one process to another, using what are effectively multicast messages sent via the UDP protocol. Strictly speaking, early DIS systems used broadcast within dedicated subnetworks, with special bridging hardware/software to forward messages from one subnetwork to another, but this is essentially what multicast-capable network routers do.
The key virtue of the DIS approach is that it communicates information between processes at the maximum possible speed. In addition, multicast uses significantly less system bandwidth than multiple point to point connections. However, there is no guarantee of delivery of UDP messages. Therefore, DIS does not guarantee that a change made by one process will ever be known by a given other process.
To counteract the reliability problem, DIS takes two actions. First, each message sent contains full information about an object so that it can always be understood even if previous messages about the object have been lost. Second, DIS systems send out frequent "keep-alive" messages specifying the current state of each object, typically once every 5 seconds. This means that lost information is typically repaired within 5-10 seconds. It also means that a new process will be informed of everything it needs to know in 5-10 seconds.
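The DIS repair strategy, full-state messages plus periodic keep-alives, can be sketched as follows. This is a simplified illustration, not the DIS protocol data unit format: the `send` callback, the polling `tick` method, and the class name `KeepAliveSender` are assumptions, and the 5 second interval is the typical figure cited above.

```python
# Hypothetical illustration: not the DIS wire format; names are assumptions.

KEEP_ALIVE_S = 5.0  # typical keep-alive interval cited above for DIS systems


class KeepAliveSender:
    """Sends full object state on every change and re-sends it periodically."""

    def __init__(self, send):
        self.send = send     # callable(object_id, full_state); transport is assumed
        self.last_sent = {}  # object_id -> time of last transmission

    def on_change(self, object_id, full_state, now):
        # Every message describes the whole object, so it is self-contained.
        self.send(object_id, full_state)
        self.last_sent[object_id] = now

    def tick(self, states, now):
        # Re-send any object whose state has not been transmitted for KEEP_ALIVE_S.
        for object_id, full_state in states.items():
            if now - self.last_sent.get(object_id, float("-inf")) >= KEEP_ALIVE_S:
                self.on_change(object_id, full_state, now)


sent = []
sender = KeepAliveSender(lambda oid, st: sent.append((oid, dict(st))))
states = {"A": {"x": 1}}
sender.on_change("A", states["A"], now=0.0)
sender.tick(states, now=3.0)  # too soon: nothing is sent
sender.tick(states, now=5.0)  # keep-alive: full state of A is sent again
print(len(sent))  # 2
```

In this sketch, a receiver that missed the change sent at t = 0 is brought up to date by the keep-alive at t = 5, provided that message arrives; a newly joining process learns the object's state the same way.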
The above notwithstanding, DIS is still left with four significant problems. First, the fact that differential messages cannot be used, and therefore each message describes an object fully, wastes a lot of bandwidth, because even when only a small part of an object is changing, a description of the whole object is continually being sent.
Second, the keep-alive messages waste a lot of bandwidth, because when an object is not changing at all, repeated messages are still sent describing the whole object.
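The first two problems can be quantified with simple arithmetic. The message sizes below (200 bytes for a full object description, 20 bytes for a differential update), the update rate, and the object count are hypothetical, chosen only to show how the costs scale.

```python
# Hypothetical illustration: all sizes and counts below are assumed values.


def bits_per_second(messages_per_s: float, message_bytes: int) -> float:
    """Bandwidth consumed by a steady stream of fixed-size messages."""
    return messages_per_s * message_bytes * 8


FULL_BYTES = 200   # assumed size of a full object description
DELTA_BYTES = 20   # assumed size of a differential update
KEEP_ALIVE_S = 5.0

# First problem: an object changing 20 times a second, full vs. differential messages.
full = bits_per_second(20, FULL_BYTES)    # 32,000 bit/s
delta = bits_per_second(20, DELTA_BYTES)  # 3,200 bit/s

# Second problem: 1000 objects that are not changing at all, still re-sent every 5 s.
idle = bits_per_second(1000 / KEEP_ALIVE_S, FULL_BYTES)  # 320,000 bit/s

print(full, delta, idle)
```

With these assumed sizes, full-state messages cost ten times what differential messages would, and a thousand idle objects consume ten times the bandwidth of one rapidly changing object.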
Third, while keep-alive messages cause eventual repair, they do not cause fast repair. Therefore, the processes in the group can get significantly out of synchronization in what they believe about the data they share and near real-time interaction is impaired.
Fourth, joining is not rapid, because it takes 5-10 seconds for a new process to learn what the other processes know.
A clever part of DIS is that there is no central server process at all, and no need for any process to figure out what information other processes have received. Rather, all processes just forge ahead in ignorance of the others. When few messages are lost, things work extremely well, albeit at the cost of significant additional bandwidth. When a significant number of messages are lost, things continue to work out with no increase in bandwidth usage, albeit with a reduction in real-time interaction.
A final piece of related prior work is research on reliable multicast protocols. In that work, the primary goal is to achieve low bandwidth operation using multicast messages, but to incorporate handshaking that ensures reliability. There are two basic ways to do this: with acknowledgment messages, ACKs, or negative acknowledgment messages, NAKs.
In ACK-based approaches, each recipient sends explicit ACKs of the receipt of the messages sent to it. As in protocols such as TCP, this allows the sender to know exactly what has to be resent and to whom. However, the problem with this is what is referred to as an "ACK explosion".
Suppose that a process P is sending messages to N other processes. Each time P sends a message, N ACKs are generated. This uses significant bandwidth and causes P to receive N messages that it has to deal with for each message it sends out. Note that in the group as a whole, there are N times as many ACK messages as data-carrying messages. As a result, the ACK messages soon come to dominate all communication as the group grows large. If the ACKs are themselves sent by multicast, then all the processes have to deal with all the ACKs. If the ACKs are sent directly from the various processes back to the sending processes, then on the order of N-squared 1-to-1 channels must be open and the bandwidth needed for communicating ACKs increases.
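The ACK explosion can be tallied for a group in which every process multicasts. This sketch assumes a group of N + 1 processes that each send the same number of data messages, and that every receipt is acknowledged exactly once; the function name `ack_load` is hypothetical.

```python
# Hypothetical illustration of ACK traffic in a group where everyone multicasts.


def ack_load(n_receivers: int, msgs_per_sender: int):
    """Message counts when every process multicasts and every receipt is ACKed.

    The group holds n_receivers + 1 processes; each one sends msgs_per_sender
    data messages to the N = n_receivers others.
    """
    group = n_receivers + 1
    data = group * msgs_per_sender
    acks = data * n_receivers           # N ACKs per data message
    ack_channels = group * n_receivers  # ordered receiver -> sender pairs, ~N^2
    return data, acks, ack_channels


for n in (9, 99):
    print(n, ack_load(n, msgs_per_sender=10))
# 9 (100, 900, 90)
# 99 (1000, 99000, 9900)
```

The ACK count grows as N times the data count, and the number of point-to-point ACK channels grows roughly as N squared.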
In NAK-based approaches, control messages are sent only when messages are lost. Specifically, when a process P2 notices that it has failed to receive a message M from another process P, it sends a NAK requesting that the message be resent. The advantage of this approach is that when messages are received, bandwidth is not wasted sending ACKs. However, there are still significant problems.
First, the primary way for P2 to tell that it has missed M is for it to receive a different message sent by P after M. In comparison to using ACKs, this delays the time at which the loss of M can be detected and therefore repaired. This problem is particularly severe if P does not send any message after M. In that case, P2 might never notice that M was lost. To counteract this problem, some kind of message must be sent that specifies what processes should have received. A pure NAK-based approach is only possible when each process sends a steady stream of messages.
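NAK-based loss detection with per-sender sequence numbers can be sketched as below. It is a simplified illustration under stated assumptions: each sender numbers its messages 0, 1, 2, and so on, and the receiver only remembers the next number it expects, so a late retransmission is simply ignored rather than tracked individually. `GapDetector` is a hypothetical name.

```python
# Hypothetical illustration: sequence numbering scheme and names are assumptions.


class GapDetector:
    """Per-sender sequence numbers let a receiver spot losses, but only late."""

    def __init__(self):
        self.next_expected = {}  # sender -> next sequence number expected

    def receive(self, sender: str, seq: int) -> list:
        """Record message seq from sender; return the sequence numbers to NAK."""
        expected = self.next_expected.get(sender, 0)
        if seq < expected:
            return []  # late duplicate or retransmission; nothing new to request
        self.next_expected[sender] = seq + 1
        return list(range(expected, seq))  # every skipped number was missed


d = GapDetector()
print(d.receive("P", 0))  # []
# message 1 (M) is lost in transit
print(d.receive("P", 2))  # [1]: the loss of M is detected only now
# If message 3 were lost and P then fell silent, no call would ever reveal it.
```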
Secondly, as with ACKs, if NAKs are themselves sent by multicast, then all the processes have to receive all the NAKs. If the NAKs are sent directly from the various processes back to the sending processes, then on the order of N-squared 1-to-1 channels must be open and the bandwidth needed for communicating NAKs increases. In either case, the N NAKs that converge on the sender when a message is entirely lost are referred to as a "NAK implosion". This traffic causes difficulty at the sender that can further impede communication beyond whatever problem caused the communication to fail in the first place.
From this perspective, reliable multicast protocols have several key problems. First, most of them do not even attempt to support near real-time interaction or rapid joining, focusing instead on reliability and low bandwidth.
Second, many of them expend significant resources providing reliability features, such as ordered arrival, that are not useful for solving the problem posed above.
Third, if ACKs are used, this uses a significant amount of bandwidth, even when few messages are being lost. If a significant number of messages are being lost, then bandwidth usage goes up further due to the need to resend messages that are lost. If NAKs are used, then bandwidth usage is much lower when things go well, but ramps up much more steeply as messages are lost, due to the need to begin sending many NAKs in addition to resending messages.
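The contrast between the two schemes can be illustrated with a deliberately crude cost model. Assume, hypothetically, that each data message is multicast to N receivers, that each receiver independently misses it with probability p, that every miss is repaired by one point-to-point retransmission, and that control messages themselves are never lost. Counting expected messages per original data message then gives the following; the function names are hypothetical.

```python
# Hypothetical cost model: loss independence and lossless control traffic are assumptions.


def ack_cost(n: int, p: float) -> float:
    # 1 data message + N ACKs (each receiver ACKs once it finally has the message)
    # + p*N retransmissions triggered by missing ACKs.
    return 1 + n + p * n


def nak_cost(n: int, p: float) -> float:
    # 1 data message + p*N NAKs + p*N retransmissions; nothing extra when p == 0.
    return 1 + 2 * p * n


for p in (0.0, 0.1, 0.3):
    print(p, ack_cost(20, p), nak_cost(20, p))
```

Under this model the ACK scheme pays N control messages even at p = 0, while the NAK scheme pays none. However, each unit of loss rate adds N messages to the ACK scheme and 2N to the NAK scheme, so the NAK scheme's cost grows far faster relative to its loss-free baseline.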
In both cases, the basic behavior of requiring more bandwidth when messages are being lost is unfortunate since bandwidth limitations are a prime reason why messages get lost. Particularly in NAK-based approaches, this can cause a negative spiral where the initial onset of problems causes more problems.
Fourth, and perhaps worst, pushing directly for reliability at the low level of multicast messages themselves does not strike at the heart of the problem posed above. For example, suppose that process P1 changes object A at time T1 and sends a message M describing this change. Suppose in a NAK-based approach that at some later time T2, a process P2 discovers that it has not received M. P2 then sends a NAK requesting the retransmission of M. This is all well and good, but what P2 really wants to get is not M, but what the state of A is at T2. That is to say, the reliability that is desired is not necessarily the receipt of every message, but rather getting at all times the most up-to-date information about A possible.
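The difference between message-based and object-based repair can be sketched from the owner's side. This is an illustrative sketch, not the method claimed here: `Owner`, `resend_message`, and `current_state` are hypothetical names, and the usage lines assume that A changes again between T1 and T2.

```python
# Hypothetical illustration: names and the version scheme are assumptions.


class Owner:
    """Owning process that can answer repair requests in two different ways."""

    def __init__(self):
        self.sent = {}     # (object_id, version) -> state carried by that message
        self.current = {}  # object_id -> (version, state)

    def change(self, object_id: str, state: dict) -> int:
        version = self.current.get(object_id, (0, None))[0] + 1
        self.current[object_id] = (version, state)
        self.sent[(object_id, version)] = state  # kept only for message-based repair
        return version

    def resend_message(self, object_id: str, version: int):
        """Message-based repair: retransmit M exactly as it was first sent."""
        return version, self.sent[(object_id, version)]

    def current_state(self, object_id: str):
        """Object-based repair: reply with the state of the object as of now."""
        return self.current[object_id]


p1 = Owner()
m = p1.change("A", {"x": 1})      # time T1: message M describes version 1
p1.change("A", {"x": 2})          # a later change, before P2 notices the loss
print(p1.resend_message("A", m))  # (1, {'x': 1}), already stale
print(p1.current_state("A"))      # (2, {'x': 2}), what P2 actually wants
```

Note also that answering with the current state requires no log of past messages, whereas message-based repair requires the sender to retain every message it might be asked to resend.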