Nowadays it may happen that several network output devices, e.g. mobile phones, located in the same neighborhood (e.g. the same room) play the same audio broadcast. When this occurs, very annoying artifacts, such as an echo effect, are produced.
A typical application is push-to-talk. Push-to-talk is a walkie-talkie-like service for mobile phones. Push-to-talk is half duplex and one-to-many. When a push-to-talk burst is transmitted by a mobile phone user, it may occur that several receivers are located in the same room. When this occurs, if nothing is done to prevent it, each receiver will play the sound burst separately and a very disturbing multiple echo effect will be generated. This is due to the fact that the human ear is very sensitive to delays between audio signals originating from the same source. The delays produced can range from a few milliseconds, resulting in a light echo effect, to several tens of milliseconds, which becomes rather annoying, to several hundred milliseconds, which is extremely disturbing. The delay duration depends on several parameters, e.g.:
how the receiving device implements the service: for example, the Open Mobile Alliance Push-to-talk Over Cellular (OMA POC) service specification does not specify the end-to-end delay, so that each implementer will do its best to shorten the delay, i.e. to implement the sending function and the receiving function so as to control the processing time between the capture of audio and the sending of RTP (for “real-time transport protocol”) packets on one hand, and the time between the reception of RTP packets and the reproduction of sound on the loudspeaker on the other hand;
how the network is implemented: the number of “hops”, the time of flight of packets, which is due to routing, and ultimately the distance.
Push-to-talk services always involve at least one server whose function is to relay and broadcast (i.e. the server performs the one-to-many replication of media packets). For service and billing reasons there will typically be at least one server per operator, so a push-to-talk path between people subscribed to different operators may exhibit larger delays. For example, Alice and Bob are in the same room in England, having a push-to-talk session with Charlie, who is away. When Charlie talks, his voice is routed to Alice through Germany because Alice has a subscription with a German operator, while Charlie's voice is routed to Bob through Japan because Bob has a subscription with a Japanese operator. In such an example, the delay between the two playbacks can be very large (e.g. a few seconds) and thus very disturbing.
US patent application No. 2003/0198257 A1 discloses a method of manually synchronizing the playback of a digital audio broadcast on a plurality of network output devices. The method is applicable for use with methods such as those that use a time code, insert a control track pulse, or use an audio waveform sample for synchronization. The manual adjustment method relies on a graphical user interface for adjustment and on audible pulses from the devices which are to be synchronized. Once adjusted, the digital audio broadcast played by the multiple receivers does not present to a listener any audible delay or echo effect.
Similarly, the paper entitled “Flow Synchronization Protocol” by Julio Escobar, Craig Partridge, and Debra Deutsch, IEEE/ACM Transactions on Networking, vol. 2, no. 2, April 1994, discloses an adaptive flow synchronization protocol that permits synchronized delivery of data to and from geographically distributed sites. Applications include inter-stream synchronization, synchronized delivery of information in a multi-site conference, and synchronization for concurrency control in distributed computations. In this case, the playback across the network output devices is synchronized by buffering data in the faster network output devices in order to compensate for the delay.
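The principle of compensating for delay by buffering in the faster devices can be illustrated by the following minimal sketch. It is not the protocol of the cited paper; it only assumes that the end-to-end delay of each device is already known, and the function name, device names and delay values are hypothetical.

```python
from typing import Dict


def compensation_delays_ms(path_delay_ms: Dict[str, float]) -> Dict[str, float]:
    """Return the extra buffering (in ms) each device must add so that
    all devices play the same audio at the same instant.

    path_delay_ms maps a device name to its measured end-to-end delay.
    The slowest device adds nothing; every faster device waits for it.
    """
    if not path_delay_ms:
        return {}
    slowest = max(path_delay_ms.values())
    return {device: slowest - delay for device, delay in path_delay_ms.items()}


# Hypothetical delays for two receivers in the same room (illustrative values only).
delays = {"alice": 180.0, "bob": 950.0}
buffers = compensation_delays_ms(delays)
```

In this sketch the device with the largest delay sets the common playback instant, so synchronization is obtained at the cost of raising every device to the worst-case delay.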