This invention is generally related to multimedia communications over a packet-switched network and, more particularly, to techniques for establishing optimal audio latency in streaming applications such as conferencing.
In many network-based multimedia applications, data is sent in packets from a first point to a second point. The packets are decoded when received at the second point and played back on playback hardware. For example, in streaming audio or video multicasting, a transmit process sends the packets of data that make up the desired audio or video over a network to a number of destinations, where the packets are received by a receive process, decoded, and played back. In another example, in a two-party or multi-party video or audio conference, the parties exchange data packets over a network, and the packets are decoded and played back at the receiving parties. Such streaming applications are especially popular over the Internet and corporate intranets.
A difficulty with such applications, however, is that the transmission and receipt of data packets through the network may be hampered. For example, packets may be held up or lost anywhere in the network. Also, the bandwidth at which a given party can send or receive packets may be limited. The practical effect of these problems is that the parties may find their audio and/or video streams broken up; that is, the resulting multimedia stream may play back choppily, degrading the perceived quality of the playback.
A limited solution to this problem is to introduce a predetermined, fixed latency into the data stream, via a buffer or other mechanism. In this way, if packets held up in the network are expected to be delayed by an average of, for instance, one second, playback at the receiving end is not affected when a one-second buffer is used. This solution has some success in one-way multicast situations, where a source transmits data for playback at multiple destinations, because the destinations only receive data packets and are not expected to send responsive data packets to the source or to the other destinations. The initial one-second delay required to fill the buffer with packets can therefore be tolerated at the beginning of the transmission.
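The fixed-latency approach described above can be illustrated with a minimal sketch. It assumes, hypothetically, that each packet carries a sender-side timestamp (`send_ms`) and that playback of the first packet begins a fixed delay after it arrives, with every later packet keeping the sender's original spacing. A packet is late, and therefore causes a gap in playback, only if it arrives after its scheduled playout time. The function name and parameters are illustrative, not taken from the source.

```python
from typing import List


def late_packets(send_ms: List[float], arrival_ms: List[float],
                 fixed_delay_ms: float) -> List[int]:
    """Return indices of packets that miss a fixed playout deadline.

    Hypothetical sketch: packet i is scheduled to play at
    arrival_ms[0] + fixed_delay_ms + (send_ms[i] - send_ms[0]),
    so no clock synchronization between sender and receiver is needed.
    """
    # Offset that maps a sender timestamp to a local playout time.
    base = arrival_ms[0] + fixed_delay_ms - send_ms[0]
    # A packet is late if it arrives after its scheduled playout time.
    return [i for i, (s, a) in enumerate(zip(send_ms, arrival_ms))
            if a > s + base]
```

For example, with packets sent 20 ms apart and arriving at 100, 900, 1200, and 160 ms, a 1000 ms fixed delay leaves only the third packet late, while a zero delay leaves the second and third late. Each extra millisecond of buffer absorbs more network delay but is added to every packet's playback time, which is the cost discussed in the next paragraph.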
However, in other situations, introducing a fixed latency is less than optimal. For example, in audio or video conferences over a network, where communication among the parties is two-way, a fixed latency that is too large may degrade the quality of the conference. The parties may find, for example, that their ability to respond in a timely manner to others in the conferencing session is hampered. Because the parties normally assume that the conference occurs in real time, like an in-person conversation, when in reality it is being buffered, it may be difficult for them to interrupt one another as they would in a normal, in-person conversation. For instance, with a one-second buffer at each end of a conference, a party will experience a round-trip delay of at least two seconds before receiving a response from another party. Such a lengthy delay can easily disrupt a normal conversation between two people.
An embodiment of the invention is directed to a method of determining whether an elapsed time between the arrivals of first and second packets, which are parts of an audio stream sent through a network by a transmit process and received by a receive process, is primarily a network delay, caused by the second packet being slowed while traveling through the network, or primarily a transmit delay, caused by the transmit process delaying the sending of the second packet. The method includes the elapsed time in interpacket delay statistics only if the elapsed time is determined to be due to network delay. The size of a packet queue is adjusted based on the interpacket delay statistics.
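The summary above does not specify how the two kinds of delay are told apart or how the queue size is derived from the statistics, so the following is only a minimal sketch under stated assumptions. It assumes, hypothetically, that each packet carries a sender timestamp (`send_ms`), that audio is packetized in nominal 20 ms frames, and that an arrival gap counts as network delay when the sender spaced the two packets within the nominal interval plus a tolerance. It keeps a sliding window of accepted interpacket delays and sizes the queue to cover the mean plus `k` standard deviations. All class, field, and parameter names are illustrative, not the patented method.

```python
import math
from collections import deque
from dataclasses import dataclass


@dataclass
class Packet:
    send_ms: float     # hypothetical: sender timestamp carried in the packet
    arrival_ms: float  # local receive time recorded by the receive process


class LatencyController:
    """Hypothetical sketch: keep interpacket delay statistics from
    network-caused gaps only, and size the packet queue from them."""

    def __init__(self, frame_ms=20.0, tolerance_ms=5.0, window=50, k=2.0):
        self.frame_ms = frame_ms          # assumed nominal audio per packet
        self.tolerance_ms = tolerance_ms  # assumed slack on sender pacing
        self.k = k                        # assumed std-devs of headroom
        self.delays = deque(maxlen=window)  # interpacket delay statistics
        self.prev = None

    def is_network_delay(self, prev, cur):
        # Assumed rule: if the sender spaced the packets at roughly the
        # nominal frame interval, the arrival gap arose in the network;
        # a larger sender-side gap means the transmit process delayed it.
        send_gap = cur.send_ms - prev.send_ms
        return send_gap <= self.frame_ms + self.tolerance_ms

    def on_packet(self, pkt):
        if self.prev is not None:
            elapsed = pkt.arrival_ms - self.prev.arrival_ms
            if self.is_network_delay(self.prev, pkt):
                self.delays.append(elapsed)
            # Otherwise it is a transmit delay and is left out of the stats.
        self.prev = pkt
        return self.queue_size()

    def queue_size(self):
        """Queue length in packets covering mean + k * stddev of delays."""
        if not self.delays:
            return 1
        mean = sum(self.delays) / len(self.delays)
        var = sum((d - mean) ** 2 for d in self.delays) / len(self.delays)
        target_ms = mean + self.k * math.sqrt(var)
        return max(1, math.ceil(target_ms / self.frame_ms))
```

Fed packets sent at 0, 20, 40, 540, and 560 ms and each arriving 5 ms later, the controller excludes the 500 ms sender stall and keeps a one-packet queue; a following packet sent at 580 ms but arriving at 645 ms adds an 80 ms network gap and raises the queue to 5 packets. Excluding transmit delays in this way keeps pauses on the sending side from inflating the queue, and therefore the audio latency, when the network itself is behaving well.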