It is becoming increasingly common to establish video conferencing sessions over IP networks rather than circuit-switched networks, such as ISDN. Such networks can, for example, be LANs, WANs, or virtual networks established over the Internet. In a typical session, a TCP/IP virtual connection is established between a pair of video endpoints, which can then communicate with each other to provide a telecollaboration session. The endpoints stream video and audio data to each other over other virtual connections (e.g. using RTP).
Video data is streamed over a network in compressed form and comprises two kinds of frames: P-frames and I-frames. P-frames are smaller in size than I-frames because the P-frames only contain information about the changes relative to a previous frame. For example, if an object moves over a static background, the P-frames only carry information pertaining to the movement of the object. On the other hand, when there is a change of scene, it is necessary to transmit the entire frame, and this is achieved with an I-frame. Because small data errors in P-frames can result in disproportionate degradation of received video, I-frames are also transmitted periodically to limit perpetuation of these data errors. Although the I-frame may be compressed internally, it is still much larger than a P-frame.
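The frame pattern described above can be illustrated with a minimal Python sketch. The function name `frame_types`, the `i_interval` parameter, and the example values are hypothetical and chosen only for illustration; real encoders decide frame types using codec-specific logic.

```python
def frame_types(n_frames, i_interval, scene_changes=()):
    """Return a string of 'I'/'P' markers, one per frame.

    An I-frame is emitted for the first frame, at every scene change,
    and whenever i_interval frames have passed since the last I-frame
    (the periodic refresh that limits error propagation).
    All parameters are illustrative assumptions.
    """
    cuts = set(scene_changes)
    last_i = 0
    types = []
    for i in range(n_frames):
        if i == 0 or i in cuts or i - last_i >= i_interval:
            types.append("I")
            last_i = i
        else:
            types.append("P")
    return "".join(types)


# Ten frames, a refresh every 4 frames, and a scene change at frame 6.
pattern = frame_types(10, 4, scene_changes=[6])
print(pattern)
```

In this sketch a scene change restarts the refresh countdown, so the next periodic I-frame is measured from the most recent I-frame of either kind.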
When multiple video sources are streamed onto an IP network, I-frames occurring simultaneously create bandwidth or traffic peaks. As a result of the network's internal congestion controls, which discard packets when congestion exceeds a certain threshold, the important I-frames may be discarded en route. This problem can occur when multiple video conference calls are in progress, and particularly in multi-party conferences when the same video source is connected to two or more remote endpoints.
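The effect of coinciding I-frames on instantaneous load can be shown with a small Python calculation. The frame sizes, the I-frame period, and the function name `aggregate_load` are assumptions made for illustration only and are not taken from any particular codec or network.

```python
def aggregate_load(phases, period=30, i_size=50_000, p_size=5_000, n_ticks=60):
    """Sum the bytes sent by all streams in each frame interval (tick).

    phases[k] is the tick at which stream k sends its first I-frame;
    each stream then repeats an I-frame every `period` ticks and sends
    P-frames otherwise. All sizes are illustrative assumptions.
    """
    load = []
    for t in range(n_ticks):
        total = 0
        for phase in phases:
            is_i_frame = (t - phase) % period == 0
            total += i_size if is_i_frame else p_size
        load.append(total)
    return load


# Four streams whose I-frames coincide, versus four whose I-frames do not.
coinciding = aggregate_load([0, 0, 0, 0])
non_coinciding = aggregate_load([0, 7, 15, 22])

print(max(coinciding), max(non_coinciding))
print(sum(coinciding) == sum(non_coinciding))
```

Under these assumed sizes, both cases carry the same total volume of data, but the peak load is roughly three times higher when the I-frames coincide. This is the kind of peak that can push a network past its congestion threshold.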
Existing stream buffers attempt to overcome this problem by indiscriminately delaying arbitrary packets. This technique can result in undesirable latency in a video conference. Another solution is available at the endpoints if the users accept lower-quality video, e.g. a lower resolution and/or a lower frame rate, in exchange for more consistent, reliable performance.