A mobile video collaboration system is disclosed in U.S. Pat. No. 7,221,386 (Thacher et al.), issued May 22, 2007, the disclosure of which is incorporated herein by reference. That system utilizes wired or wireless digital networks for transmitting, in real time, digital video, audio, pictures and other data between a handheld collaboration appliance endpoint and a video conferencing endpoint.
Decoding and streaming encoded video and audio on a mobile video collaboration system presents challenges in providing the end users, whether local or remote, with a natural experience when rendering video and audio. Two major resources that can greatly affect system performance are network bandwidth and the available CPU cycles within the apparatus itself.
The unreliable nature of some digital networks, cellular 3G networks for example, can result in the loss or delay of the digital content being transmitted between the collaboration appliance endpoint and the conferencing endpoint. The delay or loss of content will result in reduced available bandwidth for streaming video and audio and thus result in degraded rendering at the receiving end.
Typical rate control algorithms in packet switched networks use a closed loop reporting system that allows the sender and receiver to exchange information about the current transfer characteristics, such as packet loss, packet sequencing and packet jitter. This information is used to determine which packets to discard when network bandwidth is limited. It is also used to determine whether the current network conditions are improving or degrading, so that the traffic flow from the transmitter to the receiver can be increased or decreased accordingly. The problem with the current algorithms is that they are agnostic to the type of data being transferred.
A compressed video frame can be contained within a single packet, or segmented into multiple packets if its size (in bytes) is greater than the maximum transmission unit (MTU) of the network. The decision to drop a packet within a compressed video frame is critical, because dropping some types of frames will cause significant degradation when rendering. With compressed video, different types of frames make up a group of pictures. A group of pictures begins with an intra-frame or I-frame, and is followed by zero or more inter-frames (e.g. predicted frames or P-frames). An I-frame is a key frame that contains compressed video data that does not require a reference to any other frame and therefore can be decoded in and of itself. Inter-frames, however, are difference frames that likely reference previous frames and cannot be properly decoded in and of themselves. If a frame is dropped, either partially or in its entirety, the inter-frames that follow will likely not have a proper frame reference, resulting in degraded video in the form of video artifacts or macroblocking when rendered. Because of this, dropping a packet within a frame will have a significant impact on how the entire group of pictures is rendered.
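The effect described above can be illustrated with a minimal sketch. It is not taken from the referenced patent. It assumes a simplified model in which every P-frame references only the frame immediately before it, and in which a frame with any lost packet is unusable. The function names and the 1400-byte payload size are hypothetical and chosen only for illustration.

```python
import math


def packets_for_frame(frame_bytes, mtu_payload=1400):
    """Number of packets needed to carry one compressed frame.

    mtu_payload is an assumed usable payload size per packet, not a
    value taken from the source text.
    """
    return max(1, math.ceil(frame_bytes / mtu_payload))


def decodable_frames(gop, lost):
    """Report which frames can be decoded cleanly.

    gop  -- list of frame types, "I" or "P", in transmission order
    lost -- set of frame indices that lost at least one packet

    Simplifying assumption: each P-frame references only the frame
    immediately preceding it.
    """
    result = []
    reference_ok = False  # no usable reference before the first I-frame
    for index, kind in enumerate(gop):
        received = index not in lost
        if kind == "I":
            good = received  # an I-frame needs no reference
        else:
            good = received and reference_ok  # a P-frame needs a good reference
        result.append(good)
        reference_ok = good
    return result


if __name__ == "__main__":
    gop = ["I", "P", "P", "P", "I", "P"]
    print(packets_for_frame(3000))          # a 3000-byte frame spans 3 packets
    print(decodable_frames(gop, lost={1}))  # loss in frame 1 damages frames 1-3
```

Under these assumptions, a loss in the first P-frame leaves every following P-frame without a valid reference until the next I-frame arrives, whereas a loss in the last P-frame of the group would affect only that one frame.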
Furthermore, video packet flow control over a network must take into account the fact that large bursts of packets may degrade the performance of other media streams in the system. Real time video compression codecs operate on individual frames as they become available from the video source. The bit stream generated for a frame is typically transmitted over the network as fast as possible, which results in a burst of network traffic for each frame. These video bit stream bursts occur at a rate that corresponds to the target frame rate of the video. Generally, even highly compressed video is considered a large consumer of bandwidth on a network.
Some of the negative implications of these network bursts are:
Larger buffers are required at various points in the network stack in order to absorb the bursts.
Some types of network links, like cellular 3G, do not handle bursts of traffic as efficiently as they would more constant traffic.
Data packets for other isochronous streams such as voice channels can get bunched up behind bursts of video packets resulting in increased jitter and possibly jitter buffer underflow conditions.
Processor utilization will tend to spike during the packet bursts as the client application and network stack process the bit stream data as fast as possible.
These bursts can starve other threads which will tend to stress software synchronization mechanisms and buffering schemes.
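The scale of these bursts can be estimated with simple arithmetic. The sketch below is illustrative only: the function names, the frame size, the frame rate and the link rate are all assumed values, not figures from the source, and the model ignores packet headers and any queuing elsewhere in the network.

```python
def average_bitrate(frame_bytes, fps):
    """Average video bit rate in bits per second for fixed-size frames."""
    return frame_bytes * 8 * fps


def burst_drain_time(frame_bytes, link_bps):
    """Seconds a link needs to drain one frame sent as a single burst."""
    return frame_bytes * 8 / link_bps


if __name__ == "__main__":
    frame_bytes = 3000   # assumed compressed frame size
    fps = 10             # assumed target frame rate
    link_bps = 384_000   # assumed uplink rate

    print(average_bitrate(frame_bytes, fps))        # 240000 bits per second
    print(burst_drain_time(frame_bytes, link_bps))  # 0.0625 seconds
```

With these assumed numbers the average rate (240 kbit/s) fits within the link, yet each frame still occupies the link for 62.5 ms of every 100 ms frame interval. A voice packet queued just behind the start of such a burst could wait nearly that long, which is the kind of added jitter noted in the list above.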