Multipoint online video conferences have increased in popularity as an efficient way to conduct meetings over local area networks (LANs) or wide area networks (WANs), such as the Internet. These conferences include the exchange of audio and video, and sometimes the sharing of drawings, documents, or other application data, among multiple “attendees.”
In order to provide a satisfying conference experience to the users, the conference video images must be viewed by attendees as close as possible to real time. However, streaming multimedia over the Internet is not truly “real time” because such packet-switched technology has inherent data flow inconsistencies. Network traffic variations cause packets to arrive at inconsistent intervals, and buffering is needed to smooth out a multimedia stream.
Except for audio and video, Internet transmissions are commonly conducted under “lossless,” verified-delivery protocols, such as Transmission Control Protocol/Internet Protocol (TCP/IP). Such a protocol ensures that each packet sent is actually received and sequentially reassembled by the intended destination. Unfortunately, although a verified-delivery protocol may enhance reliability of some types of Internet data, the nature of the resulting data flow is problematic for the timely delivery of video conference data.
As is generally known, the sender under TCP/IP must receive a verification message that a sent batch of packets was actually received by the intended recipient within a predetermined time; otherwise, lost packets must be retransmitted. TCP/IP further reassembles packets at the destination in the order originally sent, and accordingly, this reassembly is delayed until missing packets have been resent. The characteristic delay in a TCP/IP communication caused by the non-arrival and subsequent retransmission of packets is commonly referred to as a “hiccup.” These TCP/IP hiccups result in unacceptable time lags for interactive video conferences. When congestion clears after each hiccup, current system buffers release a long burst of packets containing video frames already several seconds old. The video seen by the conference attendee falls behind, and remains behind, for the remainder of the conference.
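The effect of in-order reassembly on playback can be illustrated with a minimal sketch. It is a simplified model, not a description of any actual TCP/IP implementation: one frame per packet, times in integer milliseconds, and a player that shows every frame and never plays faster than the capture spacing. The function names and the sample timings are assumptions chosen for illustration.

```python
from itertools import accumulate


def in_order_release_times(arrival_times):
    """Time at which each packet can be handed to the application.

    Under in-order delivery a packet is held until every earlier packet
    has arrived, so the release time is the running maximum of arrivals.
    """
    return list(accumulate(arrival_times, max))


def playback_lag(capture_times, release_times):
    """Lag (display time minus capture time) for each frame.

    Assumes a player that displays every frame and keeps at least the
    original capture spacing between consecutive frames.
    """
    lags = []
    prev_display = None
    prev_capture = None
    for capture, release in zip(capture_times, release_times):
        if prev_display is None:
            display = release
        else:
            display = max(release, prev_display + (capture - prev_capture))
        lags.append(display - capture)
        prev_display, prev_capture = display, capture
    return lags


# Hypothetical timings: frames captured every 100 ms, 200 ms one-way delay,
# and the third packet lost and retransmitted so that it arrives 1500 ms
# after capture.
capture = [0, 100, 200, 300, 400]
arrival = [200, 300, 1700, 500, 600]

release = in_order_release_times(arrival)
lags = playback_lag(capture, release)
```

In this model the packets that arrived on time behind the lost packet are all released together in one burst, and every frame from the hiccup onward is displayed with the same enlarged lag, which corresponds to the video remaining behind after the hiccup.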
In order to keep up with the conference, it is desirable to display frames in “real time,” wherein a delay between displayed frames is the same as the delay between when the frames were captured.
Studies have determined that, on average, a person can perceive a delay of about 150 ms or more. Unfortunately, delays exceeding that length are frequently unavoidable over the Internet. For example, an Internet transmission of a data packet between New York and Los Angeles typically takes about 200 ms in each direction. Even though such a lag is perceptible, a satisfactory video conference experience would still be possible if delays were limited to these short transmission delays. However, a conference experience becomes significantly impaired when a verified-delivery protocol mandates verification activities that extend the delay. In a TCP/IP hiccup situation, for example, time is consumed by the initial transmission, the verification period, and the retransmission, as well as video processing time at the sending and receiving computers. It has been found that a hiccup in a coast-to-coast TCP/IP transmission results in an average total lag time of about 1.5 seconds between the time that a video frame is created and the time it is ultimately received. This causes a 1.5 second delay in the video displayed by the receiver for the duration of the conference. Furthermore, it has been found that, on average, about two percent of packets must be resent, and numerous hiccups over the course of a conference result in a significant cumulative delay in the video stream. Under such conditions, an attendee would view conference video that falls behind several seconds each minute. Such woefully late video would be of little use to a conference attendee, whose ability to meaningfully participate may be diminished as a result.
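The lag components named above can be added up in a short sketch. It assumes the retransmission takes the same one-way time as the initial transmission and lumps the sending and receiving processing times into one term; the function and parameter names are hypothetical. Only the 200 ms and 1.5 second figures come from the discussion above.

```python
def hiccup_lag_ms(one_way_ms, verify_wait_ms, processing_ms):
    """Total lag for a frame whose packet is lost once and retransmitted.

    Sum of: initial transmission, verification period, retransmission
    (assumed equal to the initial one-way time), and video processing.
    """
    return one_way_ms + verify_wait_ms + one_way_ms + processing_ms


def implied_overhead_ms(total_lag_ms, one_way_ms):
    """Verification plus processing time implied by a known total lag."""
    return total_lag_ms - 2 * one_way_ms


# With a 200 ms one-way transmission and a 1.5 second total lag, the
# verification period and processing together account for the remainder.
overhead = implied_overhead_ms(1500, 200)
```

Under these assumptions, roughly 1.1 seconds of the 1.5 second lag is attributable to the verification period and processing rather than to transmission time itself.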
Some conference transmissions have been carried out over the Internet using non-standard protocols which do not verify packet delivery or retransmit lost packets. User Datagram Protocol (UDP) is a generally known example of such a protocol. Unfortunately, firewalls are typically set up to block communications under such non-standard protocols, undesirably limiting the attendees who can access the conference. Most firewalls do, however, permit TCP/IP communications to pass. Therefore, a need exists for a conference system which minimizes video transmission delays over the Internet, yet which can be implemented with a verified-delivery protocol such as TCP/IP in order to maximize access to attendees whose network connection passes through a firewall.
A video data stream includes sequential image frames which are packetized for sending over a network. Each of these packets contains data associated with a video frame image. Most compression/decompression (codec) algorithms encode a video stream so that only some of the frames are sent in their entirety. For example, Advanced Streaming Format (ASF) and Windows Media Video (WMV) send periodic key frames, and a series of delta frames are sent between key frames at a higher frequency. Each of the key frames contains all data necessary to construct an entire frame image, but each delta frame is encoded to contain data representing only changes in the frame image relative to the immediately preceding frame. The key frames and delta frames are then packetized for transmission over the network. Notably, if a frame is somehow dropped or lost, a subsequent delta frame would not correspond with the last-displayed delta frame or key frame, causing the subsequently displayed video image to distort or “bubble.” This type of distortion would cumulatively worsen with each additional delta frame until the next key frame is displayed.
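The relationship between key frames and delta frames can be illustrated with a toy decoder. This is a sketch under simplifying assumptions and does not reflect the actual ASF or WMV formats: an image is a short list of pixel values, a key frame carries the full list, and a delta frame carries only the positions that changed and their new values. The `decode` function and the tuple layout are hypothetical.

```python
def decode(frames):
    """Reconstruct displayed images from a sequence of frames.

    Each frame is ("key", full_pixel_list) or ("delta", {index: value}).
    A delta frame is applied on top of whatever image was last displayed.
    """
    image = None
    displayed = []
    for kind, payload in frames:
        if kind == "key":
            image = list(payload)
        else:
            if image is None:
                # No key frame has been received yet; nothing to update.
                continue
            for index, value in payload.items():
                image[index] = value
        displayed.append(list(image))
    return displayed


stream = [
    ("key", [0, 0, 0]),
    ("delta", {0: 1}),
    ("delta", {1: 2}),
    ("key", [1, 2, 0]),
    ("delta", {2: 3}),
]

intact = decode(stream)

# Drop the first delta frame: the next delta is applied to the wrong base
# image, and the error persists until the following key frame.
lossy = decode(stream[:1] + stream[2:])
```

With the first delta frame dropped, the second displayed image in this example is `[0, 2, 0]` rather than the intended `[1, 2, 0]`, and the picture is corrected only when the next key frame arrives. Real codecs encode differences rather than absolute values, so the error there can compound with each further delta frame.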
Various solutions have been contemplated to permit a conference attendee to catch up from delayed, buffered-up video. For example, it is possible to play delayed segments of old video at a faster-than-normal rate, but the resulting fast video spurts are undesirably distracting. Accordingly, a need exists for a process which helps a conference attendee to catch up from network delays to the extent possible, while optimizing image quality received by each attendee.