With the availability of high-performance personal computers and the popularity of broadband Internet connections, the demand for Internet-based video applications such as video conferencing, video messaging, and video-on-demand is rapidly increasing. To reduce transmission and storage costs, compression/decompression (“codec”) systems with improved compression efficiency are needed. Image, video, and audio signals are amenable to compression because they contain considerable statistical redundancy. Within a single image or video frame, neighboring samples are strongly correlated, which is generally termed “spatial correlation”. In moving images such as full-motion video, there is also significant correlation among samples in different time segments, such as successive frames; this is generally referred to as “temporal correlation”. There is a need for an improved, cost-effective system and method that exploits both spatial and temporal correlation to remove redundancy in the video, achieving high compression for transmission while maintaining good to excellent image quality, and that adapts to changes in the available bandwidth of the transmission channel and to the limited receiving resources of the clients.
A known technique for exploiting the limited variation between frames of a motion video is motion-compensated image coding. In such coding, the current frame is predicted from the previously encoded frame using motion estimation and compensation, and only the difference between the actual current frame and the predicted current frame is coded. Coding only this difference, or residual, rather than the image frame itself can improve image quality at a given bit rate, because the residual tends to have lower amplitude than the image and can therefore be coded with greater accuracy. Motion estimation and compensation are discussed in Lim, J. S., Two-Dimensional Signal and Image Processing, Prentice Hall, pp. 497-507 (1990). However, motion estimation and compensation techniques have a high computational cost, which makes software-only implementations impractical on most personal computers.
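The residual idea can be illustrated with a minimal Python sketch. It assumes grayscale frames stored as lists of rows of integers, frame sizes that are multiples of a hypothetical block size `BLOCK`, exhaustive (full-search) block matching within a hypothetical range of ±`SEARCH` pixels, and the sum of absolute differences (SAD) as the matching criterion. None of these choices or names come from the text above; they are one common way to realize motion estimation and compensation, shown only for illustration.

```python
"""Minimal sketch of motion-compensated residual coding using full-search
block matching. Assumptions (illustrative, not from the source): grayscale
frames stored as lists of rows of ints, frame sizes that are multiples of
BLOCK, an exhaustive search within +/-SEARCH pixels, and the sum of absolute
differences (SAD) as the matching criterion."""

BLOCK = 4   # hypothetical block size (pixels)
SEARCH = 2  # hypothetical search range (pixels)


def sad(cur, ref, bx, by, dx, dy, n):
    """SAD between the n x n block of `cur` at (bx, by) and the block of
    `ref` displaced by (dx, dy)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def estimate_motion(cur, ref, n=BLOCK, search=SEARCH):
    """Return {(bx, by): (dx, dy)}: for each block of `cur`, the displacement
    into `ref` with minimal SAD (the first one found wins on ties)."""
    h, w = len(cur), len(cur[0])
    vectors = {}
    for by in range(0, h - n + 1, n):
        for bx in range(0, w - n + 1, n):
            best = None
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    # Skip candidate blocks that fall outside the reference frame.
                    if not (0 <= by + dy <= h - n and 0 <= bx + dx <= w - n):
                        continue
                    cost = sad(cur, ref, bx, by, dx, dy, n)
                    if best is None or cost < best[0]:
                        best = (cost, dx, dy)
            # (0, 0) is always a valid candidate, so `best` is never None here.
            vectors[(bx, by)] = (best[1], best[2])
    return vectors


def compensate(ref, vectors, n=BLOCK):
    """Build the predicted frame by copying each displaced reference block."""
    pred = [[0] * len(ref[0]) for _ in ref]
    for (bx, by), (dx, dy) in vectors.items():
        for y in range(n):
            for x in range(n):
                pred[by + y][bx + x] = ref[by + dy + y][bx + dx + x]
    return pred


def residual(cur, pred):
    """Per-pixel difference; in a codec only this (plus the motion vectors)
    would be transformed, quantized, and coded."""
    return [[c - p for c, p in zip(rc, rp)] for rc, rp in zip(cur, pred)]


if __name__ == "__main__":
    # A 2x2 bright square on a flat background moves one pixel to the right.
    ref = [[10] * 12 for _ in range(12)]
    cur = [[10] * 12 for _ in range(12)]
    for y in (5, 6):
        for x in (5, 6):
            ref[y][x] = 200
        for x in (6, 7):
            cur[y][x] = 200
    mv = estimate_motion(cur, ref)
    res = residual(cur, compensate(ref, mv))
    raw = residual(cur, ref)  # plain frame difference, no motion compensation
    print(mv[(4, 4)])                               # (-1, 0)
    print(sum(abs(v) for row in res for v in row))  # 0
    print(sum(abs(v) for row in raw for v in row))  # 760
```

In this toy case the motion-compensated residual is zero everywhere, while the plain frame difference is not, which is the amplitude reduction the paragraph describes. The four nested loops in `estimate_motion` also make the cost visible: every block is compared at up to (2·SEARCH+1)² positions, each costing BLOCK² absolute differences.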
Further difficulties arise in providing a codec for Internet streaming, because the bandwidth of the transmission channel may change during transmission, and clients with varying receiver resources may also join or leave the network while transmission is under way. Internet streaming applications require video encoding technologies with features such as low delay, low complexity, scalable representation, and error resilience for effective video communication. Current standards and state-of-the-art video coding technologies are proving insufficient to provide these features. Some of the established standards (MPEG-1, MPEG-2) target non-interactive streaming applications. Although the H.323 Recommendation targets interactive audiovisual conferencing over unreliable packet networks (such as the Internet), the H.26x video codecs it applies do not support all the features demanded by Internet-based applications. Although newer standards such as H.263+ and MPEG-4 have begun to address some of these issues (scalability, error resilience, etc.), they are still far from complete enough to support a wide range of video applications effectively.
Because networking and computing infrastructure is highly heterogeneous, highly scalable video coding algorithms are required. A video codec should provide reasonable quality to low-performance personal computers connected via a dial-up modem or a wireless link, and high quality to high-performance computers connected over a T1 line. The compression algorithm is therefore expected to scale well in terms of both computational cost and bandwidth requirements.
The Real-time Transport Protocol (RTP) is the protocol most commonly used to carry time-sensitive multimedia traffic over the Internet. Because RTP is typically carried over the unreliable User Datagram Protocol (UDP), the coding algorithm must be able to handle packet losses effectively. Furthermore, because of the low-delay requirements of interactive applications and the requirements of multicast transmission, the retransmission method widely deployed on the Internet cannot be used. The video codec should therefore provide a high degree of resilience against network and transmission errors in order to minimize the impact on visual quality.
The computational complexity of the encoding and decoding processes must be low in order to provide a reasonable frame rate and quality on low-performance devices (PDAs, hand-held computers, etc.) and a high frame rate and quality on average personal computers. As mentioned, the widely used motion estimation and motion compensation techniques have a high computational cost, which makes software-only implementations impractical on most personal computers.
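A rough operation count suggests why exhaustive motion estimation is costly. The parameters here are assumptions chosen only for illustration, not values from the text: CIF frames of 352×288 pixels at 30 frames/s, N×N blocks with N = 16, and a full search over ±p pixels with p = 16, counting one absolute difference per pixel per candidate position and ignoring the fewer candidates available near frame borders (so this is an upper bound):

```latex
\begin{aligned}
\text{blocks per frame} &= \frac{352}{16}\cdot\frac{288}{16} = 22 \cdot 18 = 396,\\
\text{operations per block} &= (2p+1)^2 N^2 = 33^2 \cdot 16^2 = 1089 \cdot 256 = 278{,}784,\\
\text{operations per frame} &= 396 \cdot 278{,}784 = 110{,}398{,}464 \approx 1.1\times10^{8},\\
\text{operations per second} &= 30 \cdot 110{,}398{,}464 \approx 3.3\times10^{9}.
\end{aligned}
```

Under these assumptions, motion estimation alone requires on the order of billions of absolute-difference operations per second, before any transform or entropy coding is counted.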