Transmitting digital video over the current Internet is difficult. There is a big gap between Internet bandwidth and video bit rate. The current Internet is a best effort, unreliable network with no Quality of Service (QoS) guarantees. These difficulties require that an effective Internet video coding and transmission scheme be low bit rate and robust. The conflicting requirements of low bit rate and robustness require a delicate balance between them. Traditional coding methods are optimized for compression ratio and rely on transmission schemes to provide robustness. The current Internet environment cannot provide the desired robustness without sacrificing low delay and other real time requirements. Traditional schemes dealing with packet loss and error recovery are not suitable for the Internet because they are designed for specific environments under specific assumptions.
The Internet and its most important application, the World Wide Web (WWW), have experienced exponential growth and gained widespread recognition during the past few years. The Internet and the WWW show the promise of becoming a global platform for computing, communication and collaboration. One reason for the phenomenal success of the Internet and the WWW is the successful integration of textual and graphical data and transmission of these static data types. The value of real time media, like real time video and audio, on the Internet and WWW has been widely recognized. See, for example, C. Adie, “A survey of distributed multimedia research, standards and products”, ftp://ftp.ed.ac.uk/pub/mmsurvey/, January 1993 (Adi93); C. Adie, “Network access to multimedia information”, ftp://ftp.ed.ac.uk/pub/mmsurvey/, February 1994 (Adi94); T. J. Berners-Lee, R. Cailliau, J. F. Groff, and B. Pollerman, “World Wide Web: The Information Universe”, Electronic Networking: Research, Applications and Policy, 2(1):52-58, 1992 (BLCGP92); F. Kappe and N. Sherbakov, “Hyper-G: A Universal Hypermedia System”, ftp://iicm.tu-graz.ac.at/pub/Hyper-G/doc/report333.txt.Z, March 1992 (KS92); Z. Chen, S. Tan, R. Campbell, and Y. Li, “Real time video and audio in the World Wide Web”, In Proc. Fourth International World Wide Web Conference, 1995 (CTCL95); Vosaic LLC white paper, http://choices.cs.uiuc.edu/Papers/New/www5/www5.html, February 1996 (wp96a); VXtreme Inc white paper, “Enabling Interactive Video Over the Internet”, http://www.vxtreme.com/developers/wp960304.html, March 1996 (wp96b); and Progressive Networks Inc. RealVideo Technical White Paper, http://www.realaudio.com/products/realvideo/overview/index.html, 1997 (Inc97).
Supporting dynamic real time media, such as real time video and audio, on the Internet enables new applications like real time visual communication, entertainment and distance learning and training, while enhancing the capability of existing ones. Internet video delivery has shown great commercial potential and, therefore, has encouraged a substantial number of commercial developments, e.g., as described in Xing Technology Corporation, “StreamWorks”, http://www.xingtech.com/, 1996 (Cor96b); VDOnet Corporation, “VDOLive Internet Video Servers and Players”, http://www.vdolive.com/, 1996 (Cor96a); InterVU Inc., “Inervu Video Delivery Products”, http://www.intervu.com/, 1996 (Inc96a); Vivo Inc., “VivoActive Video Delivery Products”, http://www.vivo.com/, 1996 (Inc96c); and VXtreme Inc., “VXtreme Video Delivery Products”, http://www.vxtreme.com/, 1996 (Inc96d). However, because of the shortened Internet software development cycles, these commercial developments tend to rush to product development, often skipping or shortening the research phase. Unlike browser and push products, which require less planning, research-intensive video products are not suited to the shortened Internet software development cycle. Interesting research issues arise from a number of aspects of Internet video coding and transmission. Solutions to these research problems cannot be found in the traditional video compression and network transmission literature, where the problems are often addressed in different environments under very different assumptions. The present invention is directed to the problem of how to effectively encode and transmit video over the Internet.
Supporting digital video on the Internet and WWW is very difficult. Unlike textual and image data, networked digital video requires efficient compression, large storage space, and sufficient bandwidth. Some of these requirements cannot be met in the current Internet environment. As a result, Internet video applications have suffered from poor transmission and playback quality. Of the many difficulties facing these applications, the two most significant are: (1) the gap between bit rate and bandwidth, and (2) the unreliable nature of the Internet.
Bit Rate and Bandwidth Gap
A large gap exists between the compressed video bit rate and Internet bandwidth. Even with sophisticated video compression, the bit rate of digital video is often too high for most Internet connections. For example, a compressed full frame rate (30 f/s) broadcast quality (720×480) video runs at a bit rate of 3-8 Mbps using MPEG compression. See V. Bhaskaran and K. Konstantinides, Image and Video Compression Standards: Algorithms and Architectures, Kluwer Academic Publications, 1995 (BK95). A good Internet connection with a shared T1 line has a maximum bandwidth of 1.5 Mbps. Even with compromised video frame rate, quality and frame size, the bit rate is often too high for average and low bandwidth connections. For example, a 10 f/s, 320×240 video typically has a bit rate of 100 Kbps to 400 Kbps with MPEG compression. Currently a home user with dial-up or ISDN service can get a typical bit rate in the range from 14 Kbps to 128 Kbps.
Unreliable Nature of the Internet
The Internet is inherently a packet switched, best effort, unreliable network. Research is being conducted toward a network with guaranteed quality of service (QoS). See D. D. Clark, S. Shenker, and L. Zhang, “Supporting real-time application in an integrated services packet network: Architecture and mechanism”, In Proc. of SIGCOMM ’92, 1992 (CSZ92); H. T. Kung, T. Blackwell, and A. Chapman, “Credit-based flow control for ATM networks: Credit update protocol, adaptive credit allocation, and statistical multiplexing”, In Proc. SIGCOMM ’94, 1994 (KBC94); C. Partridge, Gigabit Networking, Addison-Wesley, 1993 (Par93); and L. Zhang, S. Deering, D. Estrin, and D. Zappala, “RSVP: A New Resource ReSerVation Protocol”, IEEE Network, September 1993 (ZDEZ93). However, there is no QoS guarantee on the current Internet. Packets on the Internet can get delayed, duplicated, or lost during the delivery process. Existing flow control and error handling schemes like those implemented in TCP (see, e.g., D. Comer and D. Stevens, Internetworking with TCP/IP, Volume 1: Principles, Protocols, and Architecture, Prentice Hall, Englewood Cliffs, N.J., 1991 (CS91); and V. Jacobson, “Congestion Avoidance and Control”, In Proc. ACM SIGCOMM ’88, pages 314-329, Stanford, Calif., August 1988 (Jac88)) ensure 100% reliability; however, they do not consider timely delivery. Therefore, they are only suitable for reliable non-real-time text and image transmission. As discussed below, these flow control and error handling schemes cause an unnecessarily large delivery delay and are not suitable for video and other real time media. Internet video transmission has to deal with delay, jitter and packet losses. While delay and jitter can be effectively dealt with for on-demand services (CTCL95), packet loss is the major source of problems for Internet video transmission and playback.
Dealing with packet losses requires both robust video coding and efficient transmission.
Internet video transmission must overcome the difficulties caused by the bit rate bandwidth gap and the unreliable nature of the Internet. The bit rate and Internet bandwidth gap requires efficient video compression schemes with very low bit rate. The unreliable nature of the Internet demands video coding and transmission schemes be robust enough to tolerate packet loss. The problem becomes more complicated because these two requirements often conflict with each other. An efficient compression algorithm that produces video streams with very low bit rate often renders the bitstream vulnerable to bit error and packet losses. A robust scheme that is resilient to error and packet losses often results in a high bit rate.
The reason behind this conflict is the way compression works. Video compression algorithms achieve compression by exploiting the similarities in the uncompressed video stream and removing redundancy. Similarities in video take two forms, spatial redundancy and temporal redundancy. Within one video frame, neighboring pixels tend to be similar in intensity and color values. Across video frames, frames tend to be similar because of the slow, continuous movement and change in the video sequence. As discussed later herein, spatial redundancy is often removed by transformation and variable length coding. When variable length encoding introduces state information in the bitstream, a bit error can cause the decoder to lose synchronization with the correct decoding state and consequently the decoding process may collapse. Temporal redundancy is removed by predictive coding, which codes only the difference between the current frame and its reference frame. When a frame is to be coded, it uses a coded frame from the past and/or the future as the reference frame. Only the difference between the frame and its reference frame is coded. Difference coding is a major factor for achieving compression. For example, for a sequence of 10 frames of H.263 encoded video, the display order of frames is from left to right and from top to bottom. The first frame is coded as an independent frame that does not use any reference frame. Each of the subsequent 9 frames uses its immediately previous frame as a reference frame. The size for the independently coded frame may be, e.g., 1236 bytes; the average size for the difference coded 9 frames may be 258 bytes. Using difference coding can achieve a size reduction of 70% for this sequence.
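The 70% figure can be checked with a short calculation. The sketch below assumes, as a simplification not stated in the example above, that each of the ten frames would occupy about 1236 bytes if it were coded independently. The function name is illustrative only.

```python
def difference_coding_saving(i_frame_bytes, avg_diff_bytes, num_frames):
    """Return the fractional size reduction obtained by difference coding.

    Assumption (not from the source): every frame would cost
    i_frame_bytes if it were coded independently.
    """
    # Total size if all frames were coded as independent frames.
    independent_total = i_frame_bytes * num_frames
    # One independent frame followed by (num_frames - 1) difference frames.
    predicted_total = i_frame_bytes + avg_diff_bytes * (num_frames - 1)
    return 1.0 - predicted_total / independent_total


saving = difference_coding_saving(1236, 258, 10)
print(f"{saving:.1%}")  # 71.2%
```

Under that assumption the sequence shrinks from 12360 bytes to 3558 bytes, which is consistent with the roughly 70% reduction quoted above.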
Although difference coding is essential in achieving efficient compression, it also introduces dependencies between frames, since a difference coded frame needs its reference frame for correct decoding. Loss of the reference frame will cause damage to the decoding of the difference encoded frame. Sometimes, since the reference frame is also difference coded, the dependencies among the frames form a chain, propagating damage. E.g., if the sixth frame in the ten frame sequence discussed above is lost in transmission, the decoder uses frame 5 as a replacement, damaging the decoding of frame 6. Since frames 7-10 all depend on their immediate predecessors, the damage caused by the loss of frame 6 propagates to all these frames.
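The propagation described above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions: frames are numbered from 1, every frame that is not independently coded is predicted from its immediate predecessor, and a lost frame counts as damaged because the decoder substitutes the previous frame. The function and parameter names are hypothetical.

```python
def damaged_frames(num_frames, lost, intra_frames=(1,)):
    """Return the 1-based numbers of frames that decode incorrectly.

    Assumptions: each frame outside intra_frames is predicted from its
    immediate predecessor; a lost frame is itself counted as damaged.
    """
    lost = set(lost)
    intra = set(intra_frames)
    damaged = []
    prev_ok = True
    for n in range(1, num_frames + 1):
        if n in lost:
            ok = False        # frame never arrived
        elif n in intra:
            ok = True         # independent frame needs no reference
        else:
            ok = prev_ok      # inherits any damage from its reference
        if not ok:
            damaged.append(n)
        prev_ok = ok
    return damaged


print(damaged_frames(10, lost=[6]))                       # [6, 7, 8, 9, 10]
print(damaged_frames(10, lost=[6], intra_frames=(1, 8)))  # [6, 7]
```

The second call shows why an additional independently coded frame limits the damage: the chain of dependencies is cut at frame 8, so only frames 6 and 7 are affected.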
Low bit rate video coding relies on efficient compression, which is achieved by introducing dependencies between different parts of the encoded bitstream. When one part of the stream is lost, the parts which depend on it are damaged. Sometimes the damage can be propagated. Assuming an accurate similarity measurement and assessment, typically the more dependency a compression scheme introduces into the bitstream, the more efficient the compression scheme is and the lower bit rate it can generate. However, the more dependencies in the bitstream, the more damage results when part of the stream is lost. In other words, the aforementioned efficient compression is less robust because it is susceptible to packet loss. Thus, there is a conflict between low bit rate and robustness. For effective Internet video delivery, meeting either one of these requirements does not necessarily improve the overall performance.
The existing research on Internet video transmission is divided into two camps with two different approaches to addressing the conflict between coding efficiency and coding robustness. The first one, best exemplified by the Mbone (see, e.g., M. Macedonia and D. Brutzman, “Mbone, the multicast backbone”, IEEE Computer, 27(4):30-36, April 1994 (MB94)) video conferencing tools like NV and VIC (see, e.g., Ron Frederick, “Experiences with software real time video compression”, Technical report, Xerox Palo Alto Research Center, July 1992, available on the WWW via ftp://parcftp.xerox.com/pub/net-research/nv-paper.ps (Ron92); INRIA-RODEO, Inria videoconferencing system, http://www.inria.fr/rodeo/ivs.html (IR); and S. McCanne and V. Jacobson, “vic: A Flexible Framework for Packet Video”, In ACM Multimedia ’95, pp. 511-522, November 1995 (MJ95)), stresses coding and transmission robustness and focuses less on coding efficiency. The coding schemes, which are normally robust to packet loss, use no or primitive difference coding and introduce little dependency into the bitstream. As a result, the coding scheme is robust but the coding efficiency is poor. Often the resulting bit rate is too high for low bit rate connections. The second approach, taken by on-demand Internet video transmission (see, e.g., Brian Smith, “Implementation Techniques for Continuous Media System and Applications”, PhD thesis, University of Calif., Berkeley, 1993 (Smi93); Shanwei Cen, Calton Pu, Richard Staehli, Crispin Cowan, and Jonathan Walpole, “Demonstrating the Effect of Software Feedback on a Distributed Real-Time MPEG Video”, In ACM 1995 Multimedia Conference, San Francisco, Calif., November 1995 (CPS+95); and CTCL95), tends to use existing standard efficient compression schemes like MPEG and tailors the transmission scheme for loss handling.
Since most traditional video coding schemes use only coding efficiency as an optimization criterion, a pervasive dependency structure is introduced into the bitstream, making the bitstream extremely vulnerable to transmission errors. As a result, packet loss handling in these schemes is difficult.
Existing Internet based video transmission systems tend to go to extremes when dealing with the tension between compression efficiency and error handling. There are traditional video coding and error handling schemes like Forward Error Correction. See N. Ohta, “Packet Video Modeling and Signal Processing”, Artech House, 1994 (Oht94); and E. Ayanoglu, P. Pancha, A. Reibman, and S. Talwar, “Forward Error Control for MPEG-2 Video Transport in a Wireless ATM LAN”, In Proc. of ICIP ’96, Lausanne, Switzerland, 1996 (APRT96). These error handling schemes often have different assumptions and are designed for environments that are very different from the Internet. For example, Forward Error Correction (FEC) has been studied extensively and has been used in wireless and ATM environments to deal with bit error. However, in the Internet environment, bit error due to bit corruption during transmission is insignificant compared to packet loss. See, e.g., CS91 and Par93 above, and W. Stevens, UNIX Network Programming, Prentice Hall, 1990 (Ste90). Though a bit error can cause loss of synchronization between the encoder and decoder because variable length coding is used, usually a bit error is corrected in the IP layer and therefore encapsulated from the applications.
Another popular approach to dealing with error and loss is layered coding. See R. Aravind, M. Civanlar, and A. Reibman, “Packet Loss Resilience of MPEG-2 Scalable Video Coding Algorithms”, IEEE Trans. on Circuits and Systems for Video Technology, 6(5), October 1996 (ACR96); E. Amir, S. McCanne, and M. Vetterli, “A Layered DCT Coder for Internet Video”, In Proc. of ICIP ’96, Lausanne, Switzerland, 1996 (AMV96); and Oht94 cited above. Video data is partitioned into important data, like a lower frequency band, and unimportant data, like a higher frequency enhancement band. Different partitions are coded into different layers so that important layers can be sent over a channel that has better transmission behavior, such as low delay and low loss rate. The enhancement layer is sent through a channel that has fewer quality of service guarantees. This approach is very suitable for networks where packets can be assigned different priorities and ensured different quality of service. In the Internet environment, however, quality of service guarantees do not exist and no distinctions are made between packet types.
Improving the coding efficiency has been the focus of most traditional video compression research. See BK95 cited above; J. Mitchell, W. Pennebaker, C. Fogg, and D. LeGall, MPEG Video Compression Standard, Chapman and Hall, New York, N.Y., 1997 (MPFL97); G. Wallace, “The JPEG Still Picture Compression Standard”, Communications of the ACM, 34(4):30-44, April 1991 (G. 91); The International Telecommunication Union, ITU-T Recommendation H.261: Video Codec for Audiovisual Services at p×64 kbit/s, 1990 (Uni90); and The International Telecommunication Union, Draft ITU-T Recommendation H.263, July 1995 (Uni95). Coding robustness has been less of an issue.
For example, the use of the I frame in the MPEG coding scheme provides a resynchronization point and increases robustness; however, the original intention of the I frame is to provide a random access point rather than error resilience. The recent H.263 (Uni95) and MPEG 4 (see, e.g., L. Chiariglione, “Mpeg-4: Coding of audio-visual objects”, http://drogo.cselt.stet.it/mpeg/mpeg 4.htm, July 1996 (Chi96)) efforts have been particularly focused on low bit rate video coding, but error handling has not been a major concern. As a result, transmitting H.263 based video over the unreliable Internet is difficult.
A delicate balance is required between the conflicting requirements of low bit rate and robustness of Internet video transmission. Traditional coding methods are optimized for compression ratio and rely on transmission schemes to provide robustness. The current Internet environment cannot provide desired robustness without sacrificing low delay and other real time requirements.
The present invention provides a practical solution to the above problems, based on a comprehensive Internet traffic behavior study and with the results used as guidelines for the design and implementation of low bit rate and robust video coding and transmission schemes. The invention addresses problems in both coding and transmission. For coding, a hybrid coding scheme is proposed to increase the robustness. For transmission, an effective dependency isolation algorithm is designed to minimize the propagation of packet loss damage. A low bit rate and robust Internet video coding and transmission scheme can thus be realized by properly balancing the bit rate and robustness through improved I frame coding and efficient frame packetization.
The invention is based on three major components: a comprehensive Internet video traffic experiment to study packet loss and delay behavior, a hybrid wavelet H.263 coding scheme that improves robustness while keeping the bit rate low, and an efficient packetization scheme which minimizes packet loss damage.
The video traffic experiment is used to study the unreliable nature of the Internet. Delay and loss behavior are studied, and analysis of their impact is used to guide the design and implementation of the coding and transmission schemes.
For robust low bit rate coding, we propose a hybrid wavelet/H.263 coding scheme. Coding standards like H.263 dramatically improve predictive coding, but coding of the I frame remains the same. Robust coding requires that more I frames be inserted in the bitstream for loss damage prevention and recovery. However, the large I frame size makes this impractical. Wavelet coding is ideal for still image coding because of its good space locality; however, extensive use of wavelet coding is difficult because of its complexity and its inability to do inter-frame difference coding. In the present application we describe a hybrid coding scheme with wavelet I frame coding and H.263 predictive coding to produce a robust, low bit rate video coding scheme.
Predictive coding is essential to the coding efficiency. The Internet low bit rate requirement prohibits the use of such robust, high bit rate coding schemes as in Mbone tools like NV and IVS (Ron92, and IR, cited above). However, predictive coding introduces dependency into the bitstream, which propagates packet loss damage. In the present application we propose a novel dependency isolation packetization method that effectively minimizes the loss damage and its propagation. The new packetization method analyzes macroblock level dependency structure and packetizes the bitstream so as to minimize dependencies between packets. Through experiments and analysis, we show that by using the hybrid wavelet and H.263 coding scheme and the dependency isolation packetization method, a large range of packet loss rates can be tolerated.
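The idea of packetizing so that dependent data travels together can be illustrated with a small sketch. The code below is a hypothetical illustration only and is not the packetization method of the invention: it assumes the macroblock dependency pairs and coded sizes are already known, groups mutually dependent macroblocks with a union-find structure, and then fills packets group by group. All names (`isolate_dependencies`, `cross_packet_deps`, `max_payload`) are assumptions made for this example.

```python
from collections import defaultdict


def isolate_dependencies(sizes, deps, max_payload):
    """Group macroblocks into packets so dependent blocks share a packet.

    sizes: dict mapping macroblock id -> coded size in bytes
    deps: iterable of (block, reference_block) pairs
    max_payload: target packet payload in bytes
    Hypothetical illustration; a group larger than max_payload is emitted
    as one oversized packet, since splitting it is outside this sketch.
    """
    parent = {mb: mb for mb in sizes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Merge every block with the block it references.
    for block, ref in deps:
        parent[find(block)] = find(ref)

    groups = defaultdict(list)
    for mb in sizes:
        groups[find(mb)].append(mb)

    def group_size(group):
        return sum(sizes[m] for m in group)

    packets, current, used = [], [], 0
    # Place the largest dependency groups first, never splitting a group.
    for group in sorted(groups.values(), key=group_size, reverse=True):
        need = group_size(group)
        if current and used + need > max_payload:
            packets.append(current)
            current, used = [], 0
        current = current + group
        used += need
    if current:
        packets.append(current)
    return packets


def cross_packet_deps(packets, deps):
    """Count dependencies whose two ends land in different packets."""
    where = {mb: i for i, packet in enumerate(packets) for mb in packet}
    return sum(1 for a, b in deps if where[a] != where[b])


sizes = {0: 300, 1: 200, 2: 400, 3: 100, 4: 250, 5: 150}
deps = [(2, 0), (3, 1), (5, 4)]
packets = isolate_dependencies(sizes, deps, max_payload=700)
print(packets)                           # [[0, 2], [4, 5, 1, 3]]
print(cross_packet_deps(packets, deps))  # 0
```

In this toy input no dependency crosses a packet boundary, so the loss of one packet does not damage the decoding of data carried in the other.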