With the exploding popularity of the public Internet in the past several years for transporting all types of data, there has been much recent interest in transmitting digitally encoded real-time audio and video over the Internet using the User Datagram Protocol (UDP). Because UDP is an unreliable protocol, network packet losses will likely occur and, as a result, will adversely affect the quality of the received audio and video. Recovery from packet losses may be performed solely by the receiver, or better quality can be achieved by involving both the sender and the receiver in the error recovery process. In networks that support prioritization, such as ATM, video quality can be improved in the presence of packet loss by using scalable video coding (see, e.g., R. Aravind, M. Civanlar, A. Reibman, "Packet Loss Resilience of MPEG-2 Scalable Video Coding Algorithms," IEEE Transactions on Circuits and Systems for Video Technology, Vol. 6, No. 5, October 1996). There is currently, however, no widespread support for prioritization on the public Internet. Overviews of proposed methods for error recovery for streaming of audio and video over the Internet, which involve both the sender and the receiver, are disclosed by C. Perkins and O. Hodson in "Options for Repair of Streaming Media," Internet Engineering Task Force Internet RFC 2354, June 1998, and G. Carle and E. Biersack in "Survey of Error Recovery Techniques for IP-Based Audio-Visual Multicast Applications," IEEE Network, November/December 1997. While the general methods described in these overviews may be applicable to IP transmission of both audio and video, most of the published studies in which specific techniques have been implemented involve audio only. Because of its higher data rates and the propagation of errors through inter-frame coding, video quality is more difficult to maintain than audio quality, and audio techniques, therefore, cannot be directly applied to video signals.
Many of the currently popular schemes for transmitting digital video over the Internet, such as Motion-JPEG and wavelet-based schemes, use intra-frame coding. Inter-frame coding techniques, such as those used in the MPEG-1, MPEG-2, H.261, and H.263 standards, are generally more compression-efficient than intra-frame techniques. However, the inter-frame standards suffer more from Internet packet loss because errors in one frame may propagate for many frames. An MPEG video sequence includes intra-frame coded (I) frames, inter-frame predictive coded (P) frames, and bi-directional inter-frame coded (B) frames. I and P frames are used in the prediction of subsequent frames, while B frames are not. For example, consider an MPEG video sequence with I frames occurring every 15 frames. In MPEG coding, because of inter-frame prediction, all P and B frames rely, directly or indirectly, upon the previous I frame. Thus, if an error occurs while transmitting the I frame, the effect persists for 15 frames, or 500 ms at a frame rate of 30 frames per second, which is quite noticeable to a viewer. The received video quality can be improved both through error concealment techniques that are applied at the decoder and through error resilience techniques that are applied at the sender.
Error resilience techniques using Forward Error/Erasure Correction (FEC) add redundant data to a media stream prior to transmission, so that packet losses can be repaired at the receiver without requiring contact with or re-transmissions from the sender. Forward Error/Erasure Correction techniques are well suited to multicast applications, because they avoid the use of re-transmissions. The same redundant data can be used to repair the loss of different packets at separate receivers in a multicast group. If re-transmission were used instead, multiple re-transmission requests would have to be sent. Forward Error/Erasure Correction techniques for multimedia generally fall into one of two categories, media-independent FEC and media-specific FEC (see, e.g., C. Perkins and O. Hodson, "Options for Repair of Streaming Media," Internet Engineering Task Force Internet RFC 2354, June 1998).
In media-independent FEC, well-known information theory techniques for protecting any type of data are used. In "Media-independent Error Correction using RTP," Internet Engineering Task Force Internet Draft, May 1997, by D. Budge, R. McKenzie, W. Mills, and P. Long, several variations of exclusive-OR (XOR) operations are used to create parity packets from two or more data packets. More complex techniques such as Reed-Solomon (RS) coding can also be used (see, e.g., G. Carle and E. Biersack, "Survey of Error Recovery Techniques for IP-Based Audio-Visual Multicast Applications," IEEE Network, November/December 1997). Reed-Solomon encoding is an example of a systematic forward error/erasure correction code. A systematic forward error/erasure correction code is one in which the information bytes are transmitted in the codeword without modification. Thus, in the absence of channel errors, no Reed-Solomon decoding is necessary to recover the information bytes. When an RS(n,k) codeword is constructed from byte data, h parity bytes are created from k information bytes, and all n=k+h bytes are transmitted. Such a Reed-Solomon decoder can correct any combination of up to h/2 byte errors, or up to h byte erasures, where an erasure is defined as an error in a known position. When RS coding is applied to protect packetized data against packet loss, k information packets of length j bytes are coded using j RS codewords. For each RS codeword, k information bytes are taken from k different packets (one from each packet), the h constructed parity bytes are placed into h separate parity packets, and all n=k+h packets are transmitted. Because the transmitted packets are numbered, and packets are assumed to be received perfectly or not at all, the receiver can determine which packets are missing, and thus a packet loss can be considered to be an erasure. Hence, if any h (or fewer) of the n transmitted packets are lost, the original k information packets can be recovered perfectly.
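The packet-level erasure repair described above can be illustrated with the simplest case, a single XOR parity packet (h=1) computed over k equal-length data packets. This is a minimal sketch of XOR parity, not of Reed-Solomon coding, and it is not taken from any of the cited references; the function names (xor_bytes, make_parity, recover) and the sample packet contents are assumptions made for illustration only.

```python
from functools import reduce

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length packets."""
    return bytes(x ^ y for x, y in zip(a, b))

def make_parity(packets):
    """Build one parity packet from k equal-length data packets (h = 1)."""
    return reduce(xor_bytes, packets)

def recover(received, parity):
    """Repair a single erasure.

    `received` holds the k data packets in order, with None marking the one
    packet that the sequence numbers show to be missing.
    """
    missing = [i for i, p in enumerate(received) if p is None]
    if len(missing) != 1:
        raise ValueError("a single parity packet repairs exactly one loss")
    present = [p for p in received if p is not None]
    repaired = list(received)
    # XOR of the parity with every surviving packet yields the lost packet.
    repaired[missing[0]] = reduce(xor_bytes, present, parity)
    return repaired

# Hypothetical example: k = 4 data packets of j = 4 bytes each.
data = [b"pkt0", b"pkt1", b"pkt2", b"pkt3"]
parity = make_parity(data)
damaged = [data[0], None, data[2], data[3]]   # packet 1 lost in transit
restored = recover(damaged, parity)
```

Because the position of the lost packet is known from the packet numbering, the loss is an erasure, and one parity packet suffices to repair it. Tolerating h > 1 losses per block requires a stronger code such as RS(n,k).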
A key advantage of RS coding is its ability to protect against several consecutive errors, depending on the parameter choices. The overhead rate for RS coding is h/k, and it is most efficient for protection against burst errors for large values of k. For example, an RS(6,4) code and an RS(4,2) code both can protect against a burst length of 2 errors. But the RS(4,2) code has 100% overhead, while the RS(6,4) code has only 50% overhead. Reducing the overhead percentage by increasing the block length, however, leads to delay because large block lengths require buffering of large amounts of data prior to transmission.
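The overhead comparison above follows directly from the h/k ratio and can be checked with a short calculation. The sketch below is illustrative only; the helper names (rs_overhead, rs_max_erasures) are assumptions and do not come from any library.

```python
def rs_overhead(n: int, k: int) -> float:
    """Overhead rate h/k of an RS(n,k) code, where h = n - k."""
    return (n - k) / k

def rs_max_erasures(n: int, k: int) -> int:
    """Number of lost packets (erasures) per block that RS(n,k) can repair."""
    return n - k

for n, k in [(4, 2), (6, 4)]:
    print(f"RS({n},{k}): repairs up to {rs_max_erasures(n, k)} losses, "
          f"overhead {rs_overhead(n, k):.0%}, "
          f"{k} information packets buffered per block")
```

Both codes repair a burst of two lost packets, but the larger block spreads the same two parity packets over four information packets instead of two, at the cost of buffering more data before the parity can be computed.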
In media-specific FEC coding, unlike in media-independent FEC coding where the multimedia stream is treated simply as data, knowledge of the specific type of multimedia stream to be transmitted is used. In "Simulation of FEC-Based Error Control for Packet Audio on the Internet," INFOCOM, March 1998, San Francisco, Calif., by M. Podolsky, C. Romer, and S. McCanne, and in "Reliable Audio for Use over the Internet," Proc. INET '95, Honolulu, Hi., pp. 171-178, June 1995, by V. Hardman, M. A. Sasse, M. Handley, and A. Watson, a redundant low-bit rate audio stream is transmitted along with the standard audio stream, but delayed by one packet. If a standard audio packet is lost, the receiver instead uses the low-bit rate version of that audio, which is received in the next packet. This method protects against single packet losses.
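The one-packet-delayed redundancy scheme described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the implementation of the cited papers: the audio frames are represented by placeholder strings, the packet layout is a plain dictionary, and the function names (build_packets, play_out) are hypothetical.

```python
def build_packets(primary, low_rate):
    """Packet i carries primary frame i plus the low-rate copy of frame i-1."""
    packets = []
    for i, frame in enumerate(primary):
        redundant = low_rate[i - 1] if i > 0 else None
        packets.append({"seq": i, "primary": frame, "redundant": redundant})
    return packets

def play_out(received, n_frames):
    """Choose what to play for each frame.

    `received` maps sequence number -> packet for the packets that arrived.
    A lost frame is replaced by the low-rate copy carried in the next packet,
    if that packet arrived; otherwise the frame is unrecoverable (None).
    """
    output = []
    for i in range(n_frames):
        if i in received:
            output.append(received[i]["primary"])
        elif i + 1 in received:
            output.append(received[i + 1]["redundant"])
        else:
            output.append(None)
    return output

# Hypothetical frames: "P*" is standard-rate audio, "l*" its low-rate version.
primary = ["P0", "P1", "P2", "P3"]
low_rate = ["l0", "l1", "l2", "l3"]
packets = build_packets(primary, low_rate)

# Single loss: packet 1 is dropped, so frame 1 is played from packet 2's copy.
received = {p["seq"]: p for p in packets if p["seq"] != 1}
single_loss = play_out(received, len(primary))

# Two consecutive losses: frame 1 has no surviving copy and cannot be repaired.
received = {p["seq"]: p for p in packets if p["seq"] not in (1, 2)}
double_loss = play_out(received, len(primary))
```

The second case shows why the scheme protects only against single packet losses: when packets 1 and 2 are both lost, the only redundant copy of frame 1 is lost along with packet 2.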
In the aforenoted article by Perkins and Hodson, a suggestion is made to combine media-specific and media-independent techniques by applying the media-independent FEC techniques to the most significant bytes of a coder's output, rather than applying FEC over the entire multimedia bitstream. No specific information about how this can be accomplished is given, however. A method for adding resilient information to inter-frame coded video, such as MPEG video, that protects video quality against packet loss while having low overhead and low delay is therefore desirable.