This invention relates to the transmission and reception of coded video signals over the Internet, and more specifically, to the transmission and reception of video signals that have been coded using compression-efficient inter-frame coding techniques such as those used in the MPEG, MPEG-2, H.261, and H.263 standards.
With the exploding popularity of the public Internet in the past several years for transporting all types of data, there has been much recent interest in transmitting digitally encoded real-time audio and video over the Internet using the User Datagram Protocol (UDP). Because UDP is an unreliable protocol, network packet losses will likely occur and, as a result, will adversely affect the quality of the received audio and video. Recovery from packet losses may be performed solely by the receiver, or better quality can be achieved by involving both the sender and the receiver in the error recovery process. In networks that support prioritization, such as ATM, video quality can be improved in the presence of packet loss by using scalable video coding (see, e.g., R. Aravind, M. Civanlar, A. Reibman, “Packet Loss Resilience of MPEG-2 Scalable Video Coding Algorithms,” IEEE Transactions on Circuits and Systems for Video Technology, Vol. 6, No. 5, October 1996). There is currently, however, no widespread support for prioritization on the public Internet. Overviews of proposed methods for error recovery for streaming of audio and video over the Internet, which involve both the sender and the receiver, are disclosed by C. Perkins and O. Hodson in “Options for Repair of Streaming Media,” Internet Engineering Task Force Internet RFC 2354, June 1998, and G. Carle and E. Biersack in “Survey of Error Recovery Techniques for IP-Based Audio-Visual Multicast Applications,” IEEE Network, November/December 1997. While the general methods described in these overviews may be applicable to IP transmission of both audio and video, most of the studies published where specific techniques have been implemented involve audio only.
Because video has higher data rates than audio, and because errors propagate through inter-frame coding, it is more difficult to maintain video quality than audio quality, and audio techniques, therefore, cannot be directly applied to video signals.
Many of the currently popular schemes for transmitting digital video over the Internet, such as Motion-JPEG and wavelet-based schemes, use intra-frame coding. Inter-frame coding techniques, such as those used in the MPEG-1, MPEG-2, H.261, and H.263 standards, are generally more compression-efficient than intra-frame techniques. However, the inter-frame standards suffer more from Internet packet loss because errors in one frame may propagate for many frames. An MPEG video sequence includes intra-frame coded (I) frames, inter-frame predicted coded (P) frames, and bi-directional inter-frame coded (B) frames. I and P frames are used in the prediction of subsequent frames while B frames are not used in the prediction of subsequent frames. For example, consider an MPEG video sequence with I frames occurring every 15 frames. In MPEG coding, because of inter-frame prediction, all predictive P and B frames rely upon the previous I frame. Thus, if an error occurs while transmitting the I frame, the effect persists for 15 frames, or 500 ms at 30 frames per second, which is quite noticeable to a viewer. The received video quality can be improved both through error concealment techniques that are applied at the decoder, and by error resilience techniques that are applied at the sender.
Error resilience techniques using Forward Error/Erasure Correction (FEC) add redundant data to a media stream prior to transmission, so that packet losses can be repaired at the receiver without requiring contact with or re-transmissions from the sender. Forward Error/Erasure Correction techniques are well suited to multicast applications, because they avoid the use of re-transmissions. The same redundant data can be used to repair the loss of different packets at separate receivers in a multicast group. If re-transmission were used instead, multiple re-transmission requests would have to be sent. Forward Error/Erasure Correction techniques for multimedia generally fall into one of two categories, media-independent FEC and media-specific FEC (see, e.g., C. Perkins and O. Hodson, “Options for Repair of Streaming Media,” Internet Engineering Task Force Internet RFC 2354, June 1998).
In media-independent FEC, well-known information theory techniques for protecting any type of data are used. In “Media-independent Error Correction using RTP,” Internet Engineering Task Force Internet Draft, May 1997, by D. Budge, R. McKenzie, W. Mills, and P. Long, several variations of exclusive-OR (XOR) operations are used to create parity packets from two or more data packets. More complex techniques such as Reed-Solomon (RS) coding can also be used (see, e.g., G. Carle and E. Biersack, “Survey of Error Recovery Techniques for IP-Based Audio-Visual Multicast Applications,” IEEE Network, November/December 1997). Reed-Solomon encoding is an example of a systematic forward error/erasure correction code. A systematic forward error/erasure correction code is one in which the information bytes are transmitted in the codeword without modification. Thus, in the absence of channel errors, no Reed-Solomon decoding is necessary to recover the information bytes. When an RS(n,k) codeword is constructed from byte data, h parity bytes are created from k information bytes, and all n=k+h bytes are transmitted. Such a Reed-Solomon decoder can correct any pattern of up to h/2 byte errors, or up to h byte erasures, where an erasure is defined as an error in a known position. When RS coding is applied to protect packetized data against packet loss, k information packets of length j bytes are coded using j RS codewords. For each RS codeword, k information bytes are taken from k different packets (one from each packet), and the h constructed parity bytes are placed into h separate parity packets, and all n=k+h packets are transmitted. Because the transmitted packets are numbered, and packets are assumed to be received perfectly or not at all, the receiver can determine which packets are missing, and thus a packet loss can be considered to be an erasure.
Hence, if any h (or fewer) of the n transmitted packets are lost, the original k information packets can be recovered perfectly.
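The byte-position-by-byte-position packet coding described above can be illustrated with a minimal sketch. It assumes an evaluation-form Reed-Solomon erasure code over GF(256) with the field polynomial 0x11D, decoded by Lagrange interpolation; the field polynomial, the function names, and the use of packet indices as evaluation points are illustrative assumptions and are not taken from the cited references. The sketch handles erasures only (lost packets at known positions), not errors at unknown positions.

```python
# GF(2^8) log/antilog tables. The polynomial 0x11D is an assumed, commonly
# used choice; the text does not specify a field polynomial.
EXP, LOG = [0] * 512, [0] * 256
_x = 1
for _i in range(255):
    EXP[_i], LOG[_x] = _x, _i
    _x <<= 1
    if _x & 0x100:
        _x ^= 0x11D
for _i in range(255, 512):
    EXP[_i] = EXP[_i - 255]

def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]

def gf_div(a, b):
    # b must be non-zero; in GF(2^8) addition and subtraction are both XOR.
    return 0 if a == 0 else EXP[(LOG[a] - LOG[b]) % 255]

def interpolate(xs, ys, x0):
    """Evaluate at x0 the unique polynomial of degree < len(xs) through (xs, ys)."""
    total = 0
    for i, xi in enumerate(xs):
        term = ys[i]
        for j, xj in enumerate(xs):
            if j != i:
                term = gf_mul(term, gf_div(x0 ^ xj, xi ^ xj))
        total ^= term
    return total

def make_parity_packets(info_packets, h):
    """Build h parity packets from k equal-length information packets.

    One codeword is formed per byte position: the k information bytes are
    treated as polynomial values at points 0..k-1 and the parity bytes are
    the values of the same polynomial at points k..k+h-1 (requires k+h <= 256).
    """
    k, j = len(info_packets), len(info_packets[0])
    xs = list(range(k))
    parity = [bytearray(j) for _ in range(h)]
    for pos in range(j):
        column = [p[pos] for p in info_packets]
        for r in range(h):
            parity[r][pos] = interpolate(xs, column, k + r)
    return [bytes(p) for p in parity]

def recover_info_packets(received, k):
    """received maps packet index (0..n-1) to its bytes; lost packets are absent."""
    if len(received) < k:
        raise ValueError("more than h packets lost; cannot recover")
    xs = sorted(received)[:k]          # any k surviving packets suffice
    j = len(received[xs[0]])
    recovered = []
    for idx in range(k):
        if idx in received:            # systematic code: no decoding needed
            recovered.append(received[idx])
        else:
            recovered.append(bytes(
                interpolate(xs, [received[x][pos] for x in xs], idx)
                for pos in range(j)))
    return recovered
```

For example, with k=4 information packets and h=2 parity packets, dropping any two of the six packets before calling `recover_info_packets` should still return the four original information packets.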
A key advantage of RS coding is its ability to protect against several consecutive errors, depending on the parameter choices. The overhead rate for RS coding is h/k, and it is most efficient for protection against burst errors for large values of k. For example, an RS(6,4) code and an RS(4,2) code both can protect against a burst length of 2 errors. But the RS(4,2) code has 100% overhead, while the RS(6,4) code has only 50% overhead. Reducing the overhead percentage by increasing the block length, however, leads to delay because large block lengths require buffering of large amounts of data prior to transmission.
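The overhead comparison above can be checked with a short calculation. The helper name `rs_overhead` is hypothetical and is introduced only for illustration.

```python
def rs_overhead(n, k):
    """Return (h, overhead) for an RS(n,k) code, where h = n - k and overhead = h/k."""
    h = n - k
    return h, h / k

for n, k in [(4, 2), (6, 4)]:
    h, overhead = rs_overhead(n, k)
    # Prints 100% for RS(4,2) and 50% for RS(6,4), matching the figures in the text.
    print(f"RS({n},{k}): protects against {h} lost packets, overhead {overhead:.0%}")
```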
In media-specific FEC coding, unlike in media-independent FEC coding where the multimedia stream is just treated as data, knowledge of the specific type of multimedia stream to be transmitted is used. In “Simulation of FEC-Based Error Control for Packet Audio on the Internet,” INFOCOM, March 1998, San Francisco, Calif., by M. Podolsky, C. Romer, and S. McCanne, and in “Reliable Audio for Use over the Internet,” Proc. INET '95, Honolulu, HI, pp. 171-178, June 1995, by V. Hardman, M.A. Sasse, M. Handley, and A. Watson, a redundant low-bit rate audio stream is transmitted along with the standard audio stream, but delayed by one packet. If a standard audio packet is lost, the receiver uses the low-bit rate version of that audio instead, received in the next packet. This method protects against single packet losses.
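The one-packet-delayed redundancy scheme described above can be sketched as follows. The packet structure (a dictionary with `seq`, `primary`, and `redundant_prev` fields) and the function names are hypothetical and chosen only for illustration; the cited papers do not prescribe this layout.

```python
def packetize(primary_frames, redundant_frames):
    """Packet i carries primary frame i plus the low-bit-rate copy of frame i-1."""
    packets = []
    for i, primary in enumerate(primary_frames):
        previous_copy = redundant_frames[i - 1] if i > 0 else None
        packets.append({"seq": i, "primary": primary,
                        "redundant_prev": previous_copy})
    return packets

def reconstruct(received, frame_count):
    """received maps sequence number to packet; lost packets are absent."""
    output = []
    for i in range(frame_count):
        if i in received:
            output.append(received[i]["primary"])
        elif i + 1 in received:
            # Packet i was lost: fall back to the low-bit-rate copy
            # that arrived in the following packet.
            output.append(received[i + 1]["redundant_prev"])
        else:
            output.append(None)  # packet i and packet i+1 both missing
    return output
```

In this sketch a single lost packet is replaced by its low-bit-rate copy, whereas the first of two consecutive lost packets cannot be repaired, which is consistent with the statement that the method protects against single packet losses.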
In the aforenoted article by Perkins and Hodson, a suggestion is made to combine media-specific and media-independent techniques by applying the media-independent FEC techniques to the most significant bytes of a coder's output, rather than applying FEC over the entire multimedia bitstream. No specific information about how this can be accomplished is given, however. A method for adding resilient information to inter-frame coded video, such as MPEG video, in order to protect video quality against packet loss, but which has low overhead and low delay, is desirable.
In accordance with the present invention, a data splitting function is applied to an inter-frame coded video signal, such as an MPEG video signal, to split the video stream into a high priority partition and a low priority partition. Systematic Forward Error/Erasure Correction coding is then performed on only the data in the high priority partition. The Forward Error/Erasure Corrected high priority partition data and the non-Forward Error/Erasure Corrected low priority partition data are then combined into packets and transmitted over the same network to a receiver, where they are decoded. Depending on the degree of protection against errors or erasures offered by the particular FEC code that is used, the loss of one or more packets containing high priority data can be corrected with no loss of data in the high priority partition. The loss of the low priority partition data in the lost packet or packets, which partition is not protected, has a much less deleterious effect on the quality of the decoded video signal than would the loss of data from the high priority partition. Advantageously, by limiting the application of the Forward Error/Erasure Correction to only the high priority partition data, and thus protecting against loss only that “more important data”, the overhead requirement is reduced for protection against a given packet loss.
In the preferred embodiment, a Reed-Solomon encoder is applied to the high priority data for an entire frame. For each RS(n,k) codeword, one information byte is taken from each of k packets and the constructed parity bytes are placed in h different packets, where n=k+h. Each individual frame's data is arranged in the n equal length packets that contain a combination of: packet headers; high priority data comprising one of either information bytes or parity bytes; and low priority data bytes, the latter comprising only information bytes since no error-correction coding is performed on the low priority data. The same number of bytes of high priority data (information or parity in any one packet) are placed in each of the n equal length packets, and the same number of bytes of low priority data (information only) are placed in these same n packets, which together represent the video frame. Amongst these n equal length packets, k packets contain only high priority partition information bytes and h packets contain only high priority parity bytes in their high priority portions. The parity byte in each high priority byte position in each of these h packets is formed from the RS(n,k) code as it is applied to the k high priority partition information bytes in a corresponding byte position in the k other high priority partition information-containing packets associated with the frame. Advantageously, arranging the packets in this manner minimizes the amount of overhead and delay for a given packet loss protection rate.
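The byte budget implied by this packet arrangement can be sketched as follows. The function name, the rounding-up (padding) of each partition to a whole number of bytes per packet, and the example frame sizes are assumptions made for illustration only; the text does not give specific sizes or a padding rule.

```python
import math

def frame_packet_layout(hp_bytes, lp_bytes, k, h, header_bytes):
    """Size the n = k + h equal-length packets that carry one frame.

    High priority (HP) information is spread over k packets, HP parity
    occupies the same number of bytes in the other h packets, and low
    priority (LP) information is spread over all n packets.
    """
    n = k + h
    hp_per_packet = math.ceil(hp_bytes / k)   # assumed: pad HP data up to a multiple of k
    lp_per_packet = math.ceil(lp_bytes / n)   # assumed: pad LP data up to a multiple of n
    parity_bytes = h * hp_per_packet
    return {
        "n": n,
        "hp_per_packet": hp_per_packet,
        "lp_per_packet": lp_per_packet,
        "packet_len": header_bytes + hp_per_packet + lp_per_packet,
        "parity_bytes": parity_bytes,
        # Parity relative to the whole frame, not just the HP partition.
        "overhead": parity_bytes / (hp_bytes + lp_bytes),
    }

# Hypothetical frame: 2000 HP bytes, 8000 LP bytes, RS(6,4), 12-byte headers.
layout = frame_packet_layout(2000, 8000, k=4, h=2, header_bytes=12)
print(layout)
```

With these hypothetical numbers the parity amounts to 10% of the frame, compared with the 50% overhead that an RS(6,4) code would add if it were applied to the entire frame.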
A receiving decoder, upon receiving the packets associated with a frame, separates the high priority partition bytes and low priority partition bytes in each packet according to the numbers of such bytes of each type included within each packet, which numbers are transmitted in the packet headers. RS(n,k) decoding is applied byte position-by-byte position across the high priority partition portion within the received packets. If up to h of the n frame packets are lost, the RS decoding process recovers each high priority byte in the lost packet or packets. Full reconstruction of the high priority partition information bytes that were transmitted in the k packets of the n packets that contained high priority partition data is thus effected. Although the low priority partition data in the lost packets is unrecoverable, the fully recovered high priority partition data enables the video picture to be decoded, albeit possibly at a reduced quality level for that frame or that portion of a frame in which only the high priority partition information is available.
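The separation of high priority and low priority bytes according to counts carried in the packet header can be sketched as follows. The header layout used here (three 16-bit big-endian fields: packet index, high priority byte count, low priority byte count) is a hypothetical format assumed only for illustration; the actual header fields are not specified in this passage.

```python
import struct

# Hypothetical header: packet index, HP byte count, LP byte count.
HEADER = struct.Struct("!HHH")

def build_packet(index, hp, lp):
    """Prefix the HP and LP byte strings with a header giving their lengths."""
    return HEADER.pack(index, len(hp), len(lp)) + hp + lp

def split_packet(packet):
    """Return (index, hp_bytes, lp_bytes) using the counts in the header."""
    index, hp_count, lp_count = HEADER.unpack_from(packet, 0)
    start = HEADER.size
    hp = packet[start:start + hp_count]
    lp = packet[start + hp_count:start + hp_count + lp_count]
    return index, hp, lp
```

After splitting, the high priority portions of the received packets would be passed, byte position by byte position, to the RS(n,k) erasure decoder, while the low priority portions of the received packets are used directly.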