Most packet-based communication networks, especially Internet Protocol (IP) networks without guaranteed quality of service, suffer from a variable amount of packet loss or errors. These losses can stem from many sources, for example overload of a router or transmission segment, or bit errors in packets that lead to their deletion. It should be understood that packet loss is a normal operating condition in most packet network architectures, and not a network failure.
Media transmission, especially the transmission of compressed video, suffers greatly from packet losses.
Means to improve the quality of a packet-based channel were reported early on. Retransmission-based mechanisms can ensure error-free channels at the cost of undefined delay. Such delay, however, is not desirable for streaming and conversational applications.
Annoying artifacts in a media presentation resulting from errors in a media transmission can further be avoided by many different means during the media coding process. However, adding redundancy bits during a media coding process is not possible for pre-coded content, and is normally less efficient than optimal protection mechanisms in the channel coding using forward error correction (FEC).
Forward Error Correction works by calculating a number of redundant bits over the to-be-protected bits in the various to-be-protected media packets, adding those bits to FEC packets, and transmitting both the media packets and the FEC packets. At the receiver, the FEC packets can be used to check the integrity of the media packets and to reconstruct media packets that may be missing. Henceforth, the media packets and the FEC packets which protect those media packets will together be called a FEC frame.
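The principle can be illustrated with a minimal sketch of the simplest possible scheme, in which one FEC packet carries the bytewise XOR of all media packets of a FEC frame and can therefore repair a single lost media packet. This is an illustration only and not the scheme of any particular specification; the function names make_parity and recover_missing are hypothetical, and all media packets are assumed to have the same length.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equally long byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(media_packets: List[bytes]) -> bytes:
    """Sender side: compute one FEC packet over all media packets.

    Assumption: every media packet has the same length.
    """
    return reduce(xor_bytes, media_packets)


def recover_missing(received: List[Optional[bytes]], parity: bytes) -> List[bytes]:
    """Receiver side: rebuild at most one lost media packet (marked None)."""
    missing = [i for i, packet in enumerate(received) if packet is None]
    if len(missing) > 1:
        raise ValueError("a single XOR parity packet can repair only one loss")
    if not missing:
        return list(received)
    # XOR of the parity with every packet that did arrive yields the lost one.
    present = [packet for packet in received if packet is not None]
    repaired = list(received)
    repaired[missing[0]] = reduce(xor_bytes, present, parity)
    return repaired


if __name__ == "__main__":
    frame = [b"AAAA", b"BBBB", b"CCCC"]          # media packets of one FEC frame
    fec_packet = make_parity(frame)              # transmitted alongside them
    damaged = [frame[0], None, frame[2]]         # second packet lost in transit
    print(recover_missing(damaged, fec_packet) == frame)
```

Note that the receiver needs all surviving packets of the FEC frame before the lost packet can be rebuilt, which is the source of the buffering delay discussed in the following paragraphs.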
Most FEC schemes intended for error protection allow the number of to-be-protected media packets and the number of FEC packets to be chosen adaptively, in order to select the strength of the protection and the delay constraints of the FEC subsystem. Generally speaking, the more bits are spent to achieve a required protection strength, the lower the resulting delay can be. Variable FEC frame sizes are dealt with, for example, in the specification RFC 2733, “An RTP Payload Format for Generic Forward Error Correction”, December 1999, or in U.S. Pat. No. 6,678,855.
Packet based FEC in the sense discussed above requires a synchronization of the receiver to the FEC frame structure, in order to take advantage of the FEC. That is, a receiver has to buffer all media and FEC packets of a FEC frame before error correction can commence.
Video coding schemes, and increasingly some audio coding schemes, use so-called predictive coding techniques. Such techniques predict the content of a later video picture or audio frame from previous pictures or audio frames, respectively. In the following, video pictures and audio frames will both be referred to as “pictures”, in order to distinguish them from FEC frames. By using predictive coding techniques, the compression scheme can be very efficient, but it also becomes increasingly vulnerable to errors the longer the prediction chain becomes. Hence, so-called key pictures, or the equivalent non-predictively coded audio frames (both referred to as key pictures henceforth), are inserted from time to time to re-establish the integrity of the prediction chain by using only non-predictive coding techniques. Some alternative mechanisms avoid the use of key pictures in favor of guaranteed refresh times using only predictively coded pictures. In the scope of this document, this so-called gradual decoder refresh mechanism is treated as essentially the same as a key picture. It is not uncommon for a key picture to be 5 to 20 times bigger than a predictively coded picture. Each encoded picture may correspond, for example, to one to-be-protected media packet.
Following the conventions of MPEG-2 visual, the picture sequence starting with a key picture and followed by zero or more non-key pictures is henceforth called a Group of Pictures (GOP). In digital TV, a GOP normally consists of no more than six pictures. The main reason for such short GOPs is the delay constraint when switching TV channels: since meaningful decoding can only start at a key picture, a decoder has to wait, on average, half of the transmission time of the bits of a GOP before it can start decoding a meaningful key picture. In streaming applications, GOP sizes are often chosen to be much bigger (some one hundred pictures in a GOP are not unusual) in order to take advantage of the better coding efficiency of predictively coded pictures. Hence the “tune-in” to such a sequence can take several seconds.
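The effect of the GOP size on the tune-in wait can be made concrete with a small calculation. The following sketch assumes a constant-bitrate channel and uses purely illustrative numbers that do not come from any measurement: a predictively coded picture of 40,000 bits, a key picture ten times that size, and a channel of 1 Mbit/s. The function names are hypothetical.

```python
def gop_transmission_time(num_pictures: int,
                          predicted_picture_bits: int,
                          key_to_predicted_ratio: float,
                          bitrate_bps: float) -> float:
    """Seconds needed to transmit one GOP: one key picture plus
    (num_pictures - 1) predictively coded pictures."""
    key_bits = key_to_predicted_ratio * predicted_picture_bits
    other_bits = (num_pictures - 1) * predicted_picture_bits
    return (key_bits + other_bits) / bitrate_bps


def average_key_picture_wait(num_pictures: int,
                             predicted_picture_bits: int,
                             key_to_predicted_ratio: float,
                             bitrate_bps: float) -> float:
    """Average wait for the next key picture when tuning in at a random
    instant: half of the GOP transmission time."""
    return 0.5 * gop_transmission_time(num_pictures, predicted_picture_bits,
                                       key_to_predicted_ratio, bitrate_bps)


if __name__ == "__main__":
    # Illustrative assumptions only: 40 kbit pictures, 10x key picture, 1 Mbit/s.
    for label, gop_size in (("digital TV", 6), ("streaming", 100)):
        wait = average_key_picture_wait(gop_size, 40_000, 10, 1_000_000)
        print(f"{label}: GOP of {gop_size} pictures, average wait {wait:.2f} s")
```

Under these assumptions the six-picture GOP gives an average wait of 0.30 s, whereas the one-hundred-picture GOP gives 2.18 s and a worst case of about twice that.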
FEC schemes can be designed to be more efficient when FEC frames are big in size, for example when they comprise some hundred packets. Similarly, most media coding schemes gain efficiency when choosing larger GOP sizes, since a GOP contains only one single key picture which is, statistically, much larger than the other pictures of the GOP.
However, both large FEC frames and large GOP sizes require the receiver to synchronize to their respective structures. For FEC frames this implies buffering the whole FEC frame as received, and correcting any correctable errors. For media GOPs this implies parsing and discarding those media packets that do not form the start of a GOP (the key picture).
This requires not only a significant amount of buffering memory, but also buffering time. In conventional systems, where FEC decoding and media decoding are implemented independently, the average delay at tune-in is 1.5 dFEC + 0.5 dMedia, where dFEC is the buffering delay of the FEC frame (in isochronous networks this is proportional to the size of the FEC frame), and dMedia is the buffering delay of the media GOP. The worst-case buffer sizes have to be chosen such that a complete FEC frame and a complete GOP fit into the buffer of the FEC decoder and the buffer of the media decoder, respectively.
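The tune-in delay expression can be evaluated with a short sketch. The split into three terms is one plausible reading of the formula and is an assumption of this example: on average half a FEC frame passes before the next FEC frame begins, one full FEC frame must then be buffered before correction can commence, and on average half a GOP passes before the next key picture. The function name and the example values are hypothetical.

```python
def average_tune_in_delay(d_fec: float, d_media: float) -> float:
    """Average tune-in delay of a conventional receiver with independent
    FEC and media decoders: 1.5 * d_fec + 0.5 * d_media.

    d_fec   -- buffering delay of one FEC frame, in seconds
    d_media -- buffering delay of one media GOP, in seconds
    """
    wait_for_fec_frame_start = 0.5 * d_fec   # assumed: random tune-in instant
    buffer_whole_fec_frame = 1.0 * d_fec     # correction needs the full frame
    wait_for_key_picture = 0.5 * d_media     # assumed: random position in GOP
    return (wait_for_fec_frame_start
            + buffer_whole_fec_frame
            + wait_for_key_picture)


if __name__ == "__main__":
    # Illustrative values only: a 2 s FEC frame and a 4 s GOP.
    print(average_tune_in_delay(d_fec=2.0, d_media=4.0))
```

With the illustrative values above, the FEC frame contributes 3 s and the GOP 2 s, so the FEC structure dominates the tune-in delay even though the GOP is twice as long.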