1. Field of the Invention
This invention relates generally to error correction methods and more particularly to an error correction method which is compatible with block-based coding standards, such as the H.263 standard of the ITU.
2. Description of the Related Art
Block-based video compression algorithms, such as the ITU H.26x family, JPEG and MPEG, efficiently compress video sequences that contain motion, such as the sequence shown in FIG. 1. The compressed video information, however, is more sensitive to transmission errors caused by channel impairments. In fact, the more the signal is compressed, the more information each bit carries, so a single incorrectly decoded bit causes greater distortion in the reconstructed signal. Moreover, even one incorrectly decoded bit can trigger catastrophic events, such as loss of synchronization of the transmission link.
Many applications, such as video conferencing over the internet or video transmission over wireless links, require low bit-rate coding. Sophisticated methods have been implemented to remove signal redundancies in both the spatial and temporal dimensions of the image data. However, the transmission channels used to transfer the image data tend to be error-prone. For example, one can expect a Bit Error Rate (BER) as high as 10^-3 in wireless links, and in internet communication links it is not unusual to lose about 10% of the transmitted data packets. The transmission of highly compressed signals over channels with high error rates is therefore quite challenging. The fact that errors can occur in bursts, e.g. when the radio link is fading, adds another dimension to this problem.
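To make the effect of the quoted bit error rate concrete, the following Python sketch computes the probability that a packet arrives with no bit errors at all. It assumes independent bit errors, which is optimistic for the bursty fading channels mentioned above, and the packet sizes are illustrative only.

```python
def packet_success_probability(ber: float, packet_bits: int) -> float:
    """Probability that all bits of a packet are received correctly,
    assuming each bit is corrupted independently with probability `ber`."""
    if not 0.0 <= ber <= 1.0:
        raise ValueError("ber must be a probability")
    return (1.0 - ber) ** packet_bits


if __name__ == "__main__":
    for bits in (100, 1000, 8000):
        p = packet_success_probability(1e-3, bits)
        print(f"{bits:5d}-bit packet: P(error-free) = {p:.3f}")
```

At a BER of 10^-3, a 1000-bit packet arrives intact only about 37% of the time under this independence assumption, which shows why heavily compressed data is so exposed on such links.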
Another aspect of the problem is that video is usually transmitted as a packet-based stream rather than as a continuous bitstream. In such an environment, an entire packet of video data can be lost, for example due to buffer overflow in the routers or switches at intermediate nodes of the network. Usually, these packets contain a Cyclic Redundancy Code (CRC) or checksum that is powerful enough to detect packets received with errors. It is generally still possible to parse and decode a packet with errors; however, the decoded information is unreliable and should be used with care.
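As an illustration of how a check code flags a damaged packet while still leaving its payload available to the parser, here is a sketch that appends a CRC-32 computed with Python's standard `zlib.crc32`. The 4-byte big-endian trailer layout is an assumption made for illustration, not a format defined by any of the standards discussed here.

```python
from __future__ import annotations

import zlib


def make_packet(payload: bytes) -> bytes:
    """Append a 4-byte big-endian CRC-32 of the payload (hypothetical layout)."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def check_packet(packet: bytes) -> tuple[bytes, bool]:
    """Split off the trailer and report whether the CRC matches.

    The payload is returned either way: a decoder may still parse it,
    but should treat the result as unreliable when the flag is False.
    """
    payload, trailer = packet[:-4], packet[-4:]
    return payload, zlib.crc32(payload) == int.from_bytes(trailer, "big")


if __name__ == "__main__":
    pkt = make_packet(b"GOB 3 macroblock data")
    damaged = bytearray(pkt)
    damaged[5] ^= 0x01  # flip one bit in the payload
    print(check_packet(pkt)[1], check_packet(bytes(damaged))[1])
```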
In addition to the problems above, real-time applications have stringent delay constraints and packets arriving late at the destination should be considered lost packets. It is therefore necessary to consider issues surrounding packetization since organization and arrangement of the transmitted data can significantly affect the robustness of the video transmission system.
In response to these problems, several standardization bodies, such as ITU (H.223), ISO-MPEG (MPEG4) and IETF (e.g. drafts proposing RTP payload format for encapsulating real-time bit streams) have proposed transmission standards which address these issues. The goal of the standards is generally to be able to decode each data packet independently from other packets in the transmission stream so that the loss of one packet does not affect the ability to decode subsequent packets. In other words, each packet should be an independent and self-contained unit.
An important aspect of present transmission media is their time-varying characteristic. For example, congestion can occur in the internet which causes long transmission delays or consecutive packet losses. A simple remedy for this problem is to lower the transmission rate of the sources, hence permitting the network to clear the backlog of packets. The class of coders which modify their transmission rate based upon network conditions are known as rate-adaptive source coders. These coders, however, operate on the assumption that a back channel is available from the network to the transmitter which provides the source coder with the status information for the network. In some instances, such as where there are long transmission delays (e.g. satellite links or store and forward networks), it is not possible to operate this feedback channel on a timely basis and open-loop transmission methods must be used.
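A rate-adaptive coder needs some rule that maps network feedback to a target bit rate. The sketch below uses additive-increase/multiplicative-decrease (AIMD), a common congestion-control rule chosen here purely for illustration; the function name, the `congested` feedback flag and all numeric limits are hypothetical.

```python
def adapt_rate(rate_kbps: float, congested: bool,
               min_kbps: float = 8.0, max_kbps: float = 64.0,
               step_kbps: float = 2.0, backoff: float = 0.5) -> float:
    """Return the next target rate given one feedback report.

    Congestion scales the rate down (multiplicative decrease); otherwise
    the rate grows by a fixed step (additive increase). The result is
    clamped to [min_kbps, max_kbps].
    """
    rate_kbps = rate_kbps * backoff if congested else rate_kbps + step_kbps
    return max(min_kbps, min(max_kbps, rate_kbps))


if __name__ == "__main__":
    rate = 32.0
    for report in (False, False, True, False):  # feedback from the back channel
        rate = adapt_rate(rate, report)
        print(rate)
```

The rule only works if each `congested` report arrives in time; when the back channel is too slow, the rate decision lags the network state, which is the limitation described above.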
A. Packetization of the H.263 Data Stream
H.263 has emerged as the dominant video coding standard for applications requiring low bit-rate coding, replacing H.261 which was mainly intended for video conferencing applications.
FIG. 1A is a functional block diagram which illustrates a conventional H.263 encoder 40. An incoming video stream which contains image picture data passes through subtractor 49 to DCT 41 which transforms the input residual pixel data into DCT residual data that is input to quantizer 42. The incoming video stream is also input to inter/intra classifier 45 and quantizer adaptor 46. Inter/intra classifier 45 also receives a picture type signal PTYPE, which indicates the type of the incoming picture data, and determines the value of the INTER/INTRA signal based upon the incoming video stream and PTYPE. Quantizer adaptor 46 observes the incoming video stream and determines the quantization signal QUANT.
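The subtract-then-transform path through subtractor 49 and DCT 41 can be sketched as follows: the prediction is subtracted from the current 8×8 block and the residual is transformed with an orthonormal 2-D DCT-II. This is a direct, unoptimized implementation for illustration only; practical encoders use fast factorizations, and the function names are hypothetical.

```python
import math

N = 8  # block size used by the transform


def residual(current, prediction):
    """Pixel-wise difference formed by the subtractor (current - prediction)."""
    return [[c - p for c, p in zip(crow, prow)]
            for crow, prow in zip(current, prediction)]


def dct_2d(block):
    """Orthonormal 2-D DCT-II of an N x N block given as a list of rows."""
    def scale(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    coeffs = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            coeffs[u][v] = scale(u) * scale(v) * total
    return coeffs


if __name__ == "__main__":
    cur = [[10] * N for _ in range(N)]
    pred = [[6] * N for _ in range(N)]  # e.g. a motion-compensated prediction
    c = dct_2d(residual(cur, pred))
    print(round(c[0][0], 6))  # a flat residual puts all energy in the DC term
```

For an INTRA macroblock there is no prediction, so the pixel block itself is transformed.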
The incoming video stream, the output of quantizer 42, the PTYPE signal and the INTER/INTRA signal are input to motion compensation generator 47 which compares past pictures to the present picture data in order to generate MOTION VECTORS for temporal compression of the picture data in the video stream. The motion compensation generator 47 also generates a subtraction picture signal which is input to subtractor 49.
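The block matching performed by the motion compensation generator can be illustrated with an exhaustive full-pixel search that minimizes the sum of absolute differences (SAD). SAD as the cost and the ±7 pixel search window are assumptions of this sketch, since H.263 does not mandate an estimation method, and the half-pixel refinement that H.263 permits is omitted.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between the current block at (bx, by)
    and the reference block displaced by (dx, dy)."""
    return sum(abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
               for y in range(size) for x in range(size))


def full_search(cur, ref, bx, by, size=16, search=7):
    """Exhaustive full-pixel search; returns ((dx, dy), best SAD)."""
    height, width = len(ref), len(ref[0])
    best_mv, best_cost = (0, 0), sad(cur, ref, bx, by, 0, 0, size)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates whose reference block leaves the picture.
            if not (0 <= by + dy and by + dy + size <= height
                    and 0 <= bx + dx and bx + dx + size <= width):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost
```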
Variable length encoder and multiplexor (MUX) 43 receives the output of quantizer 42, the PTYPE signal, the INTER/INTRA signal and the motion vectors in order to generate the H.263 data packets for a picture which are then stored in buffer 44 for output as a coded video stream.
The INTER/INTRA signal is also called the MTYPE signal of the H.263 specification. There are five MTYPE values in H.263: INTRA, INTRA+Q, INTER, INTER+Q and INTER4V. INTRA corresponds to a macroblock encoded in intra-picture mode with the QUANT parameter unchanged, and INTRA+Q to intra-picture encoding where QUANT is modified by DQUANT. Similarly, INTER corresponds to inter-picture (predictive) encoding with QUANT unchanged, and INTER+Q to inter-picture encoding where QUANT is modified by DQUANT. INTER4V indicates that the macroblock is encoded with four motion vectors. In H.263, if the picture type PTYPE is I (intra-picture encoding), then the MTYPE of every macroblock must be either INTRA or INTRA+Q. If PTYPE is P (predictive, or inter-picture, encoding), however, the specification imposes no similar restriction, and macroblocks of a P-type picture may have INTRA MTYPE values.
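The PTYPE/MTYPE constraint described above can be expressed as a small validity check. The enum and function names are hypothetical, and the check covers only the I-picture restriction stated here; it ignores other conditions, such as INTER4V requiring an optional coding mode to be enabled.

```python
from enum import Enum


class MType(Enum):
    INTRA = "INTRA"
    INTRA_Q = "INTRA+Q"
    INTER = "INTER"
    INTER_Q = "INTER+Q"
    INTER4V = "INTER4V"


INTRA_TYPES = frozenset({MType.INTRA, MType.INTRA_Q})
DQUANT_TYPES = frozenset({MType.INTRA_Q, MType.INTER_Q})


def mtype_allowed(ptype: str, mtype: MType) -> bool:
    """I pictures may contain only intra macroblocks; P pictures may mix."""
    if ptype == "I":
        return mtype in INTRA_TYPES
    if ptype == "P":
        return True
    raise ValueError(f"unsupported PTYPE: {ptype!r}")


def carries_dquant(mtype: MType) -> bool:
    """True when QUANT is modified by a DQUANT value for this macroblock."""
    return mtype in DQUANT_TYPES
```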
FIG. 1B is a functional block diagram illustrating a conventional H.263 decoder 50. The coded video stream generated by the conventional H.263 encoder 40 of FIG. 1A is received and stored in buffer 51. Variable length decoder and DMUX 52 decodes the H.263 data packets in buffer 51 and extracts the QUANT signal, INTER/INTRA signal, PTYPE signal and MOTION VECTORS as well as the picture data and the GOB HEADER INFO.
Inverse quantizer 53 receives the QUANT and INTER/INTRA signals which control the inverse quantization of the picture data decoded by variable length decoder and DMUX 52. The inverse quantized data output by inverse quantizer 53 is input to inverse DCT 54 which inverse transforms the picture data for output to adder 57. Motion compensation predictor 55 receives the PTYPE signal and MOTION VECTORS from variable length decoder and DMUX 52 as well as the decoded video stream output from adder 57 in order to produce motion compensated picture data which is input to adder 57 to reconstruct the original picture which is output as a video stream.
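The coefficient reconstruction performed by inverse quantizer 53 can be sketched as follows. The rule coded here reflects our reading of H.263 inverse quantization: the intra DC coefficient is reconstructed as 8 × LEVEL, while other coefficients use QUANT · (2|LEVEL| + 1), minus one when QUANT is even, with the sign of LEVEL and clipping to [-2048, 2047]. It is a sketch, not a conformance reference, and the function name is hypothetical.

```python
def dequantize(level: int, quant: int, intra_dc: bool = False) -> int:
    """Reconstruct one DCT coefficient from its quantized LEVEL.

    quant is the QUANT step parameter (1..31). The intra DC coefficient
    uses a fixed step of 8; all other coefficients use the odd/even rule.
    """
    if intra_dc:
        return 8 * level
    if level == 0:
        return 0
    magnitude = quant * (2 * abs(level) + 1)
    if quant % 2 == 0:
        magnitude -= 1  # even QUANT: subtract one
    rec = magnitude if level > 0 else -magnitude
    return max(-2048, min(2047, rec))  # clip to the coefficient range
```

The reconstructed coefficients then go to the inverse DCT, whose output is added to the motion-compensated prediction at adder 57.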
In H.263, as in other video encoding techniques such as MPEG, a video sequence consists of a sequence of pictures, as shown in FIG. 2A. A picture is the primary coding unit of the video sequence and generally consists of three rectangular matrices: one for luminance Y and two for the chrominance components Cb and Cr. The Y matrix has an even number of rows and columns. Each of the Cb and Cr matrices is one-half the size of the Y matrix in both the horizontal and vertical directions.
Each picture is constructed from a series of macroblocks MB, where each MB consists of four luminance blocks, a Cb block and a Cr block, as shown in FIG. 2D. Each block, as shown in FIG. 2E, is an 8×8 matrix of pixels of the picture.
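The relationship between picture size, chrominance size and macroblock count can be computed directly. The sketch assumes the half-size chrominance sampling described above and 16×16 luminance macroblocks; the function name is hypothetical, and QCIF (176×144) is used as the example format.

```python
def picture_geometry(width: int, height: int, mb_size: int = 16) -> dict:
    """Plane sizes and macroblock layout for a picture whose Cb and Cr
    planes are half the luminance size in each direction."""
    if width % mb_size or height % mb_size:
        raise ValueError("dimensions must be multiples of the macroblock size")
    mb_cols, mb_rows = width // mb_size, height // mb_size
    return {
        "luma": (width, height),
        "chroma": (width // 2, height // 2),  # Cb and Cr each
        "mb_cols": mb_cols,
        "mb_rows": mb_rows,
        "macroblocks": mb_cols * mb_rows,
        # four 8x8 luminance blocks + one Cb block + one Cr block
        "blocks_per_mb": (mb_size // 8) ** 2 + 2,
    }


if __name__ == "__main__":
    print(picture_geometry(176, 144))
```

For QCIF this gives 11 × 9 = 99 macroblocks of six blocks each, with 88×72 chrominance planes.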
Pictures are divided into slices, one of which is shown in FIG. 2B, each consisting of one or more contiguous MBs. The order of MBs in a slice is from left to right and top to bottom. If the bitstream contains an error, the decoder will typically abandon the slice containing the error and resume decoding at the start of the next slice. A greater number of slices per picture generally allows for better error concealment, but the additional slice overhead consumes channel bits that could otherwise be used to improve image quality.
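The resynchronization behaviour can be sketched as a loop over slices. Each slice is described here by a hypothetical tuple of (first macroblock index, macroblock count, position within the slice at which an error was detected, or None); macroblocks from the detected error to the end of the slice are discarded and decoding resumes with the next slice. Real decoders may also distrust macroblocks decoded shortly before the error was detected, which this sketch ignores.

```python
from typing import List, Optional, Tuple

# (first MB index, number of MBs, index within the slice where an error
#  was detected, or None if the slice decoded cleanly): hypothetical layout
Slice = Tuple[int, int, Optional[int]]


def decode_with_resync(slices: List[Slice]) -> Tuple[List[int], List[int]]:
    """Split macroblock indices into decoded and lost lists."""
    decoded: List[int] = []
    lost: List[int] = []
    for first_mb, count, error_at in slices:
        for i in range(count):
            mb = first_mb + i
            # Everything from the detected error to the end of the slice is
            # discarded; decoding resumes at the next slice start.
            if error_at is not None and i >= error_at:
                lost.append(mb)
            else:
                decoded.append(mb)
    return decoded, lost
```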
For H.261 and H.263, macroblocks are also organized into groups of blocks (GOBs), as shown in FIG. 2C. As in H.261, each GOB in H.263 has its own header. Unlike in H.261, however, the position of the GOB header in H.263 is not fixed, and a GOB can be varied to contain one or more slices, each slice being one horizontal row of MBs.
Motion compensation is a technique by which temporal redundancy between sequential pictures can be eliminated in order to compress the pictures. Motion compensation is performed at the MB level. When a MB is compressed, the compressed file contains a motion vector MV which represents the spatial difference between a reference MB and the MB being coded. The compressed file also contains error terms which represent the differences in content between the reference MB or MBs and the MB being coded.
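Decoding a motion-compensated MB combines the two pieces of information described above: the motion vector selects a block in the reference picture, and the error terms are added to it. The sketch uses full-pixel vectors, clamps reference coordinates to the picture edge and clips results to the 8-bit range; these are simplifying assumptions (H.263 vectors have half-pixel precision), and the function name is hypothetical.

```python
def motion_compensate(ref, bx, by, mv, residual, size):
    """Reconstruct a size x size block at (bx, by) from a reference picture.

    The full-pixel motion vector mv = (dx, dy) selects the prediction;
    residual holds the decoded error terms that are added to it.
    """
    dx, dy = mv
    height, width = len(ref), len(ref[0])
    block = []
    for y in range(size):
        row = []
        for x in range(size):
            # Clamp to the picture edge (simplifying assumption).
            ry = min(max(by + dy + y, 0), height - 1)
            rx = min(max(bx + dx + x, 0), width - 1)
            # Clip the reconstructed sample to the 8-bit range.
            row.append(min(255, max(0, ref[ry][rx] + residual[y][x])))
        block.append(row)
    return block
```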
There are three types of picture frame encoding that are common in various encoding standards, such as MPEG, H.261 and H.263. A frame that has no motion compensation and, therefore, has been compressed only by removing spatial redundancies is called an intra frame, or I frame. A P frame is a frame in which forward prediction is used to code the frame with reference to a previous reference frame. A subsequent frame can also be used as a reference, in which case backward prediction is used. A B frame is a frame in which bidirectional prediction is used, with both a previous reference picture and a subsequent reference picture used for coding. FIG. 3 illustrates an MPEG sequence in which an I picture is used as the reference picture to encode a P picture and a B picture. The P picture is used as a reference to encode B pictures and subsequent P pictures.
In an H.263 picture sequence, the first frame is an I frame, i.e. it is encoded in an intra-frame mode. The other frames are P frames except that H.263 permits two frames to be encoded together as "one" PB frame. This is different from MPEG where there are explicit P and B frame types. However, the relative encoding of the P and B frame types is similar to that for MPEG. Also, the location of PB frames can be arbitrary and there is no specific fixed order required under the standard. For instance, an H.263 frame sequence could take the form of:
Frame:  1   2   3   4,5   6   7   8,9   10
Type:   I   P   P   PB    P   P   PB    P
wherein frame 1 is an I frame, frames 2, 3, 6, 7 and 10 are P frames and the frames 4,5 and 8,9 are each encoded as a PB frame.
H.263 is a hybrid motion-compensated coder based on half pixel accuracy motion estimation wherein each motion vector is encoded differentially (i.e. only the difference between the current motion vector and its prediction is transmitted) using a prediction window of three motion vectors of the surrounding macro-blocks, as shown in FIG. 4. This is in contrast to H.261 where the motion estimation uses full pixel accuracy and the motion vector of the previous macro-block in the same row is used as the prediction for encoding the motion vector of the current MB.
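The difference between the two prediction schemes can be shown in a few lines. The H.263 predictor is taken here as the component-wise median of MV1, MV2 and MV3, while the H.261 predictor is simply the previous macroblock's vector; vectors are integer pairs, read as half-pixel units for H.263. Border cases are ignored in this sketch, and the function names are hypothetical.

```python
from typing import Tuple

Vector = Tuple[int, int]  # (horizontal, vertical), half-pixel units for H.263


def median3(a: int, b: int, c: int) -> int:
    """Median of three integers."""
    return max(min(a, b), min(max(a, b), c))


def h263_mv_difference(mv: Vector, mv1: Vector, mv2: Vector, mv3: Vector) -> Vector:
    """Difference transmitted under median prediction from MV1, MV2, MV3."""
    pred = (median3(mv1[0], mv2[0], mv3[0]), median3(mv1[1], mv2[1], mv3[1]))
    return (mv[0] - pred[0], mv[1] - pred[1])


def h261_mv_difference(mv: Vector, mv_prev: Vector) -> Vector:
    """Difference transmitted when the previous MB's vector is the predictor."""
    return (mv[0] - mv_prev[0], mv[1] - mv_prev[1])


if __name__ == "__main__":
    print(h263_mv_difference((4, -2), (2, 0), (6, -4), (4, -2)))
    print(h261_mv_difference((4, -2), (2, 0)))
```

In this example the median predictor matches the current vector exactly, so only a zero difference is sent, whereas the previous-MB predictor leaves a difference of (2, -2).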
In FIGS. 4-7, MV is the motion vector of the current MB, MV1 is the motion vector of the previous MB in the sequence, MV2 is the motion vector of the MB above, and MV3 is the motion vector of the MB above and to the right. The dotted lines in FIGS. 4-7 indicate a picture or GOB border, which affects the choice of motion vectors used to encode the current motion vector MV.
Note that in H.263, the motion vectors of the first slice of each GOB are effectively encoded in a fashion similar to H.261, using the adjacent MB as the predictor reference. As a result, if each GOB is limited to containing only one slice, no information from the previous slice is needed to decode the GOB. H.263 also allows optional extended motion vectors, overlapped motion estimation, four motion vectors per MB and PB frames (wherein each PB frame consists of a prediction block P and a bidirectional interpolation prediction block B). Using these options increases the efficiency of the coder but also adds to its complexity. Studies have shown, however, that the greatest performance gain over H.261 is obtained by using half-pixel instead of full-pixel motion estimation.
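The border handling that produces this H.261-like behaviour can be sketched as below. The rules coded here are our reading of the H.263 special cases, applied in order: MV1 becomes zero outside the picture on the left; MV2 and MV3 are replaced by MV1 when the macroblocks above lie outside the picture, or outside the GOB when the current GOB has a header; MV3 becomes zero outside the picture on the right. The rule that zeroes candidates taken from INTRA or uncoded macroblocks is omitted, and the function and flag names are hypothetical.

```python
from typing import Tuple

Vector = Tuple[int, int]


def median3(a: int, b: int, c: int) -> int:
    """Median of three integers."""
    return max(min(a, b), min(max(a, b), c))


def predict_mv(mv1: Vector, mv2: Vector, mv3: Vector, *,
               left_edge: bool, top_or_gob_header: bool,
               right_edge: bool) -> Vector:
    """Median MV predictor with simplified H.263-style border substitutions."""
    if left_edge:
        mv1 = (0, 0)            # no MB to the left
    if top_or_gob_header:
        mv2 = mv3 = mv1         # MBs above are unavailable
    if right_edge:
        mv3 = (0, 0)            # no MB above and to the right
    return (median3(mv1[0], mv2[0], mv3[0]),
            median3(mv1[1], mv2[1], mv3[1]))
```

Because MV2 and MV3 collapse onto MV1 in the first row of a GOB with a header, the median reduces to MV1 (two of the three candidates are equal), which is why that row needs no data from the previous slice.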
Another advantage of assuming that each GOB contains one slice of a picture is that each GOB can be packetized into a single packet, so it is not necessary to pad any additional information from previous slices into a packet header. This solution is, in fact, proposed as one of the transmission modes (Mode A) for the RTP payload format of the H.263 video stream, as described in "RTP Payload Format for H.263 Video Stream", Internet Engineering Task Force (IETF), Internet draft, June 1996. Modes B and C of the RTP payload format allow fragmentation at MB boundaries but require considerably more overhead, which can be prohibitively high at low bit rates. By comparison, Mode A has eight bytes of overhead per packet, whereas Modes B and C have twelve and sixteen bytes, respectively, of which four bytes are RTP overhead and the rest is H.263 payload overhead.
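The cost of the per-packet overhead figures quoted above can be estimated with simple arithmetic. The scenario is hypothetical: QCIF video with 9 GOBs per picture, 10 pictures per second, one packet per GOB for every mode (fragmenting in Modes B and C would only add packets), and a 24 kbit/s channel.

```python
# Per-packet overhead in bytes, as quoted in the text (RTP plus H.263 payload header).
OVERHEAD_BYTES = {"A": 8, "B": 12, "C": 16}


def overhead_bps(mode: str, packets_per_second: int) -> int:
    """Bits per second spent on packet overhead for the given mode."""
    return OVERHEAD_BYTES[mode] * 8 * packets_per_second


if __name__ == "__main__":
    gobs_per_picture = 9        # QCIF: one GOB per macroblock row
    pictures_per_second = 10    # hypothetical frame rate
    channel_bps = 24_000        # hypothetical channel rate
    pps = gobs_per_picture * pictures_per_second
    for mode in OVERHEAD_BYTES:
        bps = overhead_bps(mode, pps)
        print(f"Mode {mode}: {bps} bit/s ({100 * bps / channel_bps:.0f}% of channel)")
```

Under these assumptions Mode A already consumes 5760 bit/s (24% of the channel) and Mode C 11520 bit/s (48%), which illustrates why the extra overhead can be prohibitive at low bit rates.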
Another advantage of Mode A is that it provides an easy error recovery method, since the picture and GOB headers can be easily identified at the beginning of each packet payload. The main disadvantage of Mode A is its inflexibility with respect to the network packet size: the bits generated for each GOB must fit within a single packet. In most circumstances, however, this problem can be overcome by using a proper transmission rate allocation mechanism. Other alternatives are to permit variable RTP packet sizes or to use Mode B to transmit the second packet in a series. In each case, because every transmitted packet contains no more than one slice of a picture, the effect of a lost packet is limited to the loss of a single picture slice.
B. Error Recovery Methods
Error recovery methods fall into two general categories: open-loop and closed-loop. In closed-loop methods, a back channel from the receiver to the transmitter is maintained, over which the receiver conveys the status of the transmitted information back to the transmitter. This information can be as simple as a single bit indicating whether a packet was correctly received, or it can contain more detailed information. It is then up to the transmitter to process this information and take appropriate action. An example of a closed-loop method is the Automatic Repeat reQuest (ARQ) scheme, wherein lost packets are retransmitted. See M. Khansari, A. Jalali, E. Dubois and P. Mermelstein, "Low Bit-rate Video Transmission Over Fading Channels for Wireless Microcellular Systems," IEEE Transactions on CAS for Video Technology, pp. 1-11, Feb. 1996.
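The retransmission loop at the heart of an ARQ scheme can be sketched as stop-and-wait with a retry limit. The `channel` callable, which returns True when an acknowledgement comes back, stands in for the real link and is hypothetical, as is the default retry limit.

```python
from typing import Callable


def send_with_arq(packet_id: int, channel: Callable[[int], bool],
                  max_retries: int = 3) -> int:
    """Transmit until acknowledged or until retries are exhausted.

    Returns the number of transmissions used, or -1 if the packet was
    abandoned after the initial attempt plus `max_retries` retransmissions.
    """
    for attempt in range(1, max_retries + 2):
        if channel(packet_id):  # True: positive acknowledgement received
            return attempt
    return -1


if __name__ == "__main__":
    outcomes = iter([False, False, True])  # two losses, then success
    print(send_with_arq(7, lambda _pid: next(outcomes)))
```

Each retransmission costs at least one round trip, so the delay added by this loop grows directly with the round trip delay.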
Alternatively, instead of retransmitting the packet, the transmitter can try to contain the effect of packet losses by adapting the subsequent encoding of the source image. E. Steinbach, N. Farber and B. Girod propose a method in "Standard Compatible Extension of H.263 for Robust Video Transmission in Mobile Environment", to appear in IEEE Transactions on CAS for Video Technology, wherein the encoder, upon receiving a negative acknowledgement for a packet, encodes subsequent blocks without any further reference to the lost segments of the packet, thereby containing the error propagation caused by the faulty packet.
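One simple way for an encoder to stop referencing lost segments is to intra-code every macroblock whose prediction could touch them. The sketch below is a simplified illustration, not the cited authors' algorithm: it assumes the negative acknowledgement arrives before the next picture is encoded and that motion vectors stay within ±16 pixels (H.263's default range), so a macroblock can only reference its own position and its eight neighbours. The function name is hypothetical.

```python
from typing import Set, Tuple

MB = Tuple[int, int]  # (column, row) of a macroblock


def plan_intra_refresh(cols: int, rows: int, lost: Set[MB]) -> Set[MB]:
    """Macroblocks to code in INTRA mode in the next picture.

    With |MV| <= 16 pixels, a 16x16 prediction can only overlap the
    co-located macroblock and its 8 neighbours, so any macroblock whose
    3x3 neighbourhood contains a lost macroblock is intra-coded.
    """
    intra = set()
    for r in range(rows):
        for c in range(cols):
            if any((c + dc, r + dr) in lost
                   for dc in (-1, 0, 1) for dr in (-1, 0, 1)):
                intra.add((c, r))
    return intra
```

For a lost QCIF GOB (one row of 11 macroblocks), this refreshes the lost row and the rows directly above and below it.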
Another proposed error resilient strategy is the use of a feedback channel in the ISO MPEG-4 standard. See the report "Description of Error Resilient Core Experiments", Ad-hoc group on core experiments on error resilience aspects in MPEG-4 video, ISO/IEC JTC1/SC29/WG11 N1587 MPEG97, Mar. 31, 1997. Under this strategy, the receiver sends information regarding the status of an entire picture frame over the feedback channel.
The fundamental parameter in all closed-loop systems is the round trip delay. When this delay is small, the information provided to the transmitter by the receiver is more current and is therefore of greater relevance. When the round trip delay is too long or there are practical constraints to maintaining a back channel, open-loop error recovery methods must be considered instead.
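The effect of the round trip delay can be quantified as the number of pictures encoded while feedback about a given picture is still in transit. The function and the example values (a 600 ms round trip, 10 pictures per second) are hypothetical.

```python
def pictures_before_feedback(rtt_ms: int, pictures_per_second: int) -> int:
    """Pictures encoded during one round trip, rounded up (integer arithmetic)."""
    return (rtt_ms * pictures_per_second + 999) // 1000


if __name__ == "__main__":
    print(pictures_before_feedback(600, 10))
```

With a 600 ms round trip at 10 pictures per second, about six further pictures are encoded, and may propagate an error, before the transmitter can react.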
In open-loop methods, recovery from channel errors is the responsibility of the receiver. In the case of Forward Error Correction (FEC), the transmitter adds redundant parity bits that can be used to recover, to some extent, the information lost due to channel errors. In general, the input to the channel encoder is a binary stream generated by the source encoder, and the channel encoder does not distinguish among different segments of that stream. With Unequal Error Protection (UEP), the amount of protection depends on the importance of the information, so that more important information (e.g. motion vectors in hybrid coders) is protected using more resilient channel codes. In this case, the source encoder must generate multiple streams, which may not be practical.
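A minimal packet-level form of FEC is a single XOR parity packet computed over a group of data packets, which lets the receiver rebuild any one lost packet of the group. This is a generic illustration rather than a scheme from the cited standards; shorter packets are zero-padded, so their true lengths would have to be signalled separately, and the function names are hypothetical.

```python
from functools import reduce
from typing import List, Optional


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(packets: List[bytes]) -> bytes:
    """XOR of all packets, each zero-padded to the longest length."""
    size = max(len(p) for p in packets)
    return reduce(_xor, (p.ljust(size, b"\0") for p in packets))


def recover_one(received: List[Optional[bytes]], parity: bytes) -> bytes:
    """Rebuild the single missing packet (marked None), zero-padded."""
    if sum(p is None for p in received) != 1:
        raise ValueError("XOR parity can repair exactly one loss per group")
    size = len(parity)
    present = (p.ljust(size, b"\0") for p in received if p is not None)
    return reduce(_xor, present, parity)
```

Under UEP, such parity packets could be sent more often for the more important data, for example packets carrying motion vectors.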
Another open-loop method is Error Concealment (EC), wherein the source decoder tries to reconstruct the lost data using the information available in the same frame or in a previously decoded frame. One EC scheme that is particularly popular for motion-compensation-based coders is to replace the lost pixels with pixels from the same location in the previous picture frame. This relatively simple method is effective in most situations, but not when there is a high amount of motion and activity in the scene. Also, for a hybrid coder where temporal prediction is used, the sequence reconstructed at the receiver will not be exactly the same as the sequence reconstructed at the transmitter. This mismatch results in error propagation along the temporal dimension, known as the drift phenomenon. This error propagation tends to be persistent, and the loss of even one GOB can affect many subsequent frames.
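The previous-frame replacement scheme described above amounts to copying co-located pixels. The sketch operates on the luminance plane only, uses hypothetical (column, row) macroblock coordinates, and assumes 16×16 macroblocks.

```python
def conceal_lost_mbs(current, previous, lost_mbs, mb_size=16):
    """Return a copy of `current` with each lost macroblock replaced by the
    co-located pixels of `previous` (luminance plane only)."""
    out = [line[:] for line in current]
    for mb_col, mb_row in lost_mbs:
        for y in range(mb_row * mb_size, (mb_row + 1) * mb_size):
            for x in range(mb_col * mb_size, (mb_col + 1) * mb_size):
                out[y][x] = previous[y][x]
    return out
```

Because the copied pixels ignore motion, they are a poor match in high-motion scenes, and any mismatch then propagates through later predicted pictures as drift.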
Accordingly, a need remains for an open-loop error correction method for a block-based coding method, such as MPEG, JPEG and H.263 video coding, which has low overhead and limits error propagation.