This invention relates to video compression, and more particularly to recovery from errors in synchronization fields.
Video is an important part of a rich multimedia experience. Personal computers (PCs) and various other computing devices have delivered video feeds to users over the Internet. However, processing of video bitstreams or feeds is among the most data-intensive of all common computing applications. Limited communication-line bandwidth has reduced the quality of Internet video, which is often delivered in small on-screen windows with jerky movement.
To mitigate the problems of large video streams, various video-compression techniques have been deployed. Compression standards, such as those developed by the motion-picture-experts group (MPEG), have been widely adopted. These compression techniques are lossy techniques, since some of the picture information is discarded to increase the compression ratio. However, compression ratios of 99% or more have been achieved with minimal noticeable picture degradation.
Portable hand-held devices such as personal-digital-assistants and cellular telephones are widely seen today. Wireless services allow these devices to access data networks and even view portions of web pages. Currently the limited bandwidth of these wireless networks limits the web viewing experience to mostly text-based portions of web pages. However, future wireless networks are being planned that should have much higher data transmission rates, allowing graphics and even video to be transmitted to portable computing and communication devices.
Although proponents of these next-generation wireless networks believe that bandwidths will be high enough for high-quality video streams, the inventor realizes that the actual data rates delivered by wireless networks can be significantly lower than theoretical maximum rates, and can vary with conditions and local interference. Due to its high data requirements, video is likely to be the most sensitive service to any reduced data rates. Interference can cause intermittent dropped data over the wireless networks.
Next-generation compression standards have been developed for transmitting video over such wireless networks. The MPEG-4 standard provides a more robust compression technique for transmission over wireless networks. Recovery can occur when parts of the MPEG-4 bitstream are corrupted.
FIG. 1 shows an MPEG-4 bitstream that is composed of video object planes and video packets. The video is sent as a series of picture frames known as video object planes (VOP). These picture frames are replaced at a fixed rate, such as every 30 milliseconds, to give the illusion of picture movement. Rather than transmit every pixel on each line, the picture is divided into macroblocks and compressed by searching for similar macroblocks in earlier or later frames and then replacing the macroblock with a motion vector or data changes.
Video object planes VOP 10, 12 are two frames in a sequence of many frames that form a video stream. Pixel data in these planes are compressed using macroblock-compression techniques that are well-known and defined by the MPEG-4 standard. The compressed picture data is divided into several video packets (VP) for each video object plane VOP.
Each video object plane begins with a VOP start code, such as VOP start code 20, which begins VOP #1 (10), and VOP start code 21, which begins VOP #2 (12). First video object plane VOP 10 has VOP header 22 that follows VOP start code 20, and data field 24, which contains the beginning of the picture data for VOP 10.
After a predetermined amount of data, such as 100 to 1000 bits, a new video packet begins with resync marker 30 and VP header 32. Data field 34 continues with the picture data for VOP 10. Other video packets follow, each beginning with a resync marker and VP header, followed by a data field with more of the picture data for VOP 10. The last video packet VP #N in VOP 10 begins with resync marker 31 and VP header 33, and is followed by the final picture data for VOP 10, in data field 35.
The second video object plane VOP 12 begins with VOP start code 21 and VOP header 23, and is followed by data field 25, which has the first picture data for the second picture frame, VOP 12. Other video packets follow for VOP 12.
The VOP headers include a VOP coding type (I, P, or B), VOP time, rounding type, quantization scale, and f_code, while the VP headers include a macroblock number for the first macroblock in the packet, quantization scale, VOP coding type, and time. The headers can include other information as well.
The VOP start codes and VP resync markers contain unique bit patterns that do not occur in the headers or data fields. The video object plane VOP start code is:
0000 0000 0000 0000 0000 0001 1011 0110.
This code is defined by the MPEG-4 standard. The start code is 000001 B6 in Hexadecimal notation. The start code begins with a string of 23 zero bits. The picture data in the macroblock data fields are encoded so that they never have such a long string or run of zero bits. Likewise, the headers do not have such a long run of zero bits. Thus the start code is unique within the video bitstream, allowing a bitstream decoder to easily detect the start code.
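As a minimal illustration of start-code detection, the following Python sketch searches a byte buffer for the 32-bit VOP start code and counts the zero bits that precede its first one bit. The function names are hypothetical, and the sketch assumes that the start code is byte-aligned within the buffer; it is not a complete bitstream parser.

```python
VOP_START_CODE = 0x000001B6  # 32-bit VOP start code quoted in the text


def leading_zero_run(value: int, width: int) -> int:
    """Count the zero bits that precede the first 1 in a width-bit value."""
    bits = format(value, f"0{width}b")
    return len(bits) - len(bits.lstrip("0"))


def find_vop_start_codes(data: bytes) -> list[int]:
    """Return every byte offset at which the VOP start code appears.

    Assumption: the start code is byte-aligned in `data`.
    """
    pattern = VOP_START_CODE.to_bytes(4, "big")
    offsets = []
    pos = data.find(pattern)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(pattern, pos + 1)
    return offsets


if __name__ == "__main__":
    sample = b"\xff\x00\x00\x01\xb6\x12\x00\x00\x01\xb6"
    print(leading_zero_run(VOP_START_CODE, 32))
    print(find_vop_start_codes(sample))
```

Calling leading_zero_run on the start code reproduces the run of zero bits described above, and find_vop_start_codes reports where each VOP would begin in the sample buffer.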
FIG. 2 is a table of codes for the resync markers that mark the beginning of a new video packet. An f_code for each VOP is encoded into a 3-bit field in the VOP header. When the VOP header specifies that the frame's f_code is 1, then each video packet VP in the frame begins with a 17-bit resync marker:
0000 0000 0000 0000 1
which has an initial run of 16 zeros.
When f_code is set to 3, a 19-bit resync marker is used for all video packets in the frame, with an initial run of 18 zero bits. The f_code can be set to values from 1 to 7. For f_code=7, a 23-bit resync marker with an initial run of 22 zero bits is specified. Different values of f_code and the corresponding resync markers used are shown in the table, along with the lengths of the resync markers, which vary from 17 to 23 bits.
All resync markers begin with a long run of zero bits, from 16 to 22 bits of zero. Note that the VOP start code begins with a 24-bit prefix (000001 in hexadecimal), which is longer than any possible resync marker. This prefix has a longer run of 23 zero bits, allowing the VOP start code to be distinguished from the VP resync markers.
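The examples given for f_code values of 1, 3, and 7 are consistent with a marker made of (15 + f_code) zero bits followed by a single one bit. The Python sketch below assumes that this rule holds for every f_code from 1 to 7; the function names are hypothetical, and the table of FIG. 2 remains the authoritative listing.

```python
def resync_marker(f_code: int) -> str:
    """Return the resync marker bit pattern for an f_code as a string of 0s and 1s.

    Assumption: the marker is (15 + f_code) zero bits followed by a one bit,
    which matches the f_code = 1, 3, and 7 examples in the text.
    """
    if not 1 <= f_code <= 7:
        raise ValueError("f_code must be between 1 and 7")
    return "0" * (15 + f_code) + "1"


def marker_lengths() -> dict[int, int]:
    """Map each f_code to the bit-length of its resync marker."""
    return {f_code: len(resync_marker(f_code)) for f_code in range(1, 8)}


if __name__ == "__main__":
    for f_code, length in marker_lengths().items():
        print(f_code, length, resync_marker(f_code))
```

Under this assumption every marker is shorter than the 23-zero run of the VOP start code, so the two kinds of synchronization field cannot be confused.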
The f_code also specifies the motion-vector search range used by the encoder and thus the number of bits that can be used to encode the motion vector. For f_code=1, the maximum search range is [-32,31] half-pixels, or about +/-16 pixels, with a half-pixel resolution. The search range doubles for each higher value of f_code, to [-64,63] for f_code=2, up to [-2048,2047] for the maximum f_code=7. This provides a maximum search range of about 1K pixels.
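The doubling of the search range can be written as a short calculation. The Python sketch below is only an illustration of the arithmetic stated above; the function name is hypothetical.

```python
def search_range_half_pixels(f_code: int) -> tuple[int, int]:
    """Return the (minimum, maximum) motion-vector range in half-pixel units.

    The range is [-32, 31] for f_code = 1 and doubles for each higher f_code.
    """
    if not 1 <= f_code <= 7:
        raise ValueError("f_code must be between 1 and 7")
    half_span = 32 << (f_code - 1)  # 32, 64, ..., 2048
    return (-half_span, half_span - 1)


if __name__ == "__main__":
    for f_code in range(1, 8):
        low, high = search_range_half_pixels(f_code)
        # Two half-pixels make one pixel, so divide by 2 for whole pixels.
        print(f_code, (low, high), "about +/-", -low // 2, "pixels")
```

For f_code=7 the lower bound of 2048 half-pixels corresponds to 1024 whole pixels, which is the "about 1K pixels" figure.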
Larger search ranges are desirable since the encoder is more likely to find a pixel match for a macroblock within a larger area of the image. Thus encoding efficiency improves when larger search ranges and larger f_codes are used.
A simplification of the MPEG-4 standard sets the f_code to 1 for all video packets. This simplification is known as simple profile level 0. In this case the resync markers are always:
0000 0000 0000 0000 1
which has an initial run of 16 zeros. While useful for simplifying the search for resync markers, limiting the f_code to 1 also limits the motion-vector search range and thus the coding efficiency.
FIG. 3 is a diagram of an MPEG-4 decoder. The MPEG-4 bitstream is parsed by parser 50, which searches for start-code and resync bit patterns. A bit-wise comparator can be used, comparing the last N bits received to an N-bit pattern of the start code or resync marker. When the last N bits match the VOP start code, start-code and VOP header decoder 56 sends some of the VOP header information to bitstream decoder 52 and instructs it to decode the bits following the VOP header as the data field of the initial video packet. The f_code from the VOP header is sent back to bitstream parser 50. Parser 50 then searches for the next packet by searching for a resync marker with the number of bits indicated by the f_code, as shown in FIG. 2. The picture data from the data field is output as the video data for further processing of motion vectors and macroblocks (de-compression).
When the last N bits received by parser 50 match the resync bit pattern for the current f_code, resync marker and VP header decoder 58 decodes the following bits as the VP header and instructs bitstream decoder 52 to decode the bits following the header as the data field for the video packet. The picture data from the data field is output as the video data for further processing of motion vectors and macroblocks.
When the bits are not part of a start code, a resync marker, or their headers, macroblock decoder 55 decodes the data fields into the macroblock descriptions, motion vectors, and discrete cosine transform (DCT) coefficients of the picture data.
Errors can be detected when an invalid motion vector or discrete cosine transform (DCT) code is found. However, there is no standard error-detection method. When an error is detected by bitstream decoder 52, parser 50 is instructed to search for the next VOP start-code or VP resync marker. Any data in the bitstream is ignored once the error occurs until the next start code or resync marker is found. When start-code decoder 56 finds a start code in the bitstream, decoding can continue with the next VOP header. The data following the VOP header is processed, but any video data after the error until the VOP header is discarded since the location of the macroblocks and motion vectors in the bitstream is uncertain due to loss of sync from the error. Backward decoding may be used to recover some of the lost video data when reversible variable-length coding is used.
When resync marker decoder 58 finds a resync marker for the current f_code in the bitstream, decoding can continue with the next VP header. The data following the VP header is processed, but any video data after the error until the VP header is discarded due to the loss of sync caused by the error. If reversible variable-length coding is used, some of the lost video data may be recovered by backward decoding.
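The bit-wise comparator described for the parser can be modeled as an N-bit shift register that is compared with the target pattern after each incoming bit. The Python sketch below is a simplified model under stated assumptions: the bitstream is represented as a string of '0' and '1' characters, the function name is hypothetical, and header decoding is omitted.

```python
def scan_for_pattern(bits: str, pattern: str) -> list[int]:
    """Return the index of the first bit after each occurrence of `pattern`.

    Models an N-bit shift register: every incoming bit is shifted in, and the
    last N bits received are compared with the N-bit pattern.
    """
    n = len(pattern)
    target = int(pattern, 2)
    mask = (1 << n) - 1
    window = 0
    hits = []
    for i, bit in enumerate(bits):
        window = ((window << 1) | int(bit)) & mask
        # Compare only once at least N bits have been received.
        if i + 1 >= n and window == target:
            hits.append(i + 1)
    return hits


if __name__ == "__main__":
    marker_f1 = "0" * 16 + "1"            # 17-bit resync marker for f_code = 1
    stream = "1011" + marker_f1 + "1101"  # data, marker, data
    print(scan_for_pattern(stream, marker_f1))
    start_code = format(0x000001B6, "032b")
    print(scan_for_pattern("11" + start_code + "1", start_code))
```

Each returned index is the position at which the header following the matched pattern would begin, which is where a header decoder would take over.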
When an error occurs, the remaining data in the video packet is lost. However, data in the next video packet can be used since the bitstream is re-synced by detection of the unique bit pattern, either the start code or resync marker.
FIG. 4A shows recovery from bit errors in a video packet. When the bitstream is transmitted over a wireless network, some corruption of the data is possible. The f_code for VOP #1 is read from VOP #1 header 22. This f_code is used to set the bit-length of the resync marker bit-pattern to search for.
In this example, a data error occurs in data field 24 in the first video packet of VOP #1. The remaining data in data field 24 is discarded, but the decoder searches for and finds the next resync marker 30. VP header 32 following resync marker 30 is decoded, and data processing resumes with data field 34 in the second video packet. Thus the only data lost is some of the data in data field 24.
Start code 21 is detected for the second frame, VOP #2. Second VOP header 23 is decoded, and the new f_code for VOP #2 is read to set the bit-length of resync markers in VOP #2. Data processing continues with the data in data field 25.
Dividing data from each video object plane into several video packets reduces the amount of lost data when a bitstream error occurs. Data from just one video packet is lost for each error. Only a portion of a frame is lost, such as less than 1/Nth of a frame when the video object plane is divided into N video packets.
Unfortunately, some kinds of bit errors are more difficult to recover from. FIG. 4B shows a slow recovery from bit errors that corrupt the f_code in the VOP header. Although start code 20 for VOP #1 is detected, a bit error in VOP header 22 prevents the f_code from being read properly. Macroblock data in data field 24 may be readable, or the bit error may prevent detecting the location of data field 24.
A more serious problem is that the exact length of the next resync marker is not known, since the f_code was not read from VOP header 22. Without knowing the length of the resync marker pattern, resync marker 30 for the second video packet (VP) of VOP #1 cannot be decoded, and data in data field 34 is discarded since the exact start of this field is not known.
Further aggravating the problem is that none of the resync markers in the entire video object plane VOP #1 can be detected, since their bit-length is determined only by the f_code that was corrupted in VOP header 22. Thus resync marker 31 and its data field 35 cannot be located. Indeed, all data in all subsequent video packets in the current VOP are lost. The format of resync markers 30, 31 depends on the f_code.
Although the f_code is included in some VP headers, the exact location of the f_code in the VP headers is unknown when the resync marker preceding the VP header cannot be located. Thus having the f_code in the VP headers does not help, since the resync markers cannot be located while their length is unknown.
The decoder finally is able to match start code 21 for the next VOP #2. Its VOP #2 header 23 is read to get the f_code for VOP #2. This allows subsequent resync markers to be decoded in the second video packet in VOP #2.
Data does not resume decoding until data field 25. An entire frame or video object plane is lost because the f_code in VOP header 22 was corrupted. Indeed, detecting such an error is difficult, since the corrupted f_code may appear to be another valid f_code, causing the resync-marker decoder to search for the wrong resync marker pattern. Subsequent resync markers in the VOP are not found, causing loss of data. Since the format of the motion vectors also depends on the f_code, decoding of motion vectors cannot occur until the f_code is known.
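The effect of a corrupted f_code can be illustrated with a toy bitstream. The Python sketch below builds the packet data of one VOP using the markers for f_code=1 and then searches it with the marker pattern for a wrong f_code of 3. All names and payload bits are hypothetical, the marker rule of (15 + f_code) zero bits followed by a one bit is assumed from the examples in the text, and only the case where the corrupted value is larger than the true value is shown.

```python
def resync_marker(f_code: int) -> str:
    """Assumed marker rule: (15 + f_code) zero bits followed by a one bit."""
    return "0" * (15 + f_code) + "1"


def build_vop_data(f_code: int, payloads: list[str]) -> str:
    """Join packet payloads, separating them with the marker for `f_code`."""
    return resync_marker(f_code).join(payloads)


def markers_found(bits: str, believed_f_code: int) -> int:
    """Count matches of the marker the decoder believes is in use."""
    return bits.count(resync_marker(believed_f_code))


if __name__ == "__main__":
    # Made-up payloads that end in a 1 bit and contain no long zero runs.
    payloads = ["1011" * 10, "1101" * 10, "0111" * 10]
    stream = build_vop_data(1, payloads)  # encoder used f_code = 1
    print(markers_found(stream, 1))       # decoder read the f_code correctly
    print(markers_found(stream, 3))       # decoder read a corrupted f_code
```

With the correct f_code both packet boundaries are located, while the search with the longer pattern matches nothing, so every packet after the first would be discarded until the next VOP start code.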
Thus all data from the first to Nth video packets in VOP #1 are lost. An entire video frame is thus lost when a bit error occurs in the f_code in the VOP header. To prevent such serious errors, some MPEG encoders use a fixed f_code of 1 for all VOPs. However, this is undesirable since it limits the motion-vector range to a small area of only +/-16 pixels.
What is desired is a bitstream decoder that locates resync markers of video packets despite bit errors that occur in the VOP header. A robust sync detector is desired that can more quickly recover from bitstream errors. An MPEG-4 decoder that can recover from a corrupted bitstream within one video packet is desirable to minimize loss of picture data. An MPEG-4 decoder that can locate resync markers when the f_code in the VOP header is unreadable or corrupted is desired.