The present invention relates to an apparatus for decoding pictures received with temporal references.
The appearance of videophones, videoconferencing systems, video-on-demand systems, and other systems that transmit moving pictures has led to the international standardization of methods of coding such pictures. The standards set forth in recommendations H.261 and H.263 of the Telecommunication Standardization Sector of the International Telecommunication Union (ITU-T) are well known, as are the MPEG-1, MPEG-2, and MPEG-4 standards of the Moving Picture Experts Group, which have been adopted by the International Organization for Standardization (ISO).
Even when moving pictures are generated at standard frame rates, such as the rate of substantially thirty frames per second designated by the National Television System Committee (NTSC), they may be transmitted at slower frame rates. A slow frame rate may be necessary because of limited transmission bandwidth, or because coding the pictures takes time. The frame rate may also vary due to the use of different coding modes, or the occurrence of different amounts of motion in the picture. A common practice is therefore to include a temporal reference in each coded frame.
In the H.263 standard, for example, the temporal reference is an eight-bit binary number, representing the least significant eight bits of the absolute frame number. The value of the temporal reference ranges from zero to two hundred fifty-five (0-255). Under certain conditions, a ten-bit temporal reference may be used, but eight bits will be assumed in the description below.
If the transmitted frame rate is ten frames per second, or one-third the NTSC frame rate, for example, then the temporal reference normally increases in increments of three (0, 3, 6, 9, ...), wrapping around from two hundred fifty-five to zero (..., 252, 255, 2, 5, ...). The temporal reference tells the picture decoding apparatus that each received frame should be displayed for three NTSC frame intervals instead of just one. If the frame rate varies, the variations will be accurately reflected in the temporal-reference values, enabling the decoding apparatus to display each frame at the correct time and for the correct duration.
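The modulo-256 arithmetic described above can be sketched as follows. This is only an illustration: the names `TR_MODULUS`, `NTSC_FRAME_RATE`, `frame_intervals`, and `display_seconds` are hypothetical, and the frame rate is taken as exactly thirty frames per second for simplicity.

```python
TR_MODULUS = 256        # eight-bit temporal reference, values 0-255
NTSC_FRAME_RATE = 30.0  # assumed nominal rate, frames per second


def frame_intervals(prev_tr: int, cur_tr: int) -> int:
    """Number of NTSC frame intervals separating two consecutive
    temporal references, allowing for wrap-around from 255 to 0."""
    return (cur_tr - prev_tr) % TR_MODULUS


def display_seconds(prev_tr: int, cur_tr: int) -> float:
    """The same separation expressed in seconds."""
    return frame_intervals(prev_tr, cur_tr) / NTSC_FRAME_RATE


if __name__ == "__main__":
    # Ten frames per second: every step is three frame intervals,
    # including the step that wraps around from 255 to 2.
    for prev_tr, cur_tr in [(0, 3), (252, 255), (255, 2)]:
        print(prev_tr, cur_tr, frame_intervals(prev_tr, cur_tr))
```

The modulo operation makes the step from 255 to 2 come out as three intervals, the same as any other step at ten frames per second.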
A problem is that the temporal references may be corrupted by transmission errors. In the H.263 standard, for example, the temporal reference (TR) occupies a fixed eight-bit field in the coded information, and is always read as a value from zero to two hundred fifty-five, regardless of whether the value is correct or not. If the value is incorrect, the timing with which the decoded frame is displayed will be incorrect. For example, a one-bit error in the least significant bit can change ‘00001000’ (TR=8) to ‘00001001’ (TR=9), causing the associated frame to be displayed one-thirtieth of a second late. More seriously, a one-bit error in the most significant bit can change ‘00001000’ (TR=8) to ‘10001000’ (TR=136), causing the associated frame to appear more than four seconds late.
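The two single-bit errors described above can be reproduced with a short calculation. The function names `flip_bit` and `delay_seconds` are hypothetical, and a frame rate of exactly thirty frames per second is assumed.

```python
NTSC_FRAME_RATE = 30.0  # assumed nominal rate, frames per second


def flip_bit(tr: int, bit: int) -> int:
    """Return the eight-bit temporal reference with one bit inverted
    (bit 0 is the least significant bit, bit 7 the most significant)."""
    return tr ^ (1 << bit)


def delay_seconds(correct_tr: int, received_tr: int) -> float:
    """Display delay caused by reading received_tr instead of correct_tr,
    ignoring wrap-around."""
    return (received_tr - correct_tr) / NTSC_FRAME_RATE


if __name__ == "__main__":
    tr = 0b00001000  # TR=8
    for bit in (0, 7):
        corrupted = flip_bit(tr, bit)
        print(f"{corrupted:08b}", corrupted, delay_seconds(tr, corrupted))
```

An error in bit 7 shifts the value by 128 frame intervals, which at thirty frames per second is a little over four seconds.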
When the temporal reference wraps around from two hundred fifty-five to zero, the decoding apparatus compensates by adding two hundred fifty-six. For example, if ‘11111111’ (TR=255) is followed by ‘00000101’ (TR=5), the later value is treated as if it were two hundred sixty-one (261=5+256). This wrapping-around can greatly magnify the effect of an error. For example, if the preceding temporal reference was ‘00001100’ (TR=12) and a one-bit transmission error changes the current temporal reference from ‘00001111’ (TR=15) to ‘00001011’ (TR=11), then by the above rule, the current value (TR=11) is interpreted as two hundred sixty-seven (267=11+256), resulting in a delay of over eight seconds.
This type of false wrap-around propagates into succeeding frames. If the next temporal reference is ‘00010010’ (TR=18), it may be interpreted as two hundred seventy-four (274=18+256), even if received correctly. An error of this type can propagate forever without being discovered.
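The wrap-around compensation and the false wrap-around it can produce may be illustrated with a minimal sketch. It assumes the simplest possible rule, namely that any decrease in the received value is treated as a wrap-around; the function name `unwrap` is hypothetical and is not taken from any standard.

```python
def unwrap(temporal_references):
    """Convert a sequence of eight-bit temporal references into extended
    frame numbers, adding 256 each time the received value decreases."""
    offset = 0
    previous = None
    extended = []
    for tr in temporal_references:
        if previous is not None and tr < previous:
            offset += 256  # a decrease is taken to be a wrap-around
        extended.append(tr + offset)
        previous = tr
    return extended


if __name__ == "__main__":
    print(unwrap([255, 5]))       # genuine wrap-around
    print(unwrap([12, 15, 18]))   # error-free sequence
    print(unwrap([12, 11, 18]))   # 15 corrupted to 11 by a one-bit error
```

In the third sequence the corrupted value 11 is extended to 267, and the correctly received 18 that follows inherits the same offset and becomes 274, which is the propagation described above.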
Partly to cope with temporal-reference errors, the H.263 and MPEG-4 standards divide a frame into a plurality of segments, also referred to as slices or groups of blocks (GOBs), and provide a redundant temporal-reference mode in which the temporal-reference value is included in the coding of each segment, in the segment header. An advantage of this system is that if any of the segments of a frame can be correctly decoded, these segments can be placed in their correct temporal positions. Segments that actually belong to different frames will not be placed in the same frame by mistake, for example.
A disadvantage of the redundant temporal-reference mode, however, is that an error in a temporal-reference value can cause a single frame to be interpreted as two or more frames, with a false wrap-around to zero and attendant long delay. An example will be shown in the detailed description of the invention.
The problems associated with incorrect temporal references are not limited to picture transmission systems; they also occur in decoding apparatus that reads coded moving-picture data from a storage device.
An object of the present invention is to avoid large timing errors caused by incorrect temporal-reference values.
Another object of the invention is to avoid the break-up of frames due to incorrect temporal-reference values in segments of the frames.
According to a first aspect of the invention, a picture decoding apparatus decodes a series of coded frames, each coded frame including a temporal reference. The apparatus has a temporal-reference memory unit storing a plurality of past temporal references. A temporal-reference estimation unit calculates an estimated temporal reference from the past temporal references. A temporal-reference modification unit compares the estimated temporal reference with the current temporal reference, and modifies the current temporal reference, if necessary, according to the difference between the current temporal reference and the estimated temporal reference.
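One possible reading of the first aspect is sketched below. The estimation and modification rules are assumptions made only for illustration, since this summary does not specify them: the estimate is the last accepted temporal reference plus the most recent step (modulo 256), and the received value is replaced by the estimate when it differs from it by more than a tolerance. The class name `TemporalReferenceCorrector`, the helper `circular_diff`, and the parameters `history` and `tolerance` are all hypothetical.

```python
from collections import deque

TR_MODULUS = 256  # eight-bit temporal reference


def circular_diff(a: int, b: int) -> int:
    """Signed difference a - b modulo 256, in the range -128 to 127."""
    half = TR_MODULUS // 2
    return (a - b + half) % TR_MODULUS - half


class TemporalReferenceCorrector:
    def __init__(self, history: int = 3, tolerance: int = 2):
        self.past = deque(maxlen=history)  # temporal-reference memory
        self.tolerance = tolerance         # assumed acceptance window

    def estimate(self):
        """Extrapolate the next temporal reference from the last step
        (assumed rule); None until two past values are available."""
        if len(self.past) < 2:
            return None
        step = (self.past[-1] - self.past[-2]) % TR_MODULUS
        return (self.past[-1] + step) % TR_MODULUS

    def correct(self, received: int) -> int:
        """Compare the received value with the estimate and replace it
        if the difference exceeds the tolerance."""
        estimated = self.estimate()
        if (estimated is not None
                and abs(circular_diff(received, estimated)) > self.tolerance):
            final = estimated
        else:
            final = received
        self.past.append(final)
        return final


if __name__ == "__main__":
    corrector = TemporalReferenceCorrector()
    # 15 is corrupted to 11 in transmission; the next value arrives intact.
    print([corrector.correct(tr) for tr in (6, 9, 12, 11, 18)])
```

A fixed, small tolerance suits a steady frame rate; a stream whose frame rate varies would need a wider window or a different estimator, which this sketch does not attempt.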
According to a second aspect of the invention, each coded frame includes a plurality of coded segments, and each coded segment has its own temporal reference. The picture decoding apparatus has a temporal-reference memory unit storing the temporal references received in the coded segments in one coded frame. A temporal-reference finalizing unit compares the stored temporal references and selects a final temporal reference for the coded frame.
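The second aspect can be illustrated in the same hedged way. The selection rule used here, a simple majority vote over the temporal references stored for one frame, is an assumption; this summary says only that the stored values are compared and a final value is selected. The function name `finalize_temporal_reference` is hypothetical.

```python
from collections import Counter


def finalize_temporal_reference(segment_trs):
    """Select one temporal reference for a frame from the values received
    in its segment headers, using a majority vote (assumed rule).
    If several values tie, the one received first is chosen."""
    if not segment_trs:
        raise ValueError("no temporal references stored for this frame")
    value, _count = Counter(segment_trs).most_common(1)[0]
    return value


if __name__ == "__main__":
    # Four segments of one frame; the third header was corrupted (15 -> 11).
    print(finalize_temporal_reference([15, 15, 11, 15]))
```

Because the corrupted value is outvoted instead of being taken as the start of a new frame, the segments stay together in a single frame.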
The first and second aspects of the invention may be combined.