The present invention relates to a picture coding method and apparatus for coding still or moving pictures that are divided into segmented frames.
The term ‘frame’ refers to one complete still picture, or one complete still picture in a sequence of pictures constituting a moving picture, or to the corresponding part of a video object plane.
Recent years have seen the emergence of various international standards for coding pictures for transmission by videophones, videoconferencing systems, video-on-demand (VOD) systems, and the like. For still pictures, the JPEG method, developed by the Joint Photographic Experts Group and adopted by the International Organization for Standardization (ISO), is well known. For moving pictures, the Telecommunication Standardization Sector of the International Telecommunication Union (ITU-T) has developed standards described in recommendations H.261, H.262, and H.263, and the Moving Picture Experts Group has developed ISO standards referred to as MPEG-1, MPEG-2, and MPEG-4.
The methods adopted in these standards code a picture by dividing the picture into small regions referred to as macroblocks. The term ‘macroblock’ is used because a macroblock is divided into several smaller blocks. Each block or macroblock is coded separately by a process that typically includes a mathematical transform, quantization of the resulting coefficients, and coding of the quantized data. For moving pictures, the coding may be carried out in an inter-frame mode, in which the macroblock is coded with reference to corresponding data from a preceding frame, or an intra-frame mode, in which the macroblock is coded without such reference. Each of these coding modes may have various sub-modes.
The bitstream generated by the coding process includes both the coded macroblock data and general information pertaining to the picture as a whole. Examples of this general information include frame timing information and coding-mode information. The general information is more critical than the coded macroblock data, because if the general information is lost or corrupted by an error, it may become impossible to decode an entire frame, whereas the loss or corruption of coded macroblock data usually affects only part of a frame.
The general information about a frame is placed in a header at the beginning of the coded frame data. The header is referred to in various standards as a frame header, picture header, or access unit header. The term ‘picture header’ will be used below to denote all these types of headers. FIG. 1 shows an example of coded frame data (FD) starting with a picture header (PH).
Although the coder may generate a substantially continuous bitstream, when the bitstream is transmitted through a communication network, it is usually divided into a series of separate units. If the Transmission Control Protocol/Internet Protocol (TCP/IP) is employed, for example, the bitstream is divided into units referred to as IP packets. As another example, ITU-T recommendation H.223 describes a time-division multiplexing scheme for media data (picture data, audio data, and other data) in which the various media data are divided into separate packets for transmission.
In packet communication networks, packets must sometimes be discarded or ‘dropped’ due to congestion at a network node. To limit the effect of a dropped packet to a single frame, each frame can be transmitted as a separate packet, as shown in FIG. 2. Since larger packets are more likely to be dropped than smaller packets, however, each frame can be more advantageously divided into a plurality of packets, as shown in FIG. 3. In this case, when a packet is dropped, synchronization is temporarily lost, making it impossible to decode the dropped packet and any following packets in the same frame, but synchronization is regained when the next picture header is recognized, making the next frame decodable.
In both FIGS. 2 and 3, the loss of a single packet tends to have an undesirably large effect on picture quality. To reduce this effect, the above-mentioned standards employ synchronization units smaller than a frame. These smaller synchronization units are known in the various standards as segments, groups of blocks (GOBs), slices, and video packets. The term ‘segment’ will be used below to refer to any of these synchronization units. Each segment in a frame comprises a plurality of macroblocks, and includes a segment header giving information needed for decoding the constituent macroblocks. Each segment is transmitted as a separate packet.
FIG. 4 shows an example of a frame divided into segments at predetermined locations, the first segment beginning with a picture header (PH), the following segments beginning with respective segment headers (SH). This type of fixed segment structure is used in the H.261 and H.263 standards. FIG. 5 shows an example in which the segment divisions can be made in arbitrary positions in the frame, as allowed in the H.263 and MPEG-4 standards.
An issue in these segmentation schemes is how much information to put into the segment headers. If all of the information in the picture header is repeated in each segment header, then the loss of a packet never affects more than one segment in the frame, but the repeated header information uses up so many bits that picture quality may be noticeably degraded in all segments, because fewer bits are available for coding the macroblock data.
The most efficient scheme is to place information pertaining to the frame as a whole in the picture header, and place information pertaining only to a particular segment in the segment header. The problem with this scheme is that the loss of the packet including the picture header makes the entire frame undecodable.
A compromise scheme places information applying to the frame as a whole in the picture header, and repeats this information in the segment headers if it differs from the corresponding information in the preceding frame. Each segment header also includes information applying only to its own segment. Then even if the segment including the picture header is lost, the other segments can be decoded by use of the picture header from the preceding frame. This scheme is still fairly inefficient, because a comparatively large amount of picture-header information must often be repeated in the segment headers.
A more efficient compromise scheme sets a flag in a segment header when the picture header contains information pertaining to the segment and differing from the information in the preceding frame. If the picture header is lost, segments in which this flag is not set are decoded using the picture header of the preceding frame, while segments in which this flag is set are not decoded.
FIG. 6 shows an example in which this flag scheme is employed. The second segment in the first frame is lost, but the other segments in the first frame can be decoded, as can the entire second frame. In the third frame, however, the first segment is lost. The picture header in this segment includes information pertaining to all the segments in the frame, and this information differs from the information in the preceding frame, so none of the segments can be decoded. As this example shows, the use of flags fails to prevent the loss of an entire frame in the not-so-rare case in which a picture header including information differing from the preceding frame is lost.
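The decoding rule of the flag scheme can be sketched as follows. This is a minimal illustration, not code from any of the cited standards. The function name `decodable_segments` and its list-based inputs are hypothetical, and the sketch assumes that the first segment of each frame carries the picture header and that the picture header of the preceding frame is available to the decoder.

```python
from typing import List


def decodable_segments(received: List[bool], changed_flags: List[bool]) -> List[bool]:
    """Return, per segment, whether it can be decoded under the flag scheme.

    received[i]      -- True if segment i arrived (segment 0 carries the picture header).
    changed_flags[i] -- True if segment i's header flags picture-header information
                        that differs from the preceding frame.
    Assumption: the preceding frame's picture header is available as a fallback.
    """
    if len(received) != len(changed_flags):
        raise ValueError("received and changed_flags must have the same length")
    if not received:
        return []

    header_received = received[0]
    result = []
    for got, changed in zip(received, changed_flags):
        if not got:
            # A lost segment can never be decoded.
            result.append(False)
        elif header_received:
            # The current picture header is available.
            result.append(True)
        else:
            # Fall back to the preceding frame's header only if nothing changed.
            result.append(not changed)
    return result
```

With four segments per frame, losing the second segment of a frame leaves the other three decodable, whereas losing the first segment of a frame whose flags are all set leaves none decodable, which is the failure case described for the third frame in FIG. 6.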
When each segment is transmitted as a separate packet, the probability of packet loss can be reduced by reducing the segment size: for example, by reducing the number of macroblocks per segment, or the amount of coded data per segment. As a result, however, each frame is divided into more segments, and coding efficiency is adversely affected by the need for more segment headers.
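The cost of smaller segments can be made concrete with a short calculation. The sketch below is illustrative only: the function `header_overhead` and the bit counts in the usage note are assumed values, not figures taken from any standard, and every segment header is assumed to have the same size.

```python
import math
from typing import Tuple


def header_overhead(frame_payload_bits: int,
                    segment_payload_bits: int,
                    header_bits: int) -> Tuple[int, int, float]:
    """Return (number of segments, total header bits, header share of the frame).

    Assumption: each segment carries at most segment_payload_bits of coded
    macroblock data plus one header of header_bits.
    """
    if segment_payload_bits <= 0:
        raise ValueError("segment_payload_bits must be positive")
    n_segments = max(1, math.ceil(frame_payload_bits / segment_payload_bits))
    overhead_bits = n_segments * header_bits
    share = overhead_bits / (frame_payload_bits + overhead_bits)
    return n_segments, overhead_bits, share
```

For a hypothetical frame with 8000 bits of macroblock data and 40-bit headers, 1000-bit segments need 8 headers (320 bits), while 500-bit segments need 16 headers (640 bits), so halving the segment size doubles the header overhead.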
The basic problem with all of the conventional schemes outlined above is that to reduce the probability of picture-header loss, they require much additional header information, with a corresponding penalty in coding efficiency.
The same problem occurs if these schemes are used for protection against read/write errors when coded picture data are stored on recording media.
An object of the present invention is to provide improved protection against the loss of critical information in coded picture data.
Another object of the invention is to provide such protection without increasing the necessary amount of coded data.
Another object is to minimize the amount of coded data needed to obtain adequate protection from data loss.
According to a first aspect of the invention, each frame of a coded picture is divided into critical and non-critical segments. A critical segment includes information, such as a picture header, needed for decoding other segments in the same frame. A non-critical segment includes only information needed for its own decoding. The coding process is controlled so that, on the average, critical segments are smaller than non-critical segments.
The segment size may be controlled directly or indirectly. Indirect methods of controlling the segment size include using a larger quantization step size in critical segments than in non-critical segments, and using a lower forced intra-coding rate in critical segments than in non-critical segments.
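Direct control of the segment size can be sketched as a greedy partition of macroblocks with a smaller bit budget for the critical segment. This is one possible illustration, not the method of the embodiments: the function `split_into_segments`, its budget parameters, and the assumption that only the first segment of the frame is critical (because it carries the picture header) are all hypothetical.

```python
from typing import List


def split_into_segments(macroblock_bits: List[int],
                        critical_budget: int,
                        normal_budget: int) -> List[List[int]]:
    """Group macroblock indices into segments under per-segment bit budgets.

    Assumption: the first segment is the critical one and gets the smaller
    budget; all later segments are non-critical. A segment always holds at
    least one macroblock, even if that macroblock exceeds the budget.
    """
    segments: List[List[int]] = []
    current: List[int] = []
    used = 0
    budget = critical_budget
    for index, bits in enumerate(macroblock_bits):
        if current and used + bits > budget:
            # Close the current segment; every later segment is non-critical.
            segments.append(current)
            current, used = [], 0
            budget = normal_budget
        current.append(index)
        used += bits
    if current:
        segments.append(current)
    return segments
```

With ten macroblocks of 100 bits each, a critical budget of 200 bits, and a non-critical budget of 400 bits, the frame splits into a two-macroblock critical segment followed by two four-macroblock non-critical segments.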
According to a second aspect of the invention, each frame of coded picture data is divided into segments, each segment having a header. The segment size is controlled dynamically, according to a communication condition related to the occurrence of transmission errors on a communication path over which the coded segments are transmitted. The segment size is increased under conditions associated with a low transmission error rate, and is decreased under conditions associated with a high transmission error rate.
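A simple form of this dynamic control is sketched below. The halving and doubling rule, the threshold values, and the size limits are assumptions chosen for illustration; the text above requires only that the segment size decrease under high-error conditions and increase under low-error conditions. The function name `adapt_segment_size` is hypothetical, and the observed error rate stands in for whatever communication condition is actually monitored.

```python
def adapt_segment_size(current_bits: int,
                       error_rate: float,
                       low_threshold: float = 0.001,
                       high_threshold: float = 0.01,
                       min_bits: int = 250,
                       max_bits: int = 4000) -> int:
    """Return the target segment size for the next frame.

    All thresholds and limits are illustrative assumptions.
    """
    if error_rate > high_threshold:
        # High error rate: shrink segments so each is less exposed to errors.
        return max(min_bits, current_bits // 2)
    if error_rate < low_threshold:
        # Low error rate: grow segments so fewer headers are needed.
        return min(max_bits, current_bits * 2)
    # In between: keep the current size.
    return current_bits
```

Starting from 1000-bit segments, an observed error rate of 0.1 reduces the target to 500 bits, an error rate of 0.0001 raises it to 2000 bits, and an error rate of 0.005 leaves it unchanged.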
Both aspects of the invention provide enhanced protection for critical information, because smaller segments are less likely than large segments to experience transmission errors.
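The relation between segment size and error exposure can be illustrated with a simple model. The sketch below assumes independent bit errors at a fixed bit error rate, which is an idealization introduced here and not a channel model stated in this document. The function name `segment_error_probability` is hypothetical.

```python
def segment_error_probability(size_bits: int, bit_error_rate: float) -> float:
    """Probability that at least one bit of a segment is corrupted.

    Assumption: bit errors are independent with probability bit_error_rate,
    so the segment survives intact with probability (1 - p) ** size_bits.
    """
    if size_bits < 0:
        raise ValueError("size_bits must be non-negative")
    if not 0.0 <= bit_error_rate <= 1.0:
        raise ValueError("bit_error_rate must be between 0 and 1")
    return 1.0 - (1.0 - bit_error_rate) ** size_bits
```

Under this model the error probability grows with segment size for any nonzero bit error rate, so a critical segment that is kept smaller than the non-critical segments is less likely to be hit.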
The first aspect of the invention provides enhanced protection for critical information without increasing the amount of coded data. The amount of coded data may actually be reduced.
The second aspect of the invention minimizes the necessary amount of coded data by minimizing the number of segments, hence the number of headers.