This invention relates in general to compression of digital visual images, and more particularly, to a technique for encoding macroblocks of a frame of a sequence of video frames using luminance data to analyze temporal redundancy of macroblocks within the frame, and when a decision to non-intra code a macroblock is made, subsequently re-evaluating the decision using chrominance data to determine whether to switch the macroblock to intra coding.
Technological advances in digital transmission networks, digital storage media, very large scale integration devices, and digital processing of video and audio signals have been converging to make the transmission and storage of digital video economical in a wide variety of applications. Because the storage and transmission of digital video signals are central to many applications, and because an uncompressed representation of a video signal requires a large amount of storage, the use of digital video compression techniques is vital to this advancing art. In this regard, several international standards for the compression of digital video signals have emerged over the past decade, with more currently under development. These standards apply to algorithms for the transmission and storage of compressed digital video in a variety of applications, including: video-telephony and teleconferencing; high quality digital television transmission on coaxial and fiber-optic networks, as well as on terrestrial broadcast and direct broadcast satellite channels; and interactive multimedia products on CD-ROM, Digital Audio Tape, and Winchester disk drives.
Several of these standards involve algorithms based on a common core of compression techniques, e.g., the CCITT (International Telegraph and Telephone Consultative Committee) Recommendation H.120, the CCITT Recommendation H.261, and the ISO/IEC MPEG-1 and MPEG-2 standards. The MPEG algorithms have been developed by the Moving Picture Experts Group (MPEG), part of a joint technical committee of the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC). The MPEG committee has been developing standards for the multiplexed, compressed representation of video and associated audio signals.
The MPEG-2 standard describes an encoding method that results in substantial bandwidth reduction by a subjective lossy compression followed by a lossless compression. The encoded, compressed digital data is subsequently decompressed and decoded in an MPEG-2 compliant decoder. The MPEG-2 standard specifies a very high compression technique that achieves compression not achievable with intraframe coding alone, while preserving the random access advantages of pure intraframe coding. The combination of frequency domain intraframe encoding and interpolative/predictive interframe encoding of the MPEG-2 standard results in a balance between intraframe encoding and interframe encoding.
The MPEG-2 standard exploits temporal redundancy for motion compensated interpolative and predictive encoding. That is, an assumption is made that "locally" the current picture can be modeled as a translation of the picture at a previous and/or future time. "Locally" implies that the amplitude and direction of the displacement are not the same everywhere in the picture.
The MPEG-2 standard further specifies predictive and interpolative interframe encoding and frequency domain intraframe encoding. It has block-based motion compensation for the reduction of temporal redundancy and discrete cosine transform based compression for the reduction of spatial redundancy. Under MPEG-2, motion compensation is achieved by predictive coding, interpolative coding, and variable length coded motion vectors. The information relative to motion is based on a 16×16 array of pixels and is transmitted with the spatial information. It is compressed with variable length codes, such as Huffman codes.
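As an illustration of the local translation model described above, the following is a minimal sketch of a luminance block search. It assumes an exhaustive search scored by the sum of absolute differences (SAD); MPEG-2 does not mandate any particular search, and the names block_sad and best_motion_vector, the plane layout (lists of rows), and the small search range are hypothetical choices made for this example only.

```python
def block_sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between the current block at (bx, by)
    and the reference block displaced by (dx, dy)."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def best_motion_vector(cur, ref, bx, by, size=16, search=2):
    """Exhaustively test every displacement within +/- search samples and
    return (sad, dx, dy) for the best match, skipping displacements that
    would read outside the reference plane."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            inside = (0 <= bx + dx and bx + dx + size <= width and
                      0 <= by + dy and by + dy + size <= height)
            if not inside:
                continue
            sad = block_sad(cur, ref, bx, by, dx, dy, size)
            if best is None or sad < best[0]:
                best = (sad, dx, dy)
    return best


# Toy 8x8 luminance planes with a 4x4 block (assumed sizes, for brevity).
ref = [[y * 8 + x for x in range(8)] for y in range(8)]
cur = [[0] * 8 for _ in range(8)]
for y in range(2, 6):
    for x in range(2, 6):
        cur[y][x] = ref[y + 1][x - 1]  # block content moved by (-1, +1)

print(best_motion_vector(cur, ref, bx=2, by=2, size=4, search=2))
```

In this toy case the block content is an exact translation of the reference, so the search finds a displacement with zero difference. Real pictures rarely match exactly, which is why the residual difference is also coded.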
The ISO MPEG-2 compression standard specifies only the syntax of bitstream and semantics of the decoding process. The choice of coding parameters and trade-offs in performance versus complexity are left to the encoder developers.
One aspect of the encoding process is compressing a digital video image into as small a bitstream as possible while still maintaining video detail and quality. The MPEG standard places limitations on the size of the bitstream, and requires that the encoder be able to perform the encoding process. Thus, simply optimizing the bit rate to maintain desired picture quality and detail can be difficult.
As noted, in the field of video coding, an aspect of many compression algorithms (such as the MPEG-2 standard) is a reliance upon temporal redundancy. Temporal redundancy refers to the similarity between two pictures to be coded. When two pictures or frames have similar content, significant savings in the amount of data required to code the frames are realized by coding the differences between the pictures, rather than their entire content. When pictures are digitized, their contents are described by numeric values which represent color and brightness. Each picture element or pixel is quantified as a number or a set of numbers. For most applications, these numbers represent the RGB values of the pixel, or more commonly, the luminance (Y) and chrominance (Cr, Cb).
Large amounts of data are required to represent a picture. For example, an NTSC picture of 720×480 pixels in a 4:2:0 chroma format with one byte each for Y, Cr and Cb, requires 518,400 bytes. In view of this, a search to analyze the temporal redundancy of a picture is typically done on one type of data, which is conventionally the luminance or Y data. Applicants have discovered that for certain pictures, this can lead to erroneous conclusions with respect to the similarities of the two pictures. The Y data may be quite close in value between pictures, while the Cr and/or Cb data may be quite different. In such a case, if a difference coding (i.e., inter-coded macroblock) decision is made rather than a complete coding (i.e., intra-coded macroblock) decision, poor visual results may occur when the encoded pictures are subsequently decoded and displayed.
To avoid this problem, disclosed herein is a technique for selectively checking whether a Cr/Cb difference exceeds a threshold (i.e., checking within a macroblock that has been chosen as an inter-coded macroblock by the Y search), and when the threshold is exceeded, reversing the macroblock coding decision, thereby changing the macroblock to intra macroblock coding.
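The 518,400-byte figure can be checked with a short calculation. The sketch below assumes that 4:2:0 sampling halves each chrominance plane both horizontally and vertically relative to luminance; the function name frame_bytes_420 is hypothetical.

```python
def frame_bytes_420(width, height, bytes_per_sample=1):
    """Bytes needed for one uncompressed 4:2:0 picture."""
    luma = width * height * bytes_per_sample
    # Each chrominance plane (Cb, Cr) has half the luminance resolution
    # in both dimensions, i.e. one quarter of the luminance samples.
    chroma_plane = (width // 2) * (height // 2) * bytes_per_sample
    return luma + 2 * chroma_plane


print(frame_bytes_420(720, 480))  # 345,600 (Y) + 2 * 86,400 (Cb, Cr) = 518400
```

Two thirds of these bytes are luminance, which is consistent with searching on the Y data alone when a single data type must be chosen.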
Briefly summarized, the present invention comprises in one aspect a method for encoding macroblocks of at least one frame of a sequence of video frames. The method includes encoding at least one macroblock of the frame by deciding, using luminance data of the at least one macroblock, to code the at least one macroblock as a non-intra macroblock; and subsequently re-evaluating the coding decision for the at least one macroblock and switching the coding decision for the at least one macroblock from non-intra to intra if consideration of chrominance data of the at least one macroblock requires a change.
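The two-stage decision summarized above can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: the summary does not specify how the luminance decision is reached or how the chrominance difference is measured, so the sketch assumes a sum of absolute differences compared against a threshold at both stages. The function names, the threshold values, and the flat-list sample layout (256 Y samples and 64 samples each for Cb and Cr per 4:2:0 macroblock) are hypothetical.

```python
def sum_abs_diff(current, reference):
    """Sum of absolute differences between two equal-length sample lists."""
    return sum(abs(c - r) for c, r in zip(current, reference))


def choose_coding_mode(cur_y, ref_y, cur_cb, ref_cb, cur_cr, ref_cr,
                       luma_threshold, chroma_threshold):
    """Return "intra" or "non-intra" for one macroblock."""
    # Stage 1 (assumed form): decide from luminance data only.
    if sum_abs_diff(cur_y, ref_y) > luma_threshold:
        return "intra"
    # Stage 2: the luminance data chose non-intra, so re-evaluate that
    # decision using the chrominance data against its own threshold.
    chroma_diff = sum_abs_diff(cur_cb, ref_cb) + sum_abs_diff(cur_cr, ref_cr)
    if chroma_diff > chroma_threshold:
        return "intra"  # switch the decision from non-intra to intra
    return "non-intra"


# Luminance nearly identical, but Cb very different from the reference.
cur_y, ref_y = [100] * 256, [101] * 256
cur_cb, ref_cb = [200] * 64, [60] * 64
cur_cr, ref_cr = [128] * 64, [128] * 64

print(choose_coding_mode(cur_y, ref_y, cur_cb, ref_cb, cur_cr, ref_cr,
                         luma_threshold=1024, chroma_threshold=2048))
```

In this example the luminance difference (256) is under its threshold, so a luminance-only search would keep the macroblock non-intra; the Cb difference (8,960) exceeds the chrominance threshold and switches it to intra.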
Systems and computer program products corresponding to the above-summarized methods are also described and claimed herein.
To restate, applicants recognize herein that coding a macroblock as non-intra (inter) when the actual chrominance data differs widely from the reference chrominance data leads to a larger loss in the compression (i.e., quantization) stage of encoding, resulting in visual artifacts such as blockiness, coefficient clipping errors, etc. Thus, picture quality is improved herein by monitoring the chrominance difference data and switching the coding decision for a macroblock from non-intra to intra should the chrominance difference data exceed a defined threshold. This actually reduces the bits required to code the macroblock and picture, since coding a macroblock as non-intra when the actual and reference chrominance data differ widely results in an increased number of bits used to code the macroblock. More non-zero coefficients in the quantized block (i.e., a predictable result of large differences between actual and reference pixel values) cause the variable length encoder (VLE) to use larger (i.e., more bits) run-length codes and fixed length escape codes as defined by the MPEG-2 DCT coefficient tables.
Additional features and advantages are realized through the techniques of the present invention. Other embodiments and aspects of the invention are described in detail herein and are considered a part of the claimed invention.