Video compression enables storing, transmitting, and processing audio-visual information with fewer storage, network, and processor resources. The most widely used video compression standards include MPEG-1 for storage and retrieval of moving pictures, MPEG-2 for digital television, and MPEG-4 and H.263 for low-bit-rate video communications; see ISO/IEC 11172-2:1991, “Coding of moving pictures and associated audio for digital storage media at up to about 1.5 Mbps”; ISO/IEC 13818-2:1994, “Information technology—generic coding of moving pictures and associated audio”; ISO/IEC 14496-2:1999, “Information technology—coding of audio/visual objects”; and ITU-T Recommendation H.263, “Video Coding for Low Bitrate Communication,” March 1996.
These standards are relatively low-level specifications that primarily deal with the spatial compression of images or frames, and the spatial and temporal compression of sequences of frames. As a common feature, these standards perform compression on a per-image basis. With these standards, one can achieve high compression ratios for a wide range of applications.
Interlaced video is commonly used in television systems. In interlaced video, each image is divided into a top field and a bottom field. The two interlaced fields represent the odd- and even-numbered rows, or lines, of picture elements (pixels) in the image. The two fields are sampled at different times to improve the temporal smoothness of the video during playback. Compared to a progressive scan format, interlaced video has different characteristics and provides more encoding options.
As shown in FIG. 1, one 16×16 frame-based macroblock 110 can be partitioned into two 16×8 field-based blocks 111–112. In this way, a discrete cosine transform (DCT) can be applied to either frames or fields of the video. There is also significant flexibility in how blocks in the current frame or field are predicted from previous frames or fields. Because these encoding options yield different compression efficiencies, an adaptive method for selecting between frame and field encoding modes is desirable.
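The frame-to-field partitioning described above can be sketched as follows. This is a minimal illustrative sketch, not part of any standard: a macroblock is modeled as a list of 16 rows, and the function name is an assumption for illustration.

```python
def split_macroblock_into_fields(mb):
    """Split a 16x16 frame macroblock (a list of 16 rows of 16 pixels)
    into two 16x8 field blocks: even-numbered rows form the top field,
    odd-numbered rows the bottom field."""
    assert len(mb) == 16 and all(len(row) == 16 for row in mb)
    top_field = [mb[r] for r in range(0, 16, 2)]     # rows 0, 2, ..., 14
    bottom_field = [mb[r] for r in range(1, 16, 2)]  # rows 1, 3, ..., 15
    return top_field, bottom_field

# Example: a macroblock whose pixel value equals its row index.
mb = [[r] * 16 for r in range(16)]
top, bottom = split_macroblock_into_fields(mb)
print(len(top), len(top[0]))    # 8 16
print(top[0][0], bottom[0][0])  # 0 1
```

Each field block then has half the vertical resolution of the frame block, which is why a separate 16×8 (or 8×8 per subblock) transform can be applied per field.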
Frame and field encoding tools included in the MPEG-2 standard are described by Puri et al., “Adaptive Frame/Field Motion Compensated Video Coding,” Signal Processing: Image Communication, 1993, and Netravali et al., “Digital Pictures: Representation, Compression, and Standards,” Second Edition, Plenum Press, New York, 1995. Adaptive methods for selecting picture-level encoding modes are not described in those two references.
U.S. Pat. No. 5,168,357, “Method for a calculation of a decision result for a field/frame data compression method,” issued on Dec. 1, 1992 to Kutka, describes a method for deciding a transform type for each 16×16 macroblock of an HDTV video, specifically, the selection between a 16×16 frame-block DCT and a 16×8 field-block DCT. In that method, the absolute differences between pairs of pixels in two lines of the same field are summed to form a field sum. Likewise, the absolute differences between pairs of pixels in two adjacent lines of the frame are summed to form a frame sum. The frame sum, multiplied by a frame weighting factor, is subtracted from the field sum to form a decision result. If the decision result is positive, then the frame is encoded; otherwise, the two fields are encoded separately.
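A decision rule of this kind can be sketched as follows. This is an illustrative approximation of the sums described above, not the patented implementation; the `frame_weight` parameter stands in for the patent's weighting factor, and the pairing of lines is an assumption.

```python
def field_frame_decision(mb, frame_weight=1.0):
    """Sketch of a field/frame transform decision: compare summed
    absolute differences between lines of the same field (two rows
    apart) against those between adjacent frame lines."""
    rows, cols = len(mb), len(mb[0])
    # Same-field line differences: lines r and r+2 share a field parity.
    field_sum = sum(abs(mb[r][c] - mb[r + 2][c])
                    for r in range(rows - 2) for c in range(cols))
    # Frame line differences: adjacent lines r and r+1.
    frame_sum = sum(abs(mb[r][c] - mb[r + 1][c])
                    for r in range(rows - 1) for c in range(cols))
    decision = field_sum - frame_weight * frame_sum
    return "frame" if decision > 0 else "field"
```

On a block with strong inter-field motion, adjacent lines differ much more than same-field lines, so the decision result is negative and the fields are encoded separately.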
U.S. Pat. No. 5,227,878, “Adaptive coding and decoding of frames and fields of video,” issued on Jul. 13, 1993 to Puri et al., describes a video encoding and decoding method. In that method, for frame encoding, four 8×8 luminance subblocks are formed from a macroblock; for field encoding, four 8×8 luminance subblocks are derived from a macroblock by separating the lines of the two fields, such that each subblock contains only lines of one field. If the differences between adjacent scan lines are greater than the differences between alternate odd and even scan lines, then field encoding is selected. Otherwise, frame encoding is selected. An 8×8 DCT is then applied to each frame subblock or field subblock, depending on the mode selected.
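The two subblock formations described above can be sketched as follows; this is an illustrative sketch under the assumption that the macroblock is a 16×16 list of rows, not the patented implementation.

```python
def frame_subblocks(mb):
    """Four 8x8 frame subblocks: the quadrants of the 16x16 macroblock."""
    return [[row[c0:c0 + 8] for row in mb[r0:r0 + 8]]
            for r0 in (0, 8) for c0 in (0, 8)]

def field_subblocks(mb):
    """Four 8x8 field subblocks: the lines of each field are gathered
    first, so every subblock contains lines of only one field."""
    top = [mb[r] for r in range(0, 16, 2)]  # even lines (one field)
    bot = [mb[r] for r in range(1, 16, 2)]  # odd lines (the other field)
    return [[row[c0:c0 + 8] for row in fld]
            for fld in (top, bot) for c0 in (0, 8)]
```

Either way, four 8×8 subblocks result, so the same 8×8 DCT can be applied regardless of the mode selected.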
U.S. Pat. No. 5,434,622, “Image signal encoding apparatus using adaptive frame/field format compression,” issued on Jul. 18, 1995 to Lim, describes a procedure for selecting between frame and field format compression on a block-by-block basis. In that procedure, the selection is based on the number of bits used for each block corresponding to the specified encoding format. The distortion of the corresponding block is not considered. A compression scheme is not provided.
U.S. Pat. No. 5,737,020, “Adaptive field/frame encoding of discrete cosine transform,” issued on Apr. 7, 1998 to Hall et al., describes a method of DCT compression of a digital video image. In that method, a field variance and a frame variance are calculated. When the field variance is less than the frame variance, field-type DCT compression is performed. Alternatively, when the frame variance is less than the field variance, frame-type DCT compression is performed.
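A variance comparison of this kind can be sketched as follows. The patent states only that a field variance and a frame variance are compared; the specific signals whose variances are taken here, adjacent-line differences for the frame case and same-field line differences for the field case, are an assumption for illustration.

```python
def variance(values):
    """Population variance of a flat list of numbers."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def dct_type_by_variance(mb):
    """Illustrative variance-based DCT type selection: pick the mode
    whose line-difference signal has the smaller variance."""
    rows, cols = len(mb), len(mb[0])
    frame_diffs = [mb[r][c] - mb[r + 1][c]
                   for r in range(rows - 1) for c in range(cols)]
    field_diffs = [mb[r][c] - mb[r + 2][c]
                   for r in range(rows - 2) for c in range(cols)]
    return "field" if variance(field_diffs) < variance(frame_diffs) else "frame"
```

For a block whose two fields differ strongly (motion between field sampling times), the adjacent-line differences fluctuate widely while same-field differences stay small, so the field DCT type is chosen.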
U.S. Pat. No. 5,878,166, “Field frame macroblock encoding decision,” issued on Mar. 2, 1999 to Legall, describes a method for making a field/frame macroblock encoding decision. The frame-based activity of the macroblock is obtained by summing absolute differences of horizontal pixel pairs and absolute differences of vertical pixel pairs, with the result summed over all the blocks in the macroblock. The first- and second-field-based activities are obtained similarly. The mode with less activity is selected.
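The activity comparison above can be sketched as follows; the activity measure and the field separation are modeled from the description above, as an illustrative sketch rather than the patented implementation.

```python
def activity(block):
    """Sum of absolute horizontal and vertical pixel-pair differences."""
    rows, cols = len(block), len(block[0])
    horiz = sum(abs(block[r][c] - block[r][c + 1])
                for r in range(rows) for c in range(cols - 1))
    vert = sum(abs(block[r][c] - block[r + 1][c])
               for r in range(rows - 1) for c in range(cols))
    return horiz + vert

def select_mode_by_activity(mb):
    """Pick the encoding mode (frame vs. field) with less activity."""
    frame_act = activity(mb)
    top = [mb[r] for r in range(0, len(mb), 2)]
    bot = [mb[r] for r in range(1, len(mb), 2)]
    field_act = activity(top) + activity(bot)
    return "field" if field_act < frame_act else "frame"
```

When the two fields were sampled during motion, separating them removes the large line-to-line differences, so the field activity is smaller and field mode is selected.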
U.S. Pat. No. 6,226,327, “Video coding method and apparatus which select between frame-based and field-based predictive modes,” issued on May 1, 2001 to Igarashi et al., describes a method that treats an image as a mosaic of areas. Each area is encoded using either frame-based or field-based motion compensation of a previously encoded area, whichever yields the least amount of motion compensation data. Each area is orthogonally transformed using either a frame-based or a field-based transformation, again selecting the mode that yields the least amount of data.
The above-cited patents all describe methods in which an adaptive field/frame mode decision is used to improve the compression of an interlaced video signal using macroblock-based encoding methods. However, only local image information, or the number of bits needed for the encoding, is used to select the DCT type and motion prediction mode of the local macroblock. None of those methods considers the global content when making encoding decisions.
FIG. 2 shows a well-known architecture 200 for encoding a video according to the MPEG-2 encoding standard. A frame of an input video is compared with a previously decoded frame stored in a frame buffer. Motion compensation (MC) and motion estimation (ME) are applied to the previous frame. The prediction error, or difference signal, is DCT transformed, quantized (Q), and then variable-length coded (VLC) to produce an output bitstream.
As shown in FIG. 3 for MPEG-2 standard mode encoding 300, each frame is encoded in either a frame-coding or a field-coding mode. For a given frame-level mode, there are various associated macroblock modes. FIG. 3 shows the relationship between picture encoding modes and macroblock encoding modes at the picture level and the block level.
MPEG-2 video encoders can use either frame-only encoding, where all the frames of a video are encoded as frames, or field-only encoding, where each frame is encoded as two fields, and the two fields of a frame are encoded sequentially. In addition to the picture-level selection, a selection procedure at the macroblock level is used to select the best macroblock coding mode, i.e., intra, DMV, field, frame, 16×8, or skip mode. Importantly, the macroblock modes are not optimized unless the frame-level decision is optimized.
FIGS. 4A and 4B show how a macroblock of a current (cur) frame can be predicted using a field prediction mode in frame pictures, or a field prediction mode in field pictures, respectively, for I-, P-, and B-fields. The adaptive mode decision based on the options in FIG. 4A is referred to as adaptive field/frame encoding. However, that encoding operates only at the macroblock level, which is less than optimal due to mode restrictions.
For instance, in that macroblock-based selection, the second I-field can only be encoded in intra mode, and the P-field and B-field can only be predicted from the previous frame. On the other hand, if the frame-level mode is field-only, then the second I-field can be encoded in inter mode and predicted from the first I-field, and the second P-field can be predicted from the first P-field, even though that field is located in the same frame.
FIG. 5 shows a two-pass macroblock frame/field encoding method 500 that solves the problems associated with the encoding according to FIG. 4. That method has been adopted by the Joint Video Team (JVT) reference code; see ISO/IEC JTC1/SC29/WG11 and ITU-T SG16 Q.6, “Adaptive Frame/Field Coding for JVT,” JVT-B071. In that method, the input is first encoded in frame mode, and the resulting distortion and bit rate (R-D) are extracted and saved. The frame is then encoded in field mode, and the corresponding distortion and bit rate are also recorded. A function (F) then compares the costs of the two encoding modes, and the mode with the smaller cost is selected to encode the video as output.
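The two-pass comparison above can be sketched as follows. The `encode` callable and the Lagrangian cost form D + λ·R are illustrative assumptions standing in for the encoder passes and the cost function F; this is a sketch of the selection logic, not the JVT reference code.

```python
def two_pass_mode_decision(frame, encode, lam=1.0):
    """Sketch of a two-pass frame/field decision: encode the input
    once per mode, then keep the mode with the smaller cost.
    `encode(frame, mode)` is a hypothetical callable returning
    (bits, distortion) for the given mode."""
    bits_frame, dist_frame = encode(frame, mode="frame")  # first pass
    bits_field, dist_field = encode(frame, mode="field")  # second pass
    cost_frame = dist_frame + lam * bits_frame  # D + lambda * R
    cost_field = dist_field + lam * bits_field
    return "frame" if cost_frame <= cost_field else "field"
```

Because both passes run a full encode per frame, the computational cost is roughly doubled, which is the drawback discussed next.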
The method 500 has several problems: it requires two passes and uses a fixed, predetermined quantization (Q). Consequently, the JVT standard method requires a significant amount of computation for each frame and is less suitable for encoding a video in real time.
U.S. Pat. No. 6,466,621, “Video coding method and corresponding video coder,” issued on Oct. 15, 2002 to Cougnard et al., describes a different type of two-pass encoding method 600, shown in the block diagram of FIG. 6. In the first pass, each frame of the input is encoded in parallel paths using the field encoding mode and the frame encoding mode. During the first pass, statistics are extracted in each path, i.e., the number of bits used by each co-positioned macroblock in each mode, and the number of field motion-compensated macroblocks. The statistics are compared, and a decision to encode the output in either field or frame mode is made. In the second pass, the frame is re-encoded according to the decision and the extracted statistics.
The prior-art field/frame encoding methods do not address rate control or motion activity. Therefore, there is a need for an adaptive field/frame encoding method with effective rate control that considers motion activity.