The invention relates generally to the compression of video signals. More particularly, the invention relates to a method and apparatus for generating smooth residuals in block motion compensated, transform-based video coders (BMT coders).
The demand for services such as video on demand and video conferencing is on the rise. To meet this demand, service providers are engineering ways of providing video signals over a variety of communications networks, such as the public switched telephone network (PSTN), the Integrated Services Digital Network (ISDN), the Internet, and cellular systems. The transmission of uncompressed video signals, however, requires more bandwidth than is available on these types of communications systems.
To solve this problem, these systems employ a video coder/decoder (codec or coder). A video coder compresses a digital video signal representing a video sequence, typically a series of frames or pictures, by reducing the number of coded bits required to represent the video sequence while maintaining an acceptable viewing quality. This results in a lower transmission bit rate at the cost of somewhat reduced visual quality.
Different communications systems require different degrees of compression. For example, a bit rate of 64 kilobits per second (Kbps) or lower is desirable for ISDN systems. The standard PSTN bandwidth requires an even lower bit rate of approximately 28.8 Kbps. Such bit rates, however, require a video coder to compress the information contained in a digital video sequence by a factor of 300 to 1, or more. Achieving such a large compression ratio requires the coder to remove a substantial amount of the redundancy inherent in the video sequence, at the expense of quality.
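The order of magnitude of the stated compression factor can be checked with a short calculation. The source format used below (QCIF resolution of 176 by 144 pixels, 12 bits per pixel on average, 30 frames per second) is an assumption chosen for illustration; the text above does not state which format the 300-to-1 figure refers to.

```python
# Assumed source format (not stated in the text): QCIF, 176x144 pixels,
# 4:2:0 chroma subsampling (12 bits per pixel on average), 30 frames/s.
WIDTH, HEIGHT = 176, 144
BITS_PER_PIXEL = 12
FRAMES_PER_SECOND = 30

# Uncompressed bit rate of the assumed source, in bits per second.
raw_bps = WIDTH * HEIGHT * BITS_PER_PIXEL * FRAMES_PER_SECOND


def compression_ratio(channel_bps):
    """Ratio between the raw source bit rate and the channel bit rate."""
    return raw_bps / channel_bps


pstn_ratio = compression_ratio(28_800)   # approximately 28.8 Kbps channel
isdn_ratio = compression_ratio(64_000)   # 64 Kbps channel

print(f"raw bit rate: {raw_bps} bit/s")
print(f"PSTN ratio:   {pstn_ratio:.1f} to 1")
print(f"ISDN ratio:   {isdn_ratio:.2f} to 1")
```

Under these assumptions the PSTN case lands slightly above 300 to 1; a larger source format would push the ratio considerably higher.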
One method of removing this redundancy is through the use of BMT coders. Current standards, such as International Telecommunication Union (ITU) H.261 (ITU-T1), International Organization for Standardization/International Electrotechnical Commission (ISO/IEC) Moving Picture Experts Group One (MPEG-1) (ISO/IEC 11172-2), and MPEG-2 (ISO/IEC 13818-2), provide compression of a digital video sequence by utilizing a block motion-compensated Discrete Cosine Transform (DCT) approach. BMT coders remove the redundancy present in a video sequence using a combination of two compression techniques.
The first compression technique is referred to as motion compensated prediction coding (MCPC). MCPC takes advantage of the correlation of video frames in the time domain. The basic idea is to find the parts of each current frame that have moved or changed from a reference frame and code only the changes, which are called residuals. The reference frame can be a frame that is earlier or later in time than the current frame. Each current frame is then built by adding the decoded residuals to the prediction based on the reference frame.
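The relationship between the prediction, the residual, and the rebuilt frame can be sketched as follows. The function names and the 2 by 2 sample values are hypothetical, and the sketch assumes the residual is decoded without loss; in an actual coder the residual is quantized, so the rebuilt frame is only an approximation of the current frame.

```python
def residual(current, prediction):
    """Encoder side: code only the difference from the prediction."""
    return [[c - p for c, p in zip(row_c, row_p)]
            for row_c, row_p in zip(current, prediction)]


def reconstruct(prediction, decoded_residual):
    """Decoder side: add the decoded residual back to the prediction."""
    return [[p + r for p, r in zip(row_p, row_r)]
            for row_p, row_r in zip(prediction, decoded_residual)]


# Hypothetical pixel values for a tiny 2x2 region.
prediction = [[100, 102], [104, 106]]
current = [[101, 102], [103, 110]]

res = residual(current, prediction)
rebuilt = reconstruct(prediction, res)

print(res)
print(rebuilt == current)
```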
MCPC employs a technique referred to as block matching. A portion of a current frame called a base block is selected. Typically, this block is an 8×8 or 16×16 matrix of pixels (or pels). A pixel is a single point in a picture or frame. The reference frame is then searched for a block which matches the base block to some degree of similarity. When a match is found, the location of the block in the reference frame is coded using motion vectors. This continues until all base blocks representing changes in the current frame are found in the reference frame. A trial predicted frame is then built by moving blocks from the reference frame using the motion vectors. The predicted frame is subtracted from the actual current frame to make a residual image, transformed using DCT coding (described below), and coded for transmission. At the receiving end, the process is reversed. The predicted frame is built from the reference frame, and the residual image is decoded and added to the predicted frame.
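Block matching as described above can be illustrated with an exhaustive search over a small window. This is a minimal sketch, not the search strategy of any particular coder: the function names, the use of the sum of absolute differences as the matching criterion, the 4 by 4 block size, and the synthetic frames are all assumptions made for the example.

```python
def sad(cur, ref, cx, cy, rx, ry, size):
    """Sum of absolute differences between the block of `cur` at (cx, cy)
    and the block of `ref` at (rx, ry)."""
    return sum(abs(cur[cy + j][cx + i] - ref[ry + j][rx + i])
               for j in range(size) for i in range(size))


def full_search(cur, ref, cx, cy, size, search_range):
    """Try every displacement within +/- search_range and return
    ((dx, dy), cost) for the lowest-cost candidate block."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            rx, ry = cx + dx, cy + dy
            # Skip candidates that fall outside the reference frame.
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue
            cost = sad(cur, ref, cx, cy, rx, ry, size)
            if best is None or cost < best[1]:
                best = ((dx, dy), cost)
    return best


# Synthetic 8x8 reference frame in which every block is distinct.
ref = [[x * x + 10 * y * y for x in range(8)] for y in range(8)]
# Current frame: content at (x, y) is taken from the reference at (x+1, y+2).
cur = [[ref[y + 2][x + 1] if y + 2 < 8 and x + 1 < 8 else 0
        for x in range(8)] for y in range(8)]

motion_vector, cost = full_search(cur, ref, cx=2, cy=2, size=4, search_range=2)
print(motion_vector, cost)
```

In this example the base block at (2, 2) has an exact match in the reference frame, so the search returns the displacement used to build the current frame with a cost of zero.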
The second compression technique is referred to as DCT coding. DCT coding takes advantage of the intra-picture, two-dimensional correlation of a video signal. DCT coding orthogonally transforms a base block of the current frame, or a block of motion prediction errors, to the frequency domain. The signal power for the resultant block is concentrated in specific frequency components. Consequently, quantizing bits need only be allocated to the DCT coefficients in the region in which the signal power is concentrated. This further reduces the digital video signal required to represent the current frame. For example, in a region in which the image has little detail, and in which the video signal is thus highly correlated, the DCT coefficients are concentrated at low frequencies. In that case, only the DCT coefficients in the low-frequency region of the distribution pattern are quantized to reduce the quantity of the digital video signal.
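The concentration of signal power at low frequencies can be observed with a direct implementation of the two-dimensional DCT. The sketch below uses the orthonormal DCT-II, written out in plain Python for clarity rather than speed; the smooth 8 by 8 test block and the choice of the 2 by 2 lowest-frequency corner as the "low-frequency region" are assumptions made for illustration.

```python
import math


def dct_2d(block):
    """Orthonormal two-dimensional DCT-II of a square block."""
    n = len(block)

    def scale(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):          # vertical frequency index
        for v in range(n):      # horizontal frequency index
            total = 0.0
            for y in range(n):
                for x in range(n):
                    total += (block[y][x]
                              * math.cos((2 * y + 1) * u * math.pi / (2 * n))
                              * math.cos((2 * x + 1) * v * math.pi / (2 * n)))
            out[u][v] = scale(u) * scale(v) * total
    return out


# A smooth block with little detail: a gentle brightness ramp.
block = [[100 + 2 * x + y for x in range(8)] for y in range(8)]
coeffs = dct_2d(block)

pixel_energy = sum(p * p for row in block for p in row)
total_energy = sum(c * c for row in coeffs for c in row)
low_energy = sum(coeffs[u][v] ** 2 for u in range(2) for v in range(2))
low_fraction = low_energy / total_energy

print(f"energy in the 2x2 low-frequency corner: {low_fraction:.4f}")
```

Because the transform is orthonormal, the total energy of the coefficients equals that of the pixels, and for this smooth block nearly all of it falls in the four lowest-frequency coefficients.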
In sum, a BMT coder compresses a video signal by matching a base block from a current video frame with blocks from a reference frame. The matched block is referred to as the prediction block. The coder generates a differential block using the base block and the prediction block. The differential block represents the motion prediction error between the base and prediction blocks. The differential block is then transformed using a space-to-frequency domain transformation such as the Discrete Cosine Transform, quantized, and finally entropy coded. The coded residual, along with location information (i.e., motion vectors) for the prediction block and quantization information, forms the basis for decoding that particular block at the receiving end.
One of the keys to achieving good video coding efficiency lies in the BMT coder's ability to find the "best" prediction block. From an entropy encoding point of view, the best prediction block is the prediction block which will produce a differential block which can be represented by a minimum number of coded bits. While finding the best block is virtually impossible without spending considerable computational resources, many BMT coders attempt to find the best prediction block based on values derived from a block distortion measure. A block distortion measure quantifies the global dissimilarity between the current and prediction block. Examples of conventional block distortion measures include Sum of Absolute Differences (SAD), Sum of Weighted Differences (SWD) and the Mean Squared Error (MSE).
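Two of the conventional measures named above, SAD and MSE, can be sketched as follows. The function names and the sample blocks are hypothetical. The two candidate prediction blocks are constructed so that one leaves a flat differential block and the other a checkerboard-patterned one, which shows that a global measure assigns both the same score.

```python
def sad(base, pred):
    """Sum of Absolute Differences between two equally sized blocks."""
    return sum(abs(b - p)
               for row_b, row_p in zip(base, pred)
               for b, p in zip(row_b, row_p))


def mse(base, pred):
    """Mean Squared Error between two equally sized blocks."""
    count = len(base) * len(base[0])
    return sum((b - p) ** 2
               for row_b, row_p in zip(base, pred)
               for b, p in zip(row_b, row_p)) / count


# Hypothetical 4x4 base block.
base = [[10, 12, 14, 16] for _ in range(4)]

# Candidate 1: every pixel is 4 below the base (flat differential block).
flat_pred = [[v - 4 for v in row] for row in base]

# Candidate 2: pixels alternate 4 below and 4 above the base
# (checkerboard differential block).
checker_pred = [[v - 4 if (x + y) % 2 == 0 else v + 4
                 for x, v in enumerate(row)]
                for y, row in enumerate(base)]

print(sad(base, flat_pred), sad(base, checker_pred))
print(mse(base, flat_pred), mse(base, checker_pred))
```

Both candidates score identically under SAD and under MSE, even though their differential blocks differ greatly in smoothness.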
Conventional block distortion measures, however, are not designed to select a prediction block which fully enhances coding efficiency for BMT coders. The best prediction block for BMT coders is the one that produces the least number of bits for a given quantization level. Because entropy coding tables are generally designed such that higher frequency coefficients produce more bits than lower frequency coefficients, the best prediction block is often the one with the least number of high frequency coefficients. In other words, for BMT coders, the search for the best prediction block often implies searching for the block that produces the lowest amount of high frequency energy, that is, the smoothest residual. BMT coders using conventional block distortion measures, however, fail to factor in the amount of high frequency energy present in the associated differential block. As a result, the selected prediction block may yield more coded bits than necessary, which decreases coding efficiency.
In view of the foregoing, it can be appreciated that a substantial need exists for a block distortion measure for use with a BMT coder for selecting a prediction block which produces a differential block having a minimal amount of high frequency energy, thereby increasing coding efficiency for a current frame.
This and other needs are met by a method and apparatus for identifying a prediction block which produces smooth residuals in BMT coders. A base block from a first image, and a candidate prediction block from a second image, are selected. A differential block is generated using the candidate prediction block and the base block. The differential block is passed through a filter. A total energy value for the filtered block is measured, and forms the basis for selecting a prediction block.
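The steps summarized above can be sketched as follows. The text here does not specify the filter or the selection rule, so both are assumptions: the filter is taken to be a simple high-pass filter built from horizontal and vertical first differences, and the candidate with the lowest filtered energy is selected. All function names and sample blocks are hypothetical.

```python
def differential_block(base, pred):
    """Pixel-wise difference between the base and prediction blocks."""
    return [[b - p for b, p in zip(row_b, row_p)]
            for row_b, row_p in zip(base, pred)]


def high_pass(block):
    """Assumed filter: first differences between horizontally and
    vertically adjacent samples, returned as a flat list."""
    height, width = len(block), len(block[0])
    out = []
    for y in range(height):
        for x in range(width):
            if x + 1 < width:
                out.append(block[y][x + 1] - block[y][x])
            if y + 1 < height:
                out.append(block[y + 1][x] - block[y][x])
    return out


def filtered_energy(base, pred):
    """Total energy (sum of squares) of the filtered differential block."""
    return sum(v * v for v in high_pass(differential_block(base, pred)))


def select_prediction(base, candidates):
    """Assumed rule: index of the candidate with the lowest filtered energy."""
    return min(range(len(candidates)),
               key=lambda i: filtered_energy(base, candidates[i]))


# Hypothetical 4x4 base block and two candidate prediction blocks.
base = [[10, 12, 14, 16] for _ in range(4)]
checker_pred = [[v - 4 if (x + y) % 2 == 0 else v + 4
                 for x, v in enumerate(row)]
                for y, row in enumerate(base)]
flat_pred = [[v - 4 for v in row] for row in base]

candidates = [checker_pred, flat_pred]
chosen = select_prediction(base, candidates)
print(filtered_energy(base, checker_pred), filtered_energy(base, flat_pred))
print("selected candidate index:", chosen)
```

Both candidates in this example have the same sum of absolute differences from the base block, yet the filtered energy separates them clearly and favors the candidate that leaves the smooth residual.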