1. Field of the Invention
This invention relates to the field of signal processing, and more particularly relates to a method and apparatus to optimize motion video encoding using both distortion and bit-rate constraints.
2. Description of the Related Art
To represent a picture image digitally, the image area is described as an array of pixels. A digital number describes the color, luminance and chrominance of each pixel. Pixel color information consists of three digital values: one digital value for red, one for green, and one for blue. Thus, a fairly large volume of data is required to describe one pixel. Accordingly, exceptionally large data files are required for complete picture images.
In full motion video, not only are large blocks of data required to describe each picture image, but a new image or frame must be presented to the viewer at approximately thirty new images per second to create the illusion of motion. Moving these large streams of video data across digital networks or phone lines is infeasible given currently available bandwidth.
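As a rough illustration of the data volumes involved, the following sketch computes the uncompressed bit-rate of a video stream. The 640 by 480 pixel resolution and the 8 bits per color value are assumptions chosen only for illustration; the three values per pixel and the thirty frames per second come from the description above.

```python
def raw_bit_rate(width, height, values_per_pixel=3,
                 bits_per_value=8, frames_per_second=30):
    """Uncompressed video bit-rate in bits per second."""
    bits_per_frame = width * height * values_per_pixel * bits_per_value
    return bits_per_frame * frames_per_second


# Assumed frame size (640 x 480) and sample depth (8 bits), for illustration.
rate = raw_bit_rate(640, 480)
print(f"{rate / 1e6:.1f} Mbit/s")  # 221.2 Mbit/s
```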
Data compression is a technique for reducing the number of bits required to represent a given image. Data compression techniques utilize either a shorthand notation to signal a repetitive string of bits or omit data bits from the transmitted message. The latter form of compression is called “lossy” compression and capitalizes upon the ability of the human mind to provide the omitted data. In motion video, much of the picture data remains constant from frame to frame. Therefore, the video data may be compressed by first describing a reference frame and describing subsequent frames in terms of the change from the reference frame.
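The idea of describing a frame in terms of its change from a reference frame can be sketched as follows. This is a deliberately simplified, lossless pixel-difference scheme, not the method of any particular standard; the function names frame_delta and apply_delta are hypothetical.

```python
def frame_delta(reference, frame):
    """Return {(row, col): new_value} for pixels that differ from the reference."""
    return {
        (r, c): value
        for r, row in enumerate(frame)
        for c, value in enumerate(row)
        if value != reference[r][c]
    }


def apply_delta(reference, delta):
    """Rebuild a frame from the reference frame and a delta."""
    rebuilt = [list(row) for row in reference]
    for (r, c), value in delta.items():
        rebuilt[r][c] = value
    return rebuilt


reference = [[1, 2], [3, 4]]
frame = [[1, 2], [3, 9]]          # only one pixel changed
delta = frame_delta(reference, frame)
assert apply_delta(reference, delta) == frame
```

When most pixels are unchanged between frames, the delta holds far fewer entries than the frame holds pixels, which is the source of the saving.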
Several international standards for the compression of digital video signals have emerged and more are currently under development. These standards apply to algorithms for the transmission and storage of compressed digital video in a variety of applications, including: video-telephony and tele-conferencing; high quality digital television transmission on coaxial and fiber-optic networks as well as broadcast terrestrially and over direct broadcast satellites; and in interactive multimedia products on CD-ROM, Digital Audio Tape, and Winchester disk drives.
Several of these standards involve algorithms based on a common core of compression techniques, e.g., the CCITT (Consultative Committee on International Telegraphy and Telephony) Recommendation H.120, the CCITT Recommendations H.261 and H.263, and the ISO/IEC MPEG-1, MPEG-2, and MPEG-4 standards. The MPEG algorithms were developed by the Moving Picture Experts Group (MPEG), as part of a joint technical committee of the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC). The MPEG standards describe a compressed representation of video and associated audio signals. The standards specify the syntax of the compressed bit stream and the method of decoding, but leave considerable latitude for novelty and variety in the algorithm employed in the encoder.
Motion compensation is commonly utilized by video encoders in signal processing techniques that compress successive frames of digital video data for transmission via a communication medium of limited bandwidth, or for storing in a storage medium having limited storage capacity. Motion compensated video compression systems such as the ISO/ITU standards of MPEG and H.261/3 use block-based motion estimation that compares a given block of one frame to a block of another frame. Blocks are matched by determining a comparison measurement between any given pair of blocks. A comparison measurement corresponds to some form of a degree of “difference” between the two blocks. If the comparison measurement is below a predetermined threshold, the blocks may be considered to be similar enough that a block match is indicated. If so, the block in the previous video frame may be utilized and only a motion vector is required to indicate the new position of the block in the current video frame. Such motion vectors can be represented with fewer bits than the pixels that comprise the block, and fewer bits need to be transmitted (or stored) in order to recreate the block. A compression technique known as transform coding is often used to generate a bit stream to be encoded as further described hereinbelow.
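The block matching described above can be sketched as an exhaustive search. The sum of absolute differences (SAD) used here as the comparison measurement, the square search window, and the names sad, get_block, and find_match are assumptions made for illustration; the text does not prescribe a particular measurement or search strategy.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def get_block(frame, top, left, size):
    """Extract a size x size block whose top-left corner is (top, left)."""
    return [row[left:left + size] for row in frame[top:top + size]]


def find_match(prev_frame, cur_frame, top, left, size, search, threshold):
    """Search prev_frame within +/-search pixels for the block of cur_frame
    at (top, left). Return ((dx, dy), sad) for the best candidate if its SAD
    is below the threshold, otherwise None."""
    target = get_block(cur_frame, top, left, size)
    height, width = len(prev_frame), len(prev_frame[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + size > height or x + size > width:
                continue  # candidate block falls outside the frame
            cost = sad(target, get_block(prev_frame, y, x, size))
            if best is None or cost < best[1]:
                best = ((dx, dy), cost)
    if best is not None and best[1] < threshold:
        return best
    return None


# A 2x2 patch moves from row 1, column 1 to row 2, column 3.
prev = [[0] * 6 for _ in range(6)]
cur = [[0] * 6 for _ in range(6)]
patch = [[10, 20], [30, 40]]
for r in range(2):
    for c in range(2):
        prev[1 + r][1 + c] = patch[r][c]
        cur[2 + r][3 + c] = patch[r][c]

print(find_match(prev, cur, top=2, left=3, size=2, search=2, threshold=1))
# ((-2, -1), 0)
```

The returned vector points from the block's position in the current frame back to its matching position in the previous frame, so only those two numbers need to be sent in place of the block's pixels.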
Motion compensation and encoding motion compensated video are among the most computationally intensive tasks that a video encoder performs. The objective of the encoder is to produce an encoded image represented in a bit stream that provides the best visual quality for the rate of data transfer, also referred to as bit-rate, allowed by the video coding standards.
In one embodiment, a method for optimizing the video encoding process for a macroblock in a block-based video encoder is provided, wherein a plurality of candidate motion vectors, mode vectors, and quantized discrete cosine transform coefficients based on the macroblock and the candidate motion vectors are provided. The method includes
(a) estimating the length of a bitstream that would be required to encode the quantized discrete cosine transform coefficients, the motion vectors, and the mode vectors;
(b) generating a bit-rate term based on the length of the bit stream;
(c) determining a measure of distortion based on quantized discrete cosine transform coefficients;
(d) determining a rate-constrained distortion signal based on the measure of distortion and the bit-rate term;
(e) repeating (a) through (d) for each candidate motion vector and mode vector; and
(f) selecting the motion vector corresponding to the minimum motion estimation signal.
A Lagrange multiplier may be used to determine the bit-rate term in (b). Further, selected processes in the method may be executed in parallel to decrease time delay.
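Steps (a) through (f) can be sketched as a minimization over candidates. The additive cost form J = D + lambda * R, the Candidate record, and the sample numbers are assumptions for illustration: the text states only that the signal is based on the distortion and the bit-rate term, and that a Lagrange multiplier may be used to form that term.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    motion_vector: tuple   # (dx, dy)
    mode: str              # hypothetical mode label
    distortion: float      # measure of distortion, step (c)
    estimated_bits: int    # estimated bit stream length, step (a)


def rate_constrained_cost(candidate, lam):
    """Distortion plus the bit-rate term lam * estimated_bits (assumed form)."""
    bit_rate_term = lam * candidate.estimated_bits        # step (b)
    return candidate.distortion + bit_rate_term           # step (d)


def select_candidate(candidates, lam):
    """Steps (e) and (f): evaluate every candidate and keep the minimum."""
    return min(candidates, key=lambda c: rate_constrained_cost(c, lam))


candidates = [
    Candidate((1, 0), "inter", distortion=100.0, estimated_bits=40),
    Candidate((0, 0), "inter", distortion=120.0, estimated_bits=10),
    Candidate((3, -2), "inter", distortion=90.0, estimated_bits=80),
]

print(select_candidate(candidates, lam=0.5).motion_vector)  # (1, 0)
print(select_candidate(candidates, lam=2.0).motion_vector)  # (0, 0)
```

A larger multiplier penalizes bits more heavily, so the choice shifts toward the cheaper candidate even though its distortion is higher. Because each candidate's cost is independent of the others, the evaluations could also be run in parallel.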
In another embodiment, an apparatus for optimal video encoding that selects a motion vector and corresponding mode vector, and a quantization scale factor, for a current macro-block in a block based video encoder is provided. A plurality of candidate motion vectors and corresponding mode vectors, and quantized discrete cosine transform coefficients based on the macroblock and the candidate motion vectors and mode vectors are provided. The apparatus includes
a video encoder preprocessor connected to receive the quantized discrete cosine transform coefficients, the candidate motion vectors, and the candidate mode vectors, the video encoder preprocessor being operable to estimate the length of a bit stream that would be required to encode each candidate motion vector, corresponding mode vector, and corresponding quantized discrete cosine transform coefficients, and to transmit the length of the bit stream;
a Lagrange multiplier unit connected to receive the length of the bit stream, the Lagrange multiplier unit being operable to generate a bit-rate term based on the length of the bit stream, and to transmit the bit-rate term;
an inverse quantization unit connected to receive the discrete cosine transform coefficients, the inverse quantization unit being operable to determine inverse quantized discrete cosine transform coefficients and to transmit the inverse quantized discrete cosine transform coefficients; and
a distortion calculator unit connected to receive the inverse quantized discrete cosine transform coefficients, the distortion calculator unit being operable to generate a distortion signal.
The apparatus is further operable to determine a measure of distortion based on the distortion signal and to determine a motion estimation signal based on the measure of distortion and the bit-rate term for each candidate motion vector and corresponding mode vector, to select the motion vector having the minimum motion estimation signal, and to select a quantization scale factor based on the measure of distortion and the bit-rate term. The selected motion vector and corresponding mode vector are output to a buffer for transmission to a video encoder.
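The roles of the inverse quantization unit and the distortion calculator unit can be illustrated with a minimal sketch. It assumes a uniform scalar quantizer with a single step size and a sum-of-squared-errors distortion measured in the transform domain; neither is specified in the text, and practical encoders typically apply per-coefficient weighting. All function names are hypothetical.

```python
def quantize(coeffs, step):
    """Map each transform coefficient to an integer level (assumed uniform quantizer)."""
    return [round(c / step) for c in coeffs]


def inverse_quantize(levels, step):
    """Reconstruct approximate coefficients from the quantized levels."""
    return [level * step for level in levels]


def squared_error_distortion(original, reconstructed):
    """Sum of squared differences between original and reconstructed coefficients."""
    return sum((o - r) ** 2 for o, r in zip(original, reconstructed))


coeffs = [101.0, -37.0, 13.0, 3.0]   # illustrative transform coefficients
step = 8                             # illustrative quantization step
levels = quantize(coeffs, step)
reconstructed = inverse_quantize(levels, step)
print(levels)                                           # [13, -5, 2, 0]
print(squared_error_distortion(coeffs, reconstructed))  # 36.0
```

Repeating this for several step sizes gives a distortion value per candidate scale factor, which could then be weighed against the corresponding bit-rate term.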
Advantageously, the present invention generates an estimate of the bit-rate term without requiring each candidate motion vector and corresponding mode vector to be encoded.