Large bandwidth systems, such as video processing systems, strive to obtain high compression ratios by eliminating spatial and temporal redundancies in transmitted pictures. Spatial redundancy refers to redundant video information that is present within a single picture. One example of spatial redundancy can be found in the repeated pixel values that are present inside a single picture of a large expanse of blue sky. Temporal redundancy, on the other hand, refers to redundant video information that is present in successively occurring pictures where certain parts of the picture do not vary from one picture to the next. One such example of temporal redundancy can be found in an expanse of blue sky that is present in two successive pictures that are to be transmitted sequentially.
Spatial redundancies may be eliminated by using compression techniques, such as the discrete cosine transform (DCT) and the wavelet transform (WT), while temporal redundancies may be eliminated by using compression techniques that incorporate, for example, motion compensated temporal prediction. Alternative techniques, such as hybrid motion-compensated transform coding algorithms, utilize a combination of spatial and temporal compression techniques. These hybrid techniques are typically used to implement Moving Picture Experts Group (MPEG) standards, such standards being collectively referred to as “MPEG-x,” where ‘x’ is a numeric value.
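As a rough illustration of how a transform exposes spatial redundancy, the following Python sketch applies an unnormalized one-dimensional DCT-II to a row of identical pixel values, such as a row taken from a uniform expanse of sky. The function name, the row length of eight, and the pixel value are assumptions chosen for the example, and practical codecs apply a two-dimensional transform to pixel blocks rather than this simplified form.

```python
import math

def dct_ii(samples):
    """Unnormalized 1-D DCT-II of a sequence of pixel values."""
    n = len(samples)
    return [
        sum(s * math.cos(math.pi * (i + 0.5) * k / n) for i, s in enumerate(samples))
        for k in range(n)
    ]

# Hypothetical row of eight identical pixels (a flat, highly redundant region).
flat_row = [120] * 8
coefficients = dct_ii(flat_row)

# Count the coefficients that are significantly different from zero.
significant = [c for c in coefficients if abs(c) > 1e-9]
print(len(significant), "of", len(coefficients), "coefficients are significant")
```

For a flat row, all of the signal energy collects in the first (DC) coefficient, so the remaining coefficients carry essentially no information and need not be transmitted.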
When temporal compression is used, a current picture is not transmitted in its entirety; instead, the difference between the current picture and a previous picture is transmitted. At the receiver end, a decoder that already has the previous picture can then reconstruct the current picture by adding the difference picture to the previous picture. The difference picture is created at the transmitter by subtracting every pixel in the previous picture from the corresponding pixel in the current picture. Such a difference picture is an image of a kind, although not a viewable one, and contains some spatial redundancies, which may be eliminated by using spatial compression techniques.
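The difference-picture exchange described above can be sketched in Python as follows. This is a minimal illustration that assumes pictures are represented as lists of rows of integer pixel values; the function names and the sample pixel values are hypothetical.

```python
def difference_picture(current, previous):
    """Transmitter side: subtract each previous pixel from the current pixel."""
    return [
        [c - p for c, p in zip(cur_row, prev_row)]
        for cur_row, prev_row in zip(current, previous)
    ]

def reconstruct(previous, difference):
    """Receiver side: add the difference picture to the previous picture."""
    return [
        [p + d for p, d in zip(prev_row, diff_row)]
        for prev_row, diff_row in zip(previous, difference)
    ]

# Hypothetical 3x4 pictures: only two pixels change between them.
previous = [[10, 10, 10, 10],
            [10, 10, 10, 10],
            [10, 10, 10, 10]]
current = [[10, 10, 10, 10],
           [10, 50, 50, 10],
           [10, 10, 10, 10]]

diff = difference_picture(current, previous)
decoded = reconstruct(previous, diff)
```

In this example the difference picture is zero everywhere except at the two changed pixels, and adding it back to the previous picture recovers the current picture exactly.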
The difference picture may not contain a large amount of data when stationary objects are present in sequential pictures, but when moving objects are present in successive pictures the resulting difference picture will contain a significant amount of data. Generation of such large amounts of data may be minimized by using motion compensation techniques in conjunction with the generation of the difference picture. In MPEG-2 implementations, for example, motion compensation is typically accomplished using a motion estimator circuit. The motion estimator circuit measures the direction and distance of motion between two pictures and outputs the results as motion vectors. These motion vectors are used by the decoder at the receiver end to carry out motion compensation by shifting data in a previous picture to create the current picture. In effect, the motion vectors describe the optical flow axis of a certain moving screen area, along which axis the image is highly redundant. Vectors are bipolar codes which reflect the amount of horizontal and vertical shift required at the decoder.
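As an illustration of how motion vectors might be obtained, the following Python sketch performs an exhaustive block-matching search using the sum of absolute differences (SAD). The search method, the SAD criterion, the function names, and the sign convention (the vector points from the block position in the current picture to the matching block in the previous picture) are all assumptions chosen for the example; the text above does not specify how the motion estimator circuit operates internally.

```python
def sad(previous, current, top, left, dy, dx, size):
    """Sum of absolute differences between a current block and a displaced previous block."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(current[top + y][left + x]
                         - previous[top + dy + y][left + dx + x])
    return total

def estimate_motion(previous, current, top, left, size, search_range):
    """Return ((dy, dx), cost) for the displacement with the lowest SAD."""
    height, width = len(previous), len(previous[0])
    best_vector, best_cost = None, None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            # Skip candidates that fall outside the previous picture.
            if top + dy < 0 or top + dy + size > height:
                continue
            if left + dx < 0 or left + dx + size > width:
                continue
            cost = sad(previous, current, top, left, dy, dx, size)
            if best_cost is None or cost < best_cost:
                best_vector, best_cost = (dy, dx), cost
    return best_vector, best_cost

# Hypothetical 6x6 pictures: a bright 2x2 object moves two pixels to the right.
previous = [[0] * 6 for _ in range(6)]
current = [[0] * 6 for _ in range(6)]
for y in (2, 3):
    for x in (1, 2):
        previous[y][x] = 200
    for x in (3, 4):
        current[y][x] = 200

vector, cost = estimate_motion(previous, current, top=2, left=3, size=2, search_range=2)
```

Here the search finds a vertical shift of 0 and a horizontal shift of -2 with a cost of zero, meaning that the block in the current picture can be copied from a position two pixels to the left in the previous picture. The signed components correspond to the horizontal and vertical shift values mentioned above.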
An added level of complexity occurs during motion compensation in real-world images such as those encountered in MPEG implementations, because moving objects do not necessarily maintain their appearance as they move. For example, objects may turn, move into shade or light, or move behind other objects. Consequently, motion compensation cannot be implemented in an ideal manner, and supplementary information related to the picture has to be provided to the decoder. This supplementary information takes the form of a “predicted picture” that is also typically generated in the motion estimator circuit.
Consequently, the motion estimator circuit, in addition to producing the motion vectors, also uses the motion vectors to produce the predicted picture, which is based on the previous picture shifted by motion vectors. This predicted picture is then subtracted from the actual current picture to produce a “prediction error.” The prediction error is also often referred to as a “prediction residual.”
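The relationship between the predicted picture and the prediction residual can be sketched in Python as follows. The block size, the dictionary that maps each block position to a motion vector, the vector sign convention (displacement into the previous picture), and all pixel values are assumptions chosen for the example. The object in this example brightens slightly as it moves, so the prediction is not perfect.

```python
def predict_picture(previous, vectors, size):
    """Build the predicted picture by copying displaced blocks from the previous picture."""
    height, width = len(previous), len(previous[0])
    predicted = [[0] * width for _ in range(height)]
    for (top, left), (dy, dx) in vectors.items():
        for y in range(size):
            for x in range(size):
                predicted[top + y][left + x] = previous[top + dy + y][left + dx + x]
    return predicted

def subtract(a, b):
    """Pixel-wise a minus b."""
    return [[pa - pb for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]

def total_magnitude(picture):
    """Sum of absolute pixel values, used here as a crude measure of the amount of data."""
    return sum(abs(p) for row in picture for p in row)

previous = [[100, 100, 20, 20],
            [100, 100, 20, 20],
            [20, 20, 20, 20],
            [20, 20, 20, 20]]
# The object has moved two pixels to the right and brightened from 100 to 110.
current = [[20, 20, 110, 110],
           [20, 20, 110, 110],
           [20, 20, 20, 20],
           [20, 20, 20, 20]]

# Hypothetical motion vectors, one per 2x2 block, keyed by (top, left).
vectors = {(0, 0): (2, 0), (0, 2): (0, -2), (2, 0): (0, 0), (2, 2): (0, 0)}

predicted = predict_picture(previous, vectors, size=2)
residual = subtract(current, predicted)    # prediction error
plain_difference = subtract(current, previous)
```

In this example the prediction residual contains only the brightness change of the moved object, with a total magnitude of 40, whereas the plain difference picture has a total magnitude of 680. A decoder holding the previous picture and the motion vectors can rebuild the same predicted picture and add the residual to recover the current picture.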
Several existing systems have been designed to obtain motion vectors by carrying out a motion search. This motion search employs a strategy that is geared towards producing a prediction residual that requires the least data transmission bandwidth, under the assumption that such a search strategy produces the most efficient compression. Unfortunately, while the bandwidth of the prediction residual may be optimized by this approach, the bandwidth of the generated motion vectors can also turn out to be significant. It is therefore desirable to provide a solution that optimizes the bandwidth of not only the prediction residual but also the motion vectors. Optimizing both the prediction residual and the motion vectors translates to providing optimal compression, which consequently equates to an optimal data transmission rate.
In addition to employing motion compensation techniques, video processing systems also employ encoding circuitry that operates upon signals, such as the prediction residual and the motion vectors, to produce encoded data. This encoding process is dependent upon the nature of the signals, and is typically geared towards optimizing one or more signal parameters such as the signaling rate (bandwidth), picture distortion, or a combination of rate and distortion.
For example, MPEG pictures contain pixel blocks that are commonly referred to as macroblocks, which can be encoded in multiple ways. Two such modes are referred to as “intracode mode” and “bidirectional mode.” In a first encoder implementation, the encoding process is selected so as to minimize the transmission rate (and consequently the signaling bandwidth) of a transmitted signal, while in a second encoder implementation, the encoding process is selected to minimize picture distortion. Picture distortion may be described as a measure of either the perceived or the actual difference between the original and the encoded video picture.
A third approach to implementing an encoder uses a combination of bit rate R and distortion D, in what is referred to as a “rate-distortion” (R-D) approach, with the goal of minimizing distortion under the limitation of a pre-defined rate constraint. The rate-constrained approach can be defined by the equation:
min{D(R)} subject to R ≤ R*, where R* is the allowed rate.
This constrained problem can be converted to an unconstrained one by using a Lagrangian multiplier λ. The unconstrained Lagrangian formula is defined by the following equation:
min{J(D,R)}, where J = D + λR.
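To make the Lagrangian formulation concrete, the following Python sketch evaluates J = D + λR for two candidate macroblock modes and selects the mode with the lowest cost. The distortion and rate figures, the function names, and the two λ values are hypothetical and chosen only to show how the multiplier shifts the balance between rate and distortion; they are not measurements from any encoder.

```python
def lagrangian_cost(distortion, rate, lam):
    """J = D + lambda * R."""
    return distortion + lam * rate

def select_mode(candidates, lam):
    """Return the name of the candidate mode with the smallest Lagrangian cost."""
    return min(
        candidates,
        key=lambda name: lagrangian_cost(candidates[name][0], candidates[name][1], lam),
    )

# Hypothetical (distortion, rate) pairs for one macroblock.
candidates = {
    "intracode": (40.0, 300.0),      # low distortion, high rate
    "bidirectional": (90.0, 120.0),  # higher distortion, lower rate
}

choice_small_lambda = select_mode(candidates, lam=0.25)
choice_large_lambda = select_mode(candidates, lam=0.5)
```

With λ = 0.25 the costs are 115 for the intracode mode and 120 for the bidirectional mode, so the intracode mode is selected. With λ = 0.5 the costs are 190 and 150, so the bidirectional mode is selected. A larger λ therefore penalizes rate more heavily, and sweeping λ traces out different operating points on the rate-distortion trade-off.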
The minimization process to determine the optimal values of R and D for various values of λ can turn out to be computationally intensive, and therefore expensive, if each and every encoding mode as well as motion estimation/compression process has to be evaluated using the equation above. Consequently, while several solutions currently exist to implement rate-distortion theory in macroblock mode selection as well as in motion estimation schemes, these solutions produce sub-optimal results and/or are computationally complex.
It is therefore desirable to provide a signal processing system that implements macroblock mode selection and/or motion estimation with reduced computational complexity.